
Clang xtensa target #2


Closed
wants to merge 10,000 commits into from

Conversation


@gerekon gerekon commented Nov 28, 2024

This PR implements support for the generic Xtensa target in Clang.

MaskRay and others added 30 commits November 23, 2024 21:42
Structs with delicate packing are often larger under MSVC than under the
Itanium ABI. 099a52f did not make
sizeof(InputSection) smaller for MSVC. Just exclude MSVC.
Glue operand is only present if there are variadic register operands,
which makes it optional.
Also, change the number of fixed operands to 1 (the trap ID).
Wide shift nodes produce two results, not one.
Reuse the added type profile to define the standard "shift parts" nodes.
…read can hold (llvm#116409)

I've run into an issue where TSan can't be used on some code without
turning off deadlock detection because a thread tries to hold too many
mutexes. It would be preferable to be able to use deadlock detection as
that is a major benefit of TSan.

It's mentioned in google/sanitizers#950 that
the 64-mutex limit was an arbitrary number. I've increased it to 128 and
all the tests still pass. Considering the increasing number of cores on
CPUs, and how programs can now use more threads to take advantage of them,
I think raising the limit to 128 is good future-proofing.
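
As a minimal illustration (a hypothetical test case, not from the PR), a single thread holding more than 64 mutexes at once is the pattern that ran into the old per-thread limit in the deadlock detector:

```cpp
// Hypothetical sketch: one thread acquires more than 64 mutexes at once,
// which previously exceeded TSan's per-thread limit when deadlock
// detection is enabled.
#include <mutex>
#include <vector>

int main() {
  constexpr int kHeld = 100;                 // > 64, the old limit
  std::vector<std::mutex> mutexes(kHeld);
  for (auto &m : mutexes) m.lock();          // all held simultaneously
  for (auto &m : mutexes) m.unlock();
  return 0;
}
```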

---------

Co-authored-by: Vitaly Buka <[email protected]>
…ies (llvm#115930)

This fixes a bug where variadic segment properties would not be elided
when printing `prop-dict`.
…6281)

Inferring the ARM64EC target can lead to errors. The `-machine:arm64ec`
option may include x86_64 input files, and any valid ARM64EC input is
also valid for `-machine:arm64x`. MSVC requires an explicit `-machine`
argument with informative diagnostics; this patch adopts the same
behavior.
…pInterface` interfaces (llvm#99566)

This patch adds the `ConvertToLLVMAttrInterface` and
`ConvertToLLVMOpInterface` interfaces. It also modifies the
`convert-to-llvm` pass to use these interfaces when available.

The `ConvertToLLVMAttrInterface` interfaces allows attributes to
configure conversion to LLVM, including the conversion target, LLVM type
converter, and populating conversion patterns. See the `NVVMTargetAttr`
implementation of this interface for an example of how this interface
can be used to configure conversion to LLVM.

The `ConvertToLLVMOpInterface` interface collects all convert to LLVM
attributes stored in an operation.

Finally, the `convert-to-llvm` pass was modified to use these interfaces
when available. This allows applying `convert-to-llvm` to GPU modules
and letting the `NVVMTargetAttr` decide which patterns to populate.
…version script." (llvm#117444)

Commit
llvm@eaa0a21
has already fixed the build problem, so the change in
llvm#117342 no longer makes sense. I am reverting it.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
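
For illustration, a minimal sketch of the migration described by the FIXME (the types A and B are hypothetical):

```cpp
// Sketch of replacing PointerUnion::is()/get() with the free-function
// casting utilities, per the FIXME quoted above. Types are illustrative.
#include "llvm/ADT/PointerUnion.h"
#include "llvm/Support/Casting.h"

struct A { int x; };
struct B { int y; };

int use(llvm::PointerUnion<A *, B *> PU) {
  // Before: if (PU.is<A *>()) return PU.get<A *>()->x;
  if (llvm::isa<A *>(PU))
    return llvm::cast<A *>(PU)->x;
  // dyn_cast is left untouched for now; dyn_cast_if_present is the blind
  // migration, plain dyn_cast when PU is known to be non-null.
  if (auto *BP = llvm::dyn_cast_if_present<B *>(PU))
    return BP->y;
  return 0;
}
```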
This test fails on the `clang-x64-windows-msvc` builder:

      .---command stderr------------
      | C:\b\slave\clang-x64-windows-msvc\llvm-project\llvm\test\CodeGen\Hexagon\widen-not-load.ll:7:16: error: CHECK-LABEL: expected string not found in input
      | ; CHECK-LABEL: test1
      |                ^
      | <stdin>:1:1: note: scanning from here
      | llc.exe: Unknown command line argument '-debug-only=hexagon-load-store-widening'. Try: 'c:\b\slave\clang-x64-windows-msvc\build\stage1\bin\llc.exe --help'
      | ^
      | <stdin>:1:35: note: possible intended match here
      | llc.exe: Unknown command line argument '-debug-only=hexagon-load-store-widening'. Try: 'c:\b\slave\clang-x64-windows-msvc\build\stage1\bin\llc.exe --help'
      |                                   ^
The folded load variants almost never require Port5 for length changing conversions (just for SNB ymm cases), and don't typically use an extra uop for the load.

Confirmed with a mixture of Agner + uops.info comparisons.
Add complete IvyBridge schedule (which is included in the SandyBridge model; IvyBridge was the first to support F16C) - split rr/rm schedules as they usually have very different port usage.

Haswell/Broadwell use Port1 not Port0.

Confirmed with a mixture of Agner + uops.info comparisons.
The ELF core debugging fix llvm#117070 broke the TestLoadUnload.py tests because of the
GetModuleSpec call: ProcessGDBRemote fetches modules from the remote. This revises
the original PR and renames FindBuildId to FindModuleUUID.
Restructure and slightly simplify code to re-use existing basic blocks.
Allow setting the name to use for the generated IR value of the derived
IV in preparation for llvm#112145.

This is analogous to VPInstruction::Name.
The existing implementation may trigger infinite cycles when collecting
effects above or below the current block after wrapping around a
loop-like construct. Limit this case to only looking at the immediate
block (loop body). This is correct because the wrap-around is intended to
consider effects of different iterations of the same loop and shouldn't
be exiting the loop block.

Reported-by: Fabian Mora <[email protected]>
Co-authored-by: Fabian Mora <[email protected]>
HerrCai0907 and others added 29 commits November 26, 2024 21:50
…ctions with inputs not signed-extended. (llvm#116764)

Two options for clang:
-mdiv32: Use div.w[u] and mod.w[u] instructions with inputs not
sign-extended.
-mno-div32: Do not use div.w[u] and mod.w[u] instructions with inputs not
sign-extended.
The default is -mno-div32.
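
For illustration (a hypothetical snippet, not part of the patch), the kind of code this option affects is a plain 32-bit division:

```cpp
// Sketch: with -mdiv32 the backend may emit div.w on the 32-bit operands
// without first sign-extending them to 64 bits; with -mno-div32 (the
// default) the inputs are sign-extended before the division.
int div32(int a, int b) {
  return a / b;
}
```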
… segments (llvm#92815)"

This caused test failures, see comment on the PR:

  Failed Tests (2):
    BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
    BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0

> When a binary has multiple text segments, the Size is computed as the
> difference of the last address of these segments from the BaseAddress.
> The base addresses of all text segments must be the same.
>
> Introduces flag 'perf-script-events' for testing. It allows passing perf events
> without BOLT having to parse them using 'perf script'. The flag is used to
> pass a mock perf profile that has two memory mappings for a mock binary
> that has two text segments. The size of the mapping is updated as this
> change `parseMMapEvents` processes all text segments.

This reverts commit 4b71b37.
…lvm#116673)

Drop commas from split barrier operations assembly format.

Signed-off-by: Victor Perez <[email protected]>


Depends on llvm#116648, review ec8d354
only.

---------

Signed-off-by: Victor Perez <[email protected]>
The pattern `select %x, true, false => %x` is only valid when the
return type is identical to the type of `%x` (i.e., i1). Hence, the
check `isInteger(1)` was replaced with `isSignlessInteger(1)`.

Fixes: llvm#117554
This was recently added to SmallVectorExtras:
llvm#117460.
This should act like range.

Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
)

llvm#116220 clarified that
violations of aliasing metadata are UB.

Only set the AA metadata after hoisting a load if it is guaranteed to
execute in the original loop.

PR: llvm#117204
…dFieldReferenceExpr (llvm#116965)

The original code assumed that only special member functions might be
defaulted. Since C++20, comparison operators may be defaulted too, and
we *do* want to consider those as using the fields of the class.
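
For example (a minimal sketch, not taken from the patch), a C++20 defaulted comparison reads every field, so those fields are not "unused":

```cpp
// Sketch: the defaulted operator== implicitly compares x and y, so both
// fields are used even though no hand-written code mentions them.
struct Point {
  int x;
  int y;
  bool operator==(const Point &) const = default;  // C++20 defaulted comparison
};

static_assert(Point{1, 2} == Point{1, 2});
```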

Fixes: llvm#116961
…ot (llvm#117320)

The optimiser will produce empty blocks that are unconditionally
executed according to the CFG -- while it may not be meaningful code,
and won't get a prologue_end position, we need to not crash on this
input.

The fault comes from assuming that there's always a next block with some
instructions in it, that will eventually produce some meaningful control
flow to stop at -- in the given reproducer in issue llvm#117206 this isn't
true, because the function terminates with `unreachable`. Thus, I've
refactored the "get next instruction logic" into a helper that'll step
through all blocks and terminate if there aren't any more.

Reproducer from aeubanks
…es with mismatched streaming attributes (llvm#116391)

If `__attribute__((flatten))` is used on a function, or
`[[clang::always_inline]]` on a statement, don't inline any callees with
incompatible streaming attributes. Without this check, clang may produce
incorrect code when these attributes are used in code with streaming
functions.
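
As a rough illustration (assuming the Arm SME ACLE `__arm_streaming` keyword; this is not code from the patch), the combination being guarded against looks like:

```cpp
// Sketch: a flatten-attributed caller must not force-inline a callee whose
// streaming mode differs from its own, as inlining across the streaming-mode
// boundary can produce incorrect code.
void streaming_callee(void) __arm_streaming;   // runs in streaming SVE mode

__attribute__((flatten)) void normal_caller(void) {
  streaming_callee();  // with this patch, not inlined despite flatten
}
```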

Note: The docs for flatten say it can be ignored when inlining is
impossible: "causes calls within the attributed function to be inlined
unless it is impossible to do so".

Similarly, the (clang-only) `[[clang::always_inline]]` statement
attribute is more relaxed than the GNU `__attribute__((always_inline))`
(which says it should error if it can't inline), saying only "If a
statement is marked [[clang::always_inline]] and contains calls, the
compiler attempts to inline those calls." The docs also go on to show
an example of where `[[clang::always_inline]]` has no effect.
Currently, the Vector dialect TD file includes the following "vector"
type definitions:

```mlir
def AnyVector : VectorOf<[AnyType]>;
def AnyVectorOfAnyRank : VectorOfAnyRankOf<[AnyType]>;
def AnyFixedVector : FixedVectorOf<[AnyType]>;
def AnyScalableVector : ScalableVectorOf<[AnyType]>;
```

In short:

  * `AnyVector` _excludes_ 0-D vectors.
  * `AnyVectorOfAnyRank`, `AnyFixedVector`, and `AnyScalableVector`
    _include_ 0-D vectors.

The naming for "groups" that include 0-D vectors is inconsistent and can
be misleading, and `AnyVector` implies that 0-D vectors are included,
which is not the case.

This patch renames these definitions for clarity:

```mlir
def AnyVectorOfNonZeroRank : VectorOfNonZeroRankOf<[AnyType]>;
def AnyVectorOfAnyRank : VectorOfAnyRankOf<[AnyType]>;
def AnyFixedVectorOfAnyRank : FixedVectorOfAnyRank<[AnyType]>;
def AnyScalableVectorOfAnyRank : ScalableVectorOfAnyRank<[AnyType]>;
```

Rationale:
* The updated names are more explicit about 0-D vector support.
* It becomes clearer that scalable vectors currently allow 0-D vectors -
  this might warrant a revisit.
* The renaming paves the way for adding a new group for "fixed-width
  vectors excluding 0-D vectors" (e.g., AnyFixedVector), which I plan to
  introduce in a follow-up patch.
I noticed while working on another test that I never used the PCH
trickery to get this to validate that serialization/deserialization
works correctly.  It DOES, but we weren't testing it with this test like
the others.
llvm#117700)

This MR fixes the failing test `CodeGen/RISCV/compress-opt-select.ll`.

It failed due to the previously merged commit `[TTI][RISCV]
Unconditionally break critical edges to sink ADDI (PR llvm#108889)`.

So, the `compress-opt-select` test has been regenerated.
…` is used (llvm#91524)

As described in issue llvm#91518, a previous PR
llvm#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.

However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under

`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.

Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:

- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
  operations to explicitly include operand and destination types,
  whereas previously they relied on type inference to deduce the
  tensor types. Since the type inference cannot recover the correct
  tensor encoding/memory space, the operand and result types must be
  explicitly included. This is a small assembly format change, but it
  touches a large number of test files.

- Makes minor updates to other bufferization functions to handle the
  changes in building the above ops.

- Updates bufferization of `tensor.from_elements` to handle memory
  space.


Integration/upgrade guide:

In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:

```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```

becomes

```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32> 
```
In order to align with `svext` and NEON `vext`/`vextq`, this patch
changes the immediate argument of `svextq` so that it refers to elements
of the same size as those of the source vector, rather than bytes. The
[spec for this intrinsic](https://github.com/ARM-software/acle/blob/main/main/acle.md#extq)
is ambiguous about the meaning of this argument; the issue was raised
after the implementers of the ACLE in GCC interpreted it differently.

For example (with our current implementation):

`svextq_f64(zn_f64, zm_f64, 1)` would, for each 128-bit segment of
`zn_f64,` concatenate the highest 15 bytes of this segment with the
first byte of the corresponding segment of `zm_f64`.

After this patch, the behavior of `svextq_f64(zn_f64, zm_f64, 1)` would
be, for each 128-bit vector segment of `zn_f64`, to concatenate the
higher doubleword of this segment with the lower doubleword of the
corresponding segment of `zm_f64`.

The range of the immediate argument in `svextq` would be modified such
that it is:
- [0,15] for `svextq_{s8,u8}`
- [0,7] for `svextq_{s16,u16,f16,bf16}`
- [0,3] for `svextq_{s32,u32,f32}`
- [0,1] for `svextq_{s64,u64,f64}`
compress is intended to match vcompress from the ISA manual. Note that
  deinterleave is a subset of this, and is already tested elsewhere.

decompress is the synthetic pattern defined in the same manual - though we can often
  do better than the mentioned iota/vrgather.  Note that some of these
  can also be expressed as an interleave with at least one undef source,
  and are already tested elsewhere.

repeat repeats each input element N times in the output.  It can be
  described as an interleave operation, but we can sometimes do
  better lowering-wise.
We should leave these for EXPENSIVE_CHECKS builds. Some of these
were near the top of the slowest tests.
…non build dependent size (llvm#117604)

In llvm#110065, the changes to the LinuxSigInfo struct introduced some variables
that differ in size between 32-bit and 64-bit builds. I've rectified this by
setting them all to build-independent types.
Support the SV_GroupID attribute and translate it into dx.group.id in Clang
CodeGen.

Fixes: llvm#70120
This is another clause where the parsing does all the required
enforcement besides the construct it appertains to, so this patch
removes the restriction and adds sufficient test coverage for combined
constructs.
@gerekon gerekon closed this Nov 28, 2024