
Clang xtensa target #2


Closed
wants to merge 10,000 commits into from

Conversation


@gerekon gerekon commented Nov 28, 2024

This PR implements support for the generic Xtensa target in Clang.

MaskRay and others added 30 commits November 23, 2024 21:42
Structs with delicate packing are often larger under MSVC than under the
Itanium ABI. 099a52f did not make
sizeof(InputSection) smaller for MSVC. Just exclude MSVC.
Glue operand is only present if there are variadic register operands,
which makes it optional.
Also, change the number of fixed operands to 1 (the trap ID).
Wide shift nodes produce two results, not one.
Reuse the added type profile to define the standard "shift parts" nodes.
…read can hold (llvm#116409)

I've run into an issue where TSan can't be used on some code without
turning off deadlock detection because a thread tries to hold too many
mutexes. It would be preferable to be able to use deadlock detection as
that is a major benefit of TSan.

It's mentioned in google/sanitizers#950 that
the 64-mutex limit was an arbitrary number. I've increased it to 128 and
all the tests still pass. Considering the increasing number of cores on
CPUs, and how programs can now use more threads to take advantage of them,
I think raising the limit to 128 is good future-proofing.
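
As a minimal illustration (a hypothetical test case, not from the PR), a single thread holding more than 64 mutexes at once is the pattern that ran into the old per-thread limit in the deadlock detector:

```cpp
// Hypothetical sketch: one thread acquires more than 64 mutexes at once,
// which previously exceeded TSan's per-thread limit when deadlock
// detection is enabled.
#include <mutex>
#include <vector>

int main() {
  constexpr int kHeld = 100;                 // > 64, the old limit
  std::vector<std::mutex> mutexes(kHeld);
  for (auto &m : mutexes) m.lock();          // all held simultaneously
  for (auto &m : mutexes) m.unlock();
  return 0;
}
```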

---------

Co-authored-by: Vitaly Buka <[email protected]>
…ies (llvm#115930)

This fixes a bug where variadic segment properties would not be elided
when printing `prop-dict`.
…6281)

Inferring the ARM64EC target can lead to errors. The `-machine:arm64ec`
option may include x86_64 input files, and any valid ARM64EC input is
also valid for `-machine:arm64x`. MSVC requires an explicit `-machine`
argument with informative diagnostics; this patch adopts the same
behavior.
…pInterface` interfaces (llvm#99566)

This patch adds the `ConvertToLLVMAttrInterface` and
`ConvertToLLVMOpInterface` interfaces. It also modifies the
`convert-to-llvm` pass to use these interfaces when available.

The `ConvertToLLVMAttrInterface` interfaces allows attributes to
configure conversion to LLVM, including the conversion target, LLVM type
converter, and populating conversion patterns. See the `NVVMTargetAttr`
implementation of this interface for an example of how this interface
can be used to configure conversion to LLVM.

The `ConvertToLLVMOpInterface` interface collects all convert to LLVM
attributes stored in an operation.

Finally, the `convert-to-llvm` pass was modified to use these interfaces
when available. This allows applying `convert-to-llvm` to GPU modules
and letting the `NVVMTargetAttr` decide which patterns to populate.
…version script." (llvm#117444)

Commit
llvm@eaa0a21
has already fixed the build problem, so the change in
llvm#117342 no longer makes sense. I am reverting it.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
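
For illustration, a minimal sketch of the migration described by the FIXME (the types A and B are hypothetical):

```cpp
// Sketch of replacing PointerUnion::is()/get() with the free-function
// casting utilities, per the FIXME quoted above. Types are illustrative.
#include "llvm/ADT/PointerUnion.h"
#include "llvm/Support/Casting.h"

struct A { int x; };
struct B { int y; };

int use(llvm::PointerUnion<A *, B *> PU) {
  // Before: if (PU.is<A *>()) return PU.get<A *>()->x;
  if (llvm::isa<A *>(PU))
    return llvm::cast<A *>(PU)->x;
  // dyn_cast is left untouched for now; dyn_cast_if_present is the blind
  // migration, plain dyn_cast when PU is known to be non-null.
  if (auto *BP = llvm::dyn_cast_if_present<B *>(PU))
    return BP->y;
  return 0;
}
```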
This test fails on the `clang-x64-windows-msvc` builder:

      .---command stderr------------
      | C:\b\slave\clang-x64-windows-msvc\llvm-project\llvm\test\CodeGen\Hexagon\widen-not-load.ll:7:16: error: CHECK-LABEL: expected string not found in input
      | ; CHECK-LABEL: test1
      |                ^
      | <stdin>:1:1: note: scanning from here
      | llc.exe: Unknown command line argument '-debug-only=hexagon-load-store-widening'. Try: 'c:\b\slave\clang-x64-windows-msvc\build\stage1\bin\llc.exe --help'
      | ^
      | <stdin>:1:35: note: possible intended match here
      | llc.exe: Unknown command line argument '-debug-only=hexagon-load-store-widening'. Try: 'c:\b\slave\clang-x64-windows-msvc\build\stage1\bin\llc.exe --help'
      |                                   ^
The folded load variants almost never require Port5 for length changing conversions (just for SNB ymm cases), and don't typically use an extra uop for the load.

Confirmed with a mixture of Agner + uops.info comparisons.
Add complete IvyBridge schedule (which is included in the SandyBridge model; IvyBridge was the first to support F16C) - split rr/rm schedules as they usually have very different port usage.

Haswell/Broadwell use Port1 not Port0.

Confirmed with a mixture of Agner + uops.info comparisons.
The ELF core debugging fix llvm#117070 broke the TestLoadUnload.py tests because of the
GetModuleSpec call: ProcessGDBRemote fetches modules from the remote. This revises
the original PR and renames FindBuildId to FindModuleUUID.
Restructure and slightly simplify code to re-use existing basic blocks.
Allow setting the name to use for the generated IR value of the derived
IV in preparation for llvm#112145.

This is analogous to VPInstruction::Name.
The existing implementation may trigger infinite cycles when collecting
effects above or below the current block after wrapping around a
loop-like construct. Limit this case to only looking at the immediate
block (loop body). This is correct because the wrap-around is intended to
consider effects of different iterations of the same loop and shouldn't
be exiting the loop block.

Reported-by: Fabian Mora <[email protected]>
Co-authored-by: Fabian Mora <[email protected]>
HerrCai0907 and others added 29 commits November 26, 2024 21:50
…ctions with inputs not signed-extended. (llvm#116764)

Two options for clang:
-mdiv32: Use div.w[u] and mod.w[u] instructions with inputs not
sign-extended.
-mno-div32: Do not use div.w[u] and mod.w[u] instructions with inputs not
sign-extended.
The default is -mno-div32.
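
For illustration (a hypothetical snippet, not part of the patch), the kind of code this option affects is a plain 32-bit division:

```cpp
// Sketch: with -mdiv32 the backend may emit div.w on the 32-bit operands
// without first sign-extending them to 64 bits; with -mno-div32 (the
// default) the inputs are sign-extended before the division.
int div32(int a, int b) {
  return a / b;
}
```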
… segments (llvm#92815)"

This caused test failures, see comment on the PR:

  Failed Tests (2):
    BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
    BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0

> When a binary has multiple text segments, the Size is computed as the
> difference of the last address of these segments from the BaseAddress.
> The base addresses of all text segments must be the same.
>
> Introduces flag 'perf-script-events' for testing. It allows passing perf events
> without BOLT having to parse them using 'perf script'. The flag is used to
> pass a mock perf profile that has two memory mappings for a mock binary
> that has two text segments. The size of the mapping is updated as this
> change `parseMMapEvents` processes all text segments.

This reverts commit 4b71b37.
…lvm#116673)

Drop commas from split barrier operations assembly format.

Signed-off-by: Victor Perez <[email protected]>


Depends on llvm#116648, review ec8d354
only.

---------

Signed-off-by: Victor Perez <[email protected]>
The pattern `select %x, true, false => %x` is only valid when the
return type is identical to the type of `%x` (i.e., i1). Hence, the
check `isInteger(1)` was replaced with `isSignlessInteger(1)`.

Fixes: llvm#117554
This was recently added to SmallVectorExtras:
llvm#117460.
This should act like range.

Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
)

llvm#116220 clarified that
violations of aliasing metadata are UB.

Only set the AA metadata after hoisting a load if it is guaranteed to
execute in the original loop.

PR: llvm#117204
…dFieldReferenceExpr (llvm#116965)

The original code assumed that only special member functions might be
defaulted. Since C++20, comparison operators may be defaulted too, and
we *do* want to consider those as using the fields of the class.
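
For example (a minimal sketch, not taken from the patch), a C++20 defaulted comparison reads every field, so those fields are not "unused":

```cpp
// Sketch: the defaulted operator== implicitly compares x and y, so both
// fields are used even though no hand-written code mentions them.
struct Point {
  int x;
  int y;
  bool operator==(const Point &) const = default;  // C++20 defaulted comparison
};

static_assert(Point{1, 2} == Point{1, 2});
```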

Fixes: llvm#116961
…ot (llvm#117320)

The optimiser will produce empty blocks that are unconditionally
executed according to the CFG -- while it may not be meaningful code,
and won't get a prologue_end position, we need to not crash on this
input.

The fault comes from assuming that there's always a next block with some
instructions in it, that will eventually produce some meaningful control
flow to stop at -- in the given reproducer in issue llvm#117206 this isn't
true, because the function terminates with `unreachable`. Thus, I've
refactored the "get next instruction logic" into a helper that'll step
through all blocks and terminate if there aren't any more.

Reproducer from aeubanks
…es with mismatched streaming attributes (llvm#116391)

If `__attribute__((flatten))` is used on a function, or
`[[clang::always_inline]]` on a statement, don't inline any callees with
incompatible streaming attributes. Without this check, clang may produce
incorrect code when these attributes are used in code with streaming
functions.
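
As a rough illustration (assuming the Arm SME ACLE `__arm_streaming` keyword; this is not code from the patch), the combination being guarded against looks like:

```cpp
// Sketch: a flatten-attributed caller must not force-inline a callee whose
// streaming mode differs from its own, as inlining across the streaming-mode
// boundary can produce incorrect code.
void streaming_callee(void) __arm_streaming;   // runs in streaming SVE mode

__attribute__((flatten)) void normal_caller(void) {
  streaming_callee();  // with this patch, not inlined despite flatten
}
```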

Note: The docs for flatten say it can be ignored when inlining is
impossible: "causes calls within the attributed function to be inlined
unless it is impossible to do so".

Similarly, the (clang-only) `[[clang::always_inline]]` statement
attribute is more relaxed than the GNU `__attribute__((always_inline))`
(which says it should error if it can't inline), saying only "If a
statement is marked [[clang::always_inline]] and contains calls, the
compiler attempts to inline those calls." The docs also go on to show
an example of where `[[clang::always_inline]]` has no effect.
Currently, the Vector dialect TD file includes the following "vector"
type definitions:

```mlir
def AnyVector : VectorOf<[AnyType]>;
def AnyVectorOfAnyRank : VectorOfAnyRankOf<[AnyType]>;
def AnyFixedVector : FixedVectorOf<[AnyType]>;
def AnyScalableVector : ScalableVectorOf<[AnyType]>;
```

In short:

  * `AnyVector` _excludes_ 0-D vectors.
  * `AnyVectorOfAnyRank`, `AnyFixedVector`, and `AnyScalableVector`
    _include_ 0-D vectors.

The naming for "groups" that include 0-D vectors is inconsistent and can
be misleading, and `AnyVector` implies that 0-D vectors are included,
which is not the case.

This patch renames these definitions for clarity:

```mlir
def AnyVectorOfNonZeroRank : VectorOfNonZeroRankOf<[AnyType]>;
def AnyVectorOfAnyRank : VectorOfAnyRankOf<[AnyType]>;
def AnyFixedVectorOfAnyRank : FixedVectorOfAnyRank<[AnyType]>;
def AnyScalableVectorOfAnyRank : ScalableVectorOfAnyRank<[AnyType]>;
```

Rationale:
* The updated names are more explicit about 0-D vector support.
* It becomes clearer that scalable vectors currently allow 0-D vectors -
  this might warrant a revisit.
* The renaming paves the way for adding a new group for "fixed-width
  vectors excluding 0-D vectors" (e.g., AnyFixedVector), which I plan to
  introduce in a follow-up patch.
I noticed while working on another test that I never used the PCH
trickery to get this to validate that serialization/deserialization
works correctly.  It DOES, but we weren't testing it with this test like
the others.
llvm#117700)

This MR fixes the failing test `CodeGen/RISCV/compress-opt-select.ll`.

It failed due to the previously merged commit `[TTI][RISCV]
Unconditionally break critical edges to sink ADDI (PR llvm#108889)`.

So, the `compress-opt-select` test has been regenerated.
…` is used (llvm#91524)

As described in issue llvm#91518, a previous PR
llvm#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.

However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under

`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.

Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:

- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
  operations to explicitly include operand and destination types,
  whereas previously they relied on type inference to deduce the
  tensor types. Since the type inference cannot recover the correct
  tensor encoding/memory space, the operand and result types must be
  explicitly included. This is a small assembly format change, but it
  touches a large number of test files.

- Makes minor updates to other bufferization functions to handle the
  changes in building the above ops.

- Updates bufferization of `tensor.from_elements` to handle memory
  space.


Integration/upgrade guide:

In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:

```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```

becomes

```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32> 
```
In order to align with `svext` and NEON `vext`/`vextq`, this patch
changes the immediate argument of `svextq` so that it refers to elements
of the same size as those of the source vector, rather than bytes. The
[spec for this intrinsic](https://github.com/ARM-software/acle/blob/main/main/acle.md#extq)
is ambiguous about the meaning of this argument; the issue was raised
after the implementers of the ACLE in GCC interpreted it differently.

For example (with our current implementation):

`svextq_f64(zn_f64, zm_f64, 1)` would, for each 128-bit segment of
`zn_f64,` concatenate the highest 15 bytes of this segment with the
first byte of the corresponding segment of `zm_f64`.

After this patch, the behavior of `svextq_f64(zn_f64, zm_f64, 1)` would
be, for each 128-bit vector segment of `zn_f64`, to concatenate the
higher doubleword of this segment with the lower doubleword of the
corresponding segment of `zm_f64`.

The range of the immediate argument in `svextq` would be modified such
that it is:
- [0,15] for `svextq_{s8,u8}`
- [0,7] for `svextq_{s16,u16,f16,bf16}`
- [0,3] for `svextq_{s32,u32,f32}`
- [0,1] for `svextq_{s64,u64,f64}`
compress is intended to match vcompress from the ISA manual. Note that
  deinterleave is a subset of this, and is already tested elsewhere.

decompress is the synthetic pattern defined in the same manual - though we can often
  do better than the mentioned iota/vrgather.  Note that some of these
  can also be expressed as an interleave with at least one undef source,
  and are already tested elsewhere.

repeat repeats each input element N times in the output.  It can be
  described as an interleave operation, but we can sometimes do
  better lowering-wise.
We should leave these for EXPENSIVE_CHECKS builds. Some of these
were near the top of the slowest tests.
…non build dependent size (llvm#117604)

In llvm#110065, the changes to the LinuxSigInfo struct introduced some variables
that differ in size between 32-bit and 64-bit builds. I've rectified this by
setting them all to build-independent types.
Support the SV_GroupID attribute and translate it into dx.group.id in Clang
CodeGen.

Fixes: llvm#70120
This is another clause where the parsing does all the required
enforcement besides the construct it appertains to, so this patch
removes the restriction and adds sufficient test coverage for combined
constructs.
@gerekon gerekon closed this Nov 28, 2024