Add vector memref support for mlir-cpu-runner func return #103
Conversation
I'm posting this mainly to get feedback - perhaps from @ftynse or @joker-eph
Force-pushed from 08402db to 6b9f512
The problem here was alignment. MLIR store ops on vector types are being turned into aligned vector store instructions (vmovaps) here; however, the lowered MLIR allocs (via malloc) aren't aligned at those boundaries (AVX 256-bit / 32-byte in this case). The store op lowering for the LLVM dialect isn't adding any alignment attributes, so the generated LLVM store instructions assume 32-byte alignment here. This leads to a general protection fault, since mallocs on x86-64 are only 16-byte aligned.

This PR is incomplete without a fix to the lowering.
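To make the failure mode concrete, here is a minimal standalone C++ sketch of this class of fault (my own illustration, not code from this PR):

```cpp
// Hypothetical reproduction of the fault described above, assuming an
// x86-64 target with AVX (build with e.g. -mavx). malloc only guarantees
// 16-byte alignment, but _mm256_store_ps (which compiles to vmovaps)
// requires 32-byte alignment.
#include <cstdlib>
#include <immintrin.h>

int main() {
  float *buf = static_cast<float *>(std::malloc(64 * sizeof(float)));
  __m256 v = _mm256_set1_ps(1.0f);
  // Undefined behavior if buf is not 32-byte aligned; on hardware this
  // typically raises a general protection fault, as observed above.
  _mm256_store_ps(buf, v);
  std::free(buf);
  return 0;
}
```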
Sorry for the delay, I was away for some time. It's not the first time there has been an alignment problem at the LLVM boundary, so it would be great to come up with a future-proof solution to this. One thing that we may want to have anyway is an alignment attribute/property on the memref type itself. There are two possible quick fixes if the issue is blocking for you:

- add an explicit alignment attribute to the lowered loads/stores (even 'align 1' would be conservatively correct), or
- allocate vector_size - 1 extra elements and adjust the accesses to aligned boundaries.
If adding an alignment attribute to …
I actually already implemented two workarounds/fixes. The first one was to add an alignment attribute for all the loads/stores; setting the alignment to 16 for all loads/stores on memrefs whose element type is larger than 16 bytes made this work (since GNU malloc already aligns to 16-byte boundaries on x86-64). The second one was to allocate more (vector_size - 1, as you point out) and align the loads/stores at element size boundaries (for all memrefs with element size larger than malloc's alignment size). There are a couple of things here.
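As a concrete illustration of the second workaround (a minimal sketch of my own, not code from this PR; AlignedBuffer and allocAligned are hypothetical names), assuming the requested alignment is a power of two:

```cpp
// A sketch of the malloc-and-shift workaround: over-allocate by
// (alignment - 1) bytes and round the pointer up. The original
// malloc'ed pointer must be retained so the buffer can later be
// passed to free().
#include <cstdint>
#include <cstdlib>

struct AlignedBuffer {
  void *base;    // pointer returned by malloc; needed for free()
  void *aligned; // base rounded up to the requested alignment
};

AlignedBuffer allocAligned(size_t size, size_t alignment) {
  AlignedBuffer buf;
  buf.base = std::malloc(size + alignment - 1); // over-allocate
  uintptr_t p = reinterpret_cast<uintptr_t>(buf.base);
  // Round up; valid only for power-of-two alignments.
  buf.aligned =
      reinterpret_cast<void *>((p + alignment - 1) & ~(alignment - 1));
  return buf;
}
```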
Yes, this works, but we don't need 'align 1'; 'align 16' will work on x86-64. (malloc will already guarantee 16-byte alignment, so we need to specify this attribute only for memrefs with element types larger than 16 bytes; for those smaller, everything is already aligned to element size boundaries.)
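For reference, a hypothetical sketch of what attaching an explicit alignment at store-emission time could look like with LLVM's IRBuilder (my illustration, not this PR's or the actual lowering's code):

```cpp
// Hypothetical sketch: emit the vector store with an explicit 16-byte
// alignment instead of relying on the default ABI alignment. With
// 'align 16' on a 256-bit store, the x86 backend must select an
// unaligned-capable move (vmovups) rather than vmovaps.
#include "llvm/IR/IRBuilder.h"

void emitStoreAlign16(llvm::IRBuilder<> &builder, llvm::Value *vec,
                      llvm::Value *ptr) {
  builder.CreateAlignedStore(vec, ptr, llvm::MaybeAlign(16));
}
```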
That's why I am thinking alignment should be related to the type. If we have a memref<..., align=4>, we will make the alignment contract explicit to whoever allocates the memory.
- allow mlir-cpu-runner to execute functions that return memrefs with vector of f32 in addition to just those with f32.
- align memrefs being allocated from mlir-cpu-runner (for input/output args of the entry point function) to memref element size boundaries.

Signed-off-by: Uday Bondhugula <[email protected]>
Force-pushed from 6b9f512 to 5ce428f
This is ready for review now. The lowering of stores/loads/allocs is a separate issue, but this PR updates the allocations from mlir-cpu-runner to align to vector size boundaries. I've made use of posix_memalign; although it isn't used anywhere else in the LLVM codebase, I'm assuming it's portable enough.
```cpp
size *= numElements;
size_t alignment = llvm::PowerOf2Ceil(numElements * sizeof(float));
posix_memalign(reinterpret_cast<void **>(&descriptor->data), alignment,
               size * sizeof(float));
```
Does it work on Windows?
> Does it work on Windows?
I don't have a setup to test this. But I hope someone who has MLIR working on Windows could confirm. There are other alternatives: we could use malloc and shift the pointer to align it, but the code in context becomes relatively messier, since we'd have to keep track of the original pointer for freeing the memrefs.
You won't be able to test it because the LLVM JIT doesn't support Windows right now, but I can confirm that posix_memalign is not available on Windows. The closest thing would be _aligned_malloc.
> I don't have a setup to test this. But I hope someone who has MLIR working on Windows could confirm.
I kicked the Kokoro builds after asking :)

The Windows build failed, actually.
I assume you have access to the log from the results below, but here it is:
```
"T:\src\build\projects\mlir\test\check-mlir.vcxproj" (default target) (1) ->
"T:\src\build\projects\mlir\tools\mlir-cpu-runner\mlir-cpu-runner.vcxproj" (default target) (44) ->
"T:\src\build\projects\mlir\lib\ExecutionEngine\MLIRExecutionEngine.vcxproj" (default target) (98) ->
(ClCompile target) ->
t:\src\github\llvm\llvm\projects\mlir\lib\executionengine\memrefutils.cpp(71): error C3861: 'posix_memalign': identifier not found [T:\src\build\projects\mlir\lib\ExecutionEngine\MLIRExecutionEngine.vcxproj]
```
Thanks! Would there be a way to write conditionally compiled code so that _aligned_malloc is used for a Windows build? (I didn't see any such examples in MLIR.)
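A possible shape for that, sketched under the assumption that _WIN32 is the right feature test (allocateAligned is a hypothetical helper, not code from this PR):

```cpp
// Hypothetical sketch of a portable aligned allocation helper:
// _aligned_malloc on Windows, posix_memalign elsewhere. Note that a
// buffer from _aligned_malloc must be released with _aligned_free,
// not free, so deallocation would need the same #ifdef treatment.
#include <cstdlib>
#ifdef _WIN32
#include <malloc.h>
#endif

void *allocateAligned(size_t size, size_t alignment) {
#ifdef _WIN32
  return _aligned_malloc(size, alignment);
#else
  void *ptr = nullptr;
  // posix_memalign returns 0 on success; alignment must be a power of
  // two and a multiple of sizeof(void *).
  if (posix_memalign(&ptr, alignment, size) != 0)
    return nullptr;
  return ptr;
#endif
}
```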
This is no longer relevant since we don't pass memrefs from outside the cpu runner anymore.