This repository was archived by the owner on Apr 23, 2021. It is now read-only.

Add vector memref support for mlir-cpu-runner func return #103

Closed
wants to merge 1 commit

Conversation

bondhugula (Contributor) commented Aug 25, 2019

  • allow mlir-cpu-runner to execute functions that return memrefs of
    vector of f32, in addition to just those of f32.

  • align memrefs allocated by mlir-cpu-runner (for input/output
    args of the entry-point function) to memref element size boundaries.

Signed-off-by: Uday Bondhugula [email protected]

bondhugula (Contributor, Author) commented Aug 25, 2019

I'm posting this mainly to get feedback - perhaps from @ftynse or @joker-eph
Is this commit missing something that's making the second test (func @crash) fail? It looks like it should just work, and I've reduced it to a minimal form. Changing the test to use vector<4xf32>, or the memref to 32x32, makes it run fine.

func @crash(%arg2: memref<128x128xvector<8xf32>>) -> memref<128x128xvector<8xf32>> {
  %c0 = constant 0 : index
  %v  = constant dense<1.0> : vector<8xf32>
  store %v, %arg2[%c0, %c0] : memref<128x128xvector<8xf32>>
  return %arg2 : memref<128x128xvector<8xf32>>
}
0.	Program arguments: ../../../build/bin/mlir-cpu-runner -e crash -init-value=1.0 /tmp/out.mlir 
 #0 0x0000000000ae1099 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/uday/llvm-project-bondhugula/llvm/lib/Support/Unix/Signals.inc:532:11
 #1 0x0000000000ae1249 PrintStackTraceSignalHandler(void*) /home/uday/llvm-project-bondhugula/llvm/lib/Support/Unix/Signals.inc:593:1
 #2 0x0000000000adfab6 llvm::sys::RunSignalHandlers() /home/uday/llvm-project-bondhugula/llvm/lib/Support/Signals.cpp:67:5
 #3 0x0000000000ae19ab SignalHandler(int) /home/uday/llvm-project-bondhugula/llvm/lib/Support/Unix/Signals.inc:384:1
 #4 0x00007ff747a3ce80 __restore_rt (/lib64/libpthread.so.0+0x12e80)
 #5 0x00007ff747a9d00f 
 #6 0x00007ff747a9d03a 
 #7 0x000000000073652e compileAndExecute(mlir::ModuleOp, llvm::StringRef, std::function<llvm::Error (llvm::Module*)>, void**) /home/uday/llvm-project-bondhugula/llvm/projects/mlir/lib/Support/JitRunner.cpp:194:10
 #8 0x0000000000735fe9 compileAndExecuteFunctionWithMemRefs(mlir::ModuleOp, llvm::StringRef, std::function<llvm::Error (llvm::Module*)>) /home/uday/llvm-project-bondhugula/llvm/projects/mlir/lib/Support/JitRunner.cpp:232:20
 #9 0x000000000073569a mlir::JitRunnerMain(int, char**, llvm::function_ref<mlir::LogicalResult (mlir::ModuleOp)>) /home/uday/llvm-project-bondhugula/llvm/projects/mlir/lib/Support/JitRunner.cpp:354:7
#10 0x00000000005e6bfb main /home/uday/llvm-project-bondhugula/llvm/projects/mlir/tools/mlir-cpu-runner/mlir-cpu-runner.cpp:27:3
#11 0x00007ff7474d3f33 __libc_start_main (/lib64/libc.so.6+0x23f33)
#12 0x0000000000452c1e _start (../../../build/bin/mlir-cpu-runner+0x452c1e)
Segmentation fault (core dumped) 

@bondhugula bondhugula force-pushed the vecrunner branch 2 times, most recently from 08402db to 6b9f512 on August 25, 2019 21:50
@tatianashp tatianashp requested review from ftynse and joker-eph August 26, 2019 03:53
bondhugula (Contributor, Author) commented Aug 31, 2019

I'm posting this mainly to get feedback - perhaps from @ftynse or @joker-eph
Is this commit missing something that's making the second test (func @ crash) fail? Looks like it

The problem here was alignment. MLIR store ops on vector types are turned into aligned vector store instructions (vmovaps) here; however, the lowered MLIR allocs (via mallocs) aren't aligned to those boundaries (AVX 256-bit / 32-byte in this case). The store op lowering for the LLVM dialect isn't adding any alignment attributes, so the generated LLVM store instructions assume the ABI alignment of 32 bytes here. This leads to a general protection fault, since mallocs on x86-64 are only 16-byte aligned.

$ mlir-opt -lower-to-llvm    test/mlir-cpu-runner/crash.mlir  | mlir-translate -mlir-to-llvmir  
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

declare i8* @malloc(i64)

declare void @free(i8*)

define <8 x float>* @crash(<8 x float>* %0) {
  %2 = getelementptr <8 x float>, <8 x float>* %0, i64 16383
  store <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <8 x float>* %2
  ret <8 x float>* %0
}
$ mlir-opt -lower-to-llvm    test/mlir-cpu-runner/crash.mlir  | mlir-translate -mlir-to-llvmir  | ../../../build/bin/opt -O3  -S
; ModuleID = '<stdin>'
source_filename = "LLVMDialectModule"

; Function Attrs: nofree norecurse nounwind writeonly
define <8 x float>* @crash(<8 x float>* returned %0) local_unnamed_addr #0 {
  %2 = getelementptr <8 x float>, <8 x float>* %0, i64 16383
  store <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <8 x float>* %2, align 32
  ret <8 x float>* %0
}

attributes #0 = { nofree norecurse nounwind writeonly }
$ ../../../build/bin/llvm-objdump --disassemble    test/mlir-cpu-runner/crash.mlir.o 

test/mlir-cpu-runner/crash.mlir.o:	file format ELF64-x86-64

Disassembly of section .text:

0000000000000000 crash:
       0: 48 b8 00 00 00 00 00 00 00 00	movabsq	$0, %rax
       a: c4 e2 7d 18 00               	vbroadcastss	(%rax), %ymm0
       f: c5 fc 29 87 e0 ff 07 00      	vmovaps	%ymm0, 524256(%rdi)
      17: 48 89 f8                     	movq	%rdi, %rax
      1a: c5 f8 77                     	vzeroupper
      1d: c3                           	retq
      1e: 66 90                        	nop

This PR is incomplete without a fix to the lowering.

ftynse (Contributor) commented Sep 3, 2019

Sorry for the delay, I was away for some time. It's not the first time there is an alignment problem at the LLVM boundary, so it would be great to come up with a future-proof solution to this. One thing that we may want to have anyway is an alignment attribute/property on the memref type itself.

There are two possible quick fixes if the issue is blocking for you:

dcaballe (Contributor) commented Sep 4, 2019

If adding an alignment attribute to alloc is not possible right now, another temporary quick fix might be adding an align 1 attribute by default to LLVM vector stores and loads. That should result in unaligned vector loads and stores (note that unaligned vector loads/stores on aligned memory addresses have no performance penalty on recent x86 architectures).

bondhugula (Contributor, Author) commented

Sorry for the delay, I was away for some time. It's not the first time there is an alignment problem at the LLVM boundary, so it would be great to come up with a future-proof solution to this. One thing that

I actually already implemented two workarounds/fixes. The first was to add an alignment attribute to all loads/stores; setting the alignment to 16 for all loads/stores on memrefs with element types larger than 16 bytes made this work (since GNU malloc already aligns to 16-byte boundaries on x86-64). The second was to allocate more (vector size - 1 extra, as you point out) and align the loads/stores at element size boundaries (for all memrefs with element size larger than malloc's alignment).

There are a couple of things here.

  1. We'll need to query or assume what alignment malloc provides (e.g., 8 bytes on x86 systems, 16 bytes on x86-64) so as to not adjust alignment everywhere. Moreover, adjusting the loads/stores to align at the right boundaries impacts other LLVM optimizations:
    http://lists.llvm.org/pipermail/llvm-dev/2019-September/134910.html
    https://godbolt.org/z/U0o-cy

  2. Some of the memrefs may be allocated from outside (i.e., their allocs aren't lowered by us), so we can't use the second approach there; we'll have to leave those loads/stores at default alignment if nothing was specified, and LLVM will assume ABI alignment.

bondhugula (Contributor, Author) commented

If adding an alignment attribute to alloc is not possible right now, another temporary quick fix might be adding align 1 attribute by default to LLVM vector stores and loads. That should result in unaligned vector loads and stores (note that unaligned vector loads/stores on aligned memory addresses don't have performance penalty in later x86 architectures).

Yes, this works, but we don't need align 1; align 16 will work on x86-64. (malloc already guarantees 16-byte alignment, so we need to specify this attribute only for memrefs with element types larger than 16 bytes; for those smaller, everything is already aligned to element size boundaries.)

ftynse (Contributor) commented Sep 4, 2019

Some of the memref's may be allocated from outside (i.e., their alloc's aren't being lowered by us); so we can't use the second approach for that, but will have to just leave those load/stores for default alignment if nothing was specified, and LLVM will assume ABI alignment there.

That's why I am thinking alignment should be related to the type. If we have a memref<..., align=4>, we will make the alignment contract explicit to whoever allocates the memory.

@bondhugula bondhugula changed the title [WIP] add vector memref support for mlir-cpu-runner func return Add vector memref support for mlir-cpu-runner func return Sep 8, 2019
bondhugula (Contributor, Author) commented

This PR is incomplete without a fix to the lowering.

This is ready for review now. The lowering of stores/loads/allocs is a separate issue, but this PR updates the allocations from mlir-cpu-runner to align to vector size boundaries. I've made use of posix_memalign, although it isn't used anywhere else in the LLVM codebase; I'm assuming it's portable enough.

size *= numElements;
size_t alignment = llvm::PowerOf2Ceil(numElements * sizeof(float));
posix_memalign(reinterpret_cast<void **>(&descriptor->data), alignment,
               size * sizeof(float));
Review comment (Contributor):

Does it work on Windows?

Review reply (Contributor, Author):

Does it work on Windows?

I don't have a setup to test this, but I hope someone who has MLIR working on Windows could confirm. There are other alternatives: we could use malloc and shift the pointer to align, but the code becomes messier since we'd have to keep track of the original pointer for freeing the memrefs.

Review comment (Contributor):

You won't be able to test it because the LLVM JIT doesn't support Windows right now, but I can confirm that posix_memalign is not available on Windows. The closest thing would be _aligned_malloc.

Review comment (Contributor):

I don't have a setup to test this. But I hope someone who has MLIR working on Windows could confirm

I kicked off the Kokoro builds after asking :)

The Windows build failed, actually.

Review comment (Contributor):

I assume you have access to the log from the results below, but here it is:


7297
       "T:\src\build\projects\mlir\test\check-mlir.vcxproj" (default target) (1) ->
7298
       "T:\src\build\projects\mlir\tools\mlir-cpu-runner\mlir-cpu-runner.vcxproj" (default target) (44) ->
7299
       "T:\src\build\projects\mlir\lib\ExecutionEngine\MLIRExecutionEngine.vcxproj" (default target) (98) ->
7300
       (ClCompile target) ->
7301
         t:\src\github\llvm\llvm\projects\mlir\lib\executionengine\memrefutils.cpp(71): error C3861: 'posix_memalign': identifier not found [T:\src\build\projects\mlir\lib\ExecutionEngine\MLIRExecutionEngine.vcxproj]
7302
7303

Review reply (Contributor, Author):

Thanks! Would there be a way to write conditionally compiled code so that _aligned_malloc is used for a Windows build? (I didn't see any such examples in MLIR.)

bondhugula (Contributor, Author) commented

This is no longer relevant since we don't pass memrefs from outside the cpu runner anymore.

@bondhugula bondhugula closed this Oct 31, 2019
7 participants