This repository was archived by the owner on Apr 23, 2021. It is now read-only.

Add vector memref support for mlir-cpu-runner func return #103

Closed
wants to merge 1 commit

Conversation

bondhugula (Contributor) commented Aug 25, 2019

  • allow mlir-cpu-runner to execute functions that return memrefs of
    vector of f32, in addition to just those of f32.

  • align memrefs allocated by mlir-cpu-runner (for input/output
    args of the entry-point function) to memref element size boundaries.

Signed-off-by: Uday Bondhugula [email protected]

bondhugula (Contributor, Author) commented Aug 25, 2019

I'm posting this mainly to get feedback - perhaps from @ftynse or @joker-eph
Is this commit missing something that's making the second test (func @crash) fail? It looks like it should just work, and I've reduced it to a minimal form. Changing the test to use vector<4xf32>, or the memref to 32x32, makes it run fine.

func @crash(%arg2: memref<128x128xvector<8xf32>>) -> memref<128x128xvector<8xf32>> {
  %c0 = constant 0 : index
  %v  = constant dense<1.0> : vector<8xf32>
  store %v, %arg2[%c0, %c0] : memref<128x128xvector<8xf32>>
  return %arg2 : memref<128x128xvector<8xf32>>
}
0.	Program arguments: ../../../build/bin/mlir-cpu-runner -e crash -init-value=1.0 /tmp/out.mlir 
 #0 0x0000000000ae1099 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/uday/llvm-project-bondhugula/llvm/lib/Support/Unix/Signals.inc:532:11
 #1 0x0000000000ae1249 PrintStackTraceSignalHandler(void*) /home/uday/llvm-project-bondhugula/llvm/lib/Support/Unix/Signals.inc:593:1
 #2 0x0000000000adfab6 llvm::sys::RunSignalHandlers() /home/uday/llvm-project-bondhugula/llvm/lib/Support/Signals.cpp:67:5
 #3 0x0000000000ae19ab SignalHandler(int) /home/uday/llvm-project-bondhugula/llvm/lib/Support/Unix/Signals.inc:384:1
 #4 0x00007ff747a3ce80 __restore_rt (/lib64/libpthread.so.0+0x12e80)
 #5 0x00007ff747a9d00f 
 #6 0x00007ff747a9d03a 
 #7 0x000000000073652e compileAndExecute(mlir::ModuleOp, llvm::StringRef, std::function<llvm::Error (llvm::Module*)>, void**) /home/uday/llvm-project-bondhugula/llvm/projects/mlir/lib/Support/JitRunner.cpp:194:10
 #8 0x0000000000735fe9 compileAndExecuteFunctionWithMemRefs(mlir::ModuleOp, llvm::StringRef, std::function<llvm::Error (llvm::Module*)>) /home/uday/llvm-project-bondhugula/llvm/projects/mlir/lib/Support/JitRunner.cpp:232:20
 #9 0x000000000073569a mlir::JitRunnerMain(int, char**, llvm::function_ref<mlir::LogicalResult (mlir::ModuleOp)>) /home/uday/llvm-project-bondhugula/llvm/projects/mlir/lib/Support/JitRunner.cpp:354:7
#10 0x00000000005e6bfb main /home/uday/llvm-project-bondhugula/llvm/projects/mlir/tools/mlir-cpu-runner/mlir-cpu-runner.cpp:27:3
#11 0x00007ff7474d3f33 __libc_start_main (/lib64/libc.so.6+0x23f33)
#12 0x0000000000452c1e _start (../../../build/bin/mlir-cpu-runner+0x452c1e)
Segmentation fault (core dumped) 

@bondhugula bondhugula force-pushed the vecrunner branch 2 times, most recently from 08402db to 6b9f512 on August 25, 2019 21:50
@tatianashp tatianashp requested review from ftynse and joker-eph August 26, 2019 03:53
bondhugula (Contributor, Author) commented Aug 31, 2019

I'm posting this mainly to get feedback - perhaps from @ftynse or @joker-eph
Is this commit missing something that's making the second test (func @ crash) fail? Looks like it

The problem here was alignment. MLIR store ops on vector types are turned into aligned vector store instructions (vmovaps) here; however, the lowered MLIR allocs (via mallocs) aren't aligned to those boundaries (AVX 256-bit / 32-byte in this case). The store op lowering for the LLVM dialect isn't adding any alignment attributes, so the generated LLVM store instructions assume the ABI alignment of 32 bytes here. This leads to a general protection fault, since mallocs on x86-64 are only 16-byte aligned.

$ mlir-opt -lower-to-llvm    test/mlir-cpu-runner/crash.mlir  | mlir-translate -mlir-to-llvmir  
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

declare i8* @malloc(i64)

declare void @free(i8*)

define <8 x float>* @crash(<8 x float>* %0) {
  %2 = getelementptr <8 x float>, <8 x float>* %0, i64 16383
  store <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <8 x float>* %2
  ret <8 x float>* %0
}
$ mlir-opt -lower-to-llvm    test/mlir-cpu-runner/crash.mlir  | mlir-translate -mlir-to-llvmir  | ../../../build/bin/opt -O3  -S
; ModuleID = '<stdin>'
source_filename = "LLVMDialectModule"

; Function Attrs: nofree norecurse nounwind writeonly
define <8 x float>* @crash(<8 x float>* returned %0) local_unnamed_addr #0 {
  %2 = getelementptr <8 x float>, <8 x float>* %0, i64 16383
  store <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <8 x float>* %2, align 32
  ret <8 x float>* %0
}

attributes #0 = { nofree norecurse nounwind writeonly }
$ ../../../build/bin/llvm-objdump --disassemble    test/mlir-cpu-runner/crash.mlir.o 

test/mlir-cpu-runner/crash.mlir.o:	file format ELF64-x86-64

Disassembly of section .text:

0000000000000000 crash:
       0: 48 b8 00 00 00 00 00 00 00 00	movabsq	$0, %rax
       a: c4 e2 7d 18 00               	vbroadcastss	(%rax), %ymm0
       f: c5 fc 29 87 e0 ff 07 00      	vmovaps	%ymm0, 524256(%rdi)
      17: 48 89 f8                     	movq	%rdi, %rax
      1a: c5 f8 77                     	vzeroupper
      1d: c3                           	retq
      1e: 66 90                        	nop

This PR is incomplete without a fix to the lowering.

ftynse (Contributor) commented Sep 3, 2019

Sorry for the delay, I was away for some time. It's not the first time there is an alignment problem at the LLVM boundary, so it would be great to come up with a future-proof solution to this. One thing that we may want to have anyway is an alignment attribute/property on the memref type itself.

There are two possible quick fixes if the issue is blocking for you:

dcaballe (Contributor) commented Sep 4, 2019

If adding an alignment attribute to alloc is not possible right now, another temporary quick fix might be adding an align 1 attribute by default to LLVM vector stores and loads. That should result in unaligned vector loads and stores (note that unaligned vector loads/stores on aligned memory addresses have no performance penalty on recent x86 architectures).

bondhugula (Contributor, Author) commented

Sorry for the delay, I was away for some time. It's not the first time there is an alignment problem at the LLVM boundary, so it would be great to come up with a future-proof solution to this. One thing that

I actually already implemented two workarounds/fixes. The first was to add an alignment attribute to all loads/stores; setting the alignment to 16 for all loads/stores on memrefs with element types larger than 16 bytes made this work (since GNU malloc already aligns to 16-byte boundaries on x86-64). The second was to allocate more (vector size - 1 extra, as you point out) and align the loads/stores at element size boundaries (for all memrefs with element size larger than malloc's alignment).

There are a couple of things here.

  1. We'll need to query or assume what alignment malloc provides (e.g., 8 bytes on x86 systems, 16 bytes on x86-64) so as to not adjust alignment everywhere. Moreover, adjusting the loads/stores to align at the right boundaries impacts other LLVM optimizations:
    http://lists.llvm.org/pipermail/llvm-dev/2019-September/134910.html
    https://godbolt.org/z/U0o-cy

  2. Some of the memrefs may be allocated from outside (i.e., their allocs aren't lowered by us), so we can't use the second approach there; we'll have to leave those loads/stores at default alignment if nothing was specified, and LLVM will assume ABI alignment.

bondhugula (Contributor, Author) commented

If adding an alignment attribute to alloc is not possible right now, another temporary quick fix might be adding align 1 attribute by default to LLVM vector stores and loads. That should result in unaligned vector loads and stores (note that unaligned vector loads/stores on aligned memory addresses don't have performance penalty in later x86 architectures).

Yes, this works, but we don't need align 1; align 16 will work on x86-64. (malloc already guarantees 16-byte alignment, so we need to specify this attribute only for memrefs with element types larger than 16 bytes; for those smaller, everything is already aligned to element size boundaries.)

ftynse (Contributor) commented Sep 4, 2019

Some of the memref's may be allocated from outside (i.e., their alloc's aren't being lowered by us); so we can't use the second approach for that, but will have to just leave those load/stores for default alignment if nothing was specified, and LLVM will assume ABI alignment there.

That's why I am thinking alignment should be related to the type. If we have a memref<..., align=4>, we will make the alignment contract explicit to whoever allocates the memory.

@bondhugula bondhugula changed the title [WIP] add vector memref support for mlir-cpu-runner func return Add vector memref support for mlir-cpu-runner func return Sep 8, 2019
bondhugula (Contributor, Author) commented

This PR is incomplete without a fix to the lowering.

This is ready for review now. The lowering of stores/loads/allocs is a separate issue, but this PR updates the allocations from mlir-cpu-runner to align to vector size boundaries. I've made use of posix_memalign, although it isn't used anywhere else in the LLVM codebase; I'm assuming it's portable enough.

size *= numElements;
size_t alignment = llvm::PowerOf2Ceil(numElements * sizeof(float));
posix_memalign(reinterpret_cast<void **>(&descriptor->data), alignment,
               size * sizeof(float));
Review comment (Contributor):

Does it work on Windows?

Review reply (Contributor, Author):

Does it work on Windows?

I don't have a setup to test this, but I hope someone who has MLIR working on Windows could confirm. There are other alternatives: we could use malloc and shift the pointer to align, but the code becomes messier since we'd have to keep track of the original pointer for freeing the memrefs.

Review comment (Contributor):

You won't be able to test it because the LLVM JIT doesn't support Windows right now, but I can confirm that posix_memalign is not available on Windows. The closest thing would be _aligned_malloc.

Review comment (Contributor):

I don't have a setup to test this. But I hope someone who has MLIR working on Windows could confirm

I kicked off the Kokoro builds after asking :)

The Windows build failed, actually.

Review comment (Contributor):

I assume you have access to the log from the results below, but here it is:


7297
       "T:\src\build\projects\mlir\test\check-mlir.vcxproj" (default target) (1) ->
7298
       "T:\src\build\projects\mlir\tools\mlir-cpu-runner\mlir-cpu-runner.vcxproj" (default target) (44) ->
7299
       "T:\src\build\projects\mlir\lib\ExecutionEngine\MLIRExecutionEngine.vcxproj" (default target) (98) ->
7300
       (ClCompile target) ->
7301
         t:\src\github\llvm\llvm\projects\mlir\lib\executionengine\memrefutils.cpp(71): error C3861: 'posix_memalign': identifier not found [T:\src\build\projects\mlir\lib\ExecutionEngine\MLIRExecutionEngine.vcxproj]
7302
7303

Review reply (Contributor, Author):

Thanks! Would there be a way to write conditionally compiled code so that _aligned_malloc is used for a Windows build? (I didn't see any such examples in MLIR.)

bondhugula (Contributor, Author) commented

This is no longer relevant since we don't pass memrefs from outside the cpu runner anymore.

@bondhugula bondhugula closed this Oct 31, 2019
7 participants