-
Notifications
You must be signed in to change notification settings - Fork 261
Use alloca instead of malloc for 0-d memref's? Getting scalar replacement working. #65
Comments
Why not emitting just a load to a scalar value and use this value instead of creating a 0-d alloc? Also your examples are a bit contrived: the free() are missing and the code does not produce anything so everything can be DCE'd. |
Because the scalar could be live out of the loop (while being stored to in the loop).
That was just to keep the examples short! :-) Just consider doing scalar replacement on something being reduced - like Y[i] below.
There's no way to represent the scalar replacement of C[i] or Y[i] on the loop forms in MLIR without going through memory (single element memref's). |
Oh right... I got short-sighted by your into in which you mentioned "hoisting a loop invariant load", but it was just a smaller use-case than the general one. LLVM does not seem to be bother much by the malloc though, for instance the following code generate the exact same LLVM IR (after -O2): https://godbolt.org/z/2N4xVl
(That said, it does not mean we can't do better modeling at the MLIR level) |
This sounds pretty much the same kind of problem as the one discussed here #55 (comment). I think we should try to follow the same approach for both. I see the value of turning mallocs into llvm allocas for 0D memrefs and even for other small memrefs. However, this optimization should be optional. We may not want to apply it when mlir alloc is going to be lowered to a custom memory allocator since the memory allocator may expect to have the control of all the allocated memory.
Diego
|
That's right - my initial examples didn't capture the general case.
Thanks very much for trying this out. I was just running mem2reg in isolation, but the -O2 pipeline probably includes a pass that replaces heap allocations with stack ones or something to that effect, after which mem2reg works. I can confirm that opt -O2 does get rid of heap allocations of all single element memref's (not just 0-d memref's, but also memref's like memref<1x1xf32>, etc.), and they are all replaced by scalars. (In all these use cases in context, these single-element memref's do not escape -- they are the trivial copy buffers generated with -affine-data-copy-generate). This looks fine for now as long as one relies on LLVM, but at some point, this could be done in MLIR for a more unified approach. |
Yes, I agree. We can continue the discussion on #55. |
…uences When using a custom mutator (e.g. thrift mutator, similar to LPM) that calls back into libfuzzer's mutations via `LLVMFuzzerMutate`, the mutation sequences needed to achieve new coverage can get prohibitively large. Printing these large sequences has two downsides: 1) It makes the logs hard to understand for a human. 2) The performance cost slows down fuzzing. In this patch I change the `PrintMutationSequence` function to take a max number of entries, to achieve this goal. I also update `PrintStatusForNewUnit` to default to printing only 10 entries, in the default verbosity level (1), requiring the user to set verbosity to 2 if they want the full mutation sequence. For our use case, turning off verbosity is not an option, as that would also disable `PrintStats()` which is very useful for infrastructure that analyzes the logs in realtime. I imagine most users of libfuzzer always want those logs in the default. I built a fuzzer locally with this patch applied to libfuzzer. When running with the default verbosity, I see logs like this: tensorflow#65 NEW cov: 4799 ft: 10443 corp: 41/1447Kb lim: 64000 exec/s: 1 rss: 575Mb L: 28658/62542 MS: 196 Custom-CrossOver-ChangeBit-EraseBytes-ChangeBit-ChangeBit-ChangeBit-CrossOver-ChangeBit-CrossOver- DE: "\xff\xff\xff\x0e"-"\xfe\xff\xff\x7f"-"\xfe\xff\xff\x7f"-"\x17\x00\x00\x00\x00\x00\x00\x00"-"\x00\x00\x00\xf9"-"\xff\xff\xff\xff"-"\xfa\xff\xff\xff"-"\xf7\xff\xff\xff"-"@\xff\xff\xff\xff\xff\xff\xff"-"E\x00"- tensorflow#67 NEW cov: 4810 ft: 10462 corp: 42/1486Kb lim: 64000 exec/s: 1 rss: 577Mb L: 39823/62542 MS: 135 Custom-CopyPart-ShuffleBytes-ShuffleBytes-ChangeBit-ChangeBinInt-EraseBytes-ChangeBit-ChangeBinInt-ChangeBit- DE: "\x01\x00\x00\x00\x00\x00\x01\xf1"-"\x00\x00\x00\x07"-"\x00\x0d"-"\xfd\xff\xff\xff"-"\xfe\xff\xff\xf4"-"\xe3\xff\xff\xff"-"\xff\xff\xff\xf1"-"\xea\xff\xff\xff"-"\x00\x00\x00\xfd"-"\x01\x00\x00\x05"- Staring hard at the logs it's clear that the cap of 10 is applied. When running with verbosity level 2, the logs look like the below: tensorflow#66 NEW cov: 4700 ft: 10188 corp: 37/1186Kb lim: 64000 exec/s: 2 rss: 509Mb L: 47616/61231 MS: 520 Custom-CopyPart-ChangeBinInt-ChangeBit-ChangeByte-EraseBytes-PersAutoDict-CopyPart-ShuffleBytes-ChangeBit-ShuffleBytes-CopyPart-EraseBytes-CopyPart-ChangeBinInt-CopyPart-ChangeByte-ShuffleBytes-ChangeBinInt-ShuffleBytes-ChangeBit-CMP-ShuffleBytes-ChangeBit-CrossOver-ChangeBinInt-ChangeByte-ShuffleBytes-CrossOver-EraseBytes-ChangeBinInt-InsertRepeatedBytes-PersAutoDict-InsertRepeatedBytes-InsertRepeatedBytes-CrossOver-ChangeByte-ShuffleBytes-CopyPart-ShuffleBytes-CopyPart-CrossOver-ChangeBit-ShuffleBytes-CrossOver-PersAutoDict-ChangeByte-ChangeBit-ShuffleBytes-CrossOver-ChangeByte-EraseBytes-CopyPart-ChangeBinInt-PersAutoDict-CrossOver-ShuffleBytes-CrossOver-CrossOver-EraseBytes-CrossOver-EraseBytes-CrossOver-ChangeBit-ChangeBinInt-ChangeByte-EraseBytes-ShuffleBytes-ShuffleBytes-ChangeBit-EraseBytes-ChangeBinInt-ChangeBit-ChangeBinInt-CopyPart-EraseBytes-PersAutoDict-EraseBytes-CopyPart-ChangeBinInt-ChangeByte-CrossOver-ChangeBinInt-ShuffleBytes-PersAutoDict-PersAutoDict-ChangeBinInt-CopyPart-ChangeBinInt-CrossOver-ChangeBit-ChangeBinInt-CopyPart-ChangeByte-ChangeBit-CopyPart-CrossOver-ChangeByte-ChangeBit-ChangeByte-ShuffleBytes-CMP-ChangeBit-CopyPart-ChangeBit-ChangeByte-ChangeBinInt-PersAutoDict-ChangeBinInt-CrossOver-ChangeBinInt-ChangeBit-ChangeBinInt-ChangeBinInt-PersAutoDict-ChangeBinInt-ChangeBinInt-ChangeByte-CopyPart-ShuffleBytes-ChangeByte-ChangeBit-ChangeByte-ChangeByte-EraseBytes-CrossOver-ChangeByte-ChangeByte-EraseBytes-EraseBytes-InsertRepeatedBytes-ShuffleBytes-CopyPart-CopyPart-ChangeBit-ShuffleBytes-PersAutoDict-ShuffleBytes-ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte-ChangeBinInt-CrossOver-ChangeBinInt-ChangeBit-EraseBytes-CopyPart-ChangeByte-CrossOver-EraseBytes-CrossOver-ChangeByte-ShuffleBytes-ChangeByte-ChangeBinInt-CrossOver-ChangeByte-InsertRepeatedBytes-InsertByte-ShuffleBytes-PersAutoDict-ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ShuffleBytes-CopyPart-ShuffleBytes-EraseBytes-ShuffleBytes-ShuffleBytes-CrossOver-ChangeBinInt-CopyPart-CopyPart-CopyPart-EraseBytes-EraseBytes-ChangeByte-ChangeBinInt-ShuffleBytes-CMP-InsertByte-EraseBytes-ShuffleBytes-CopyPart-ChangeBit-CrossOver-CopyPart-CopyPart-ShuffleBytes-ChangeByte-ChangeByte-ChangeBinInt-EraseBytes-ChangeByte-ChangeBinInt-ChangeBit-ChangeBit-ChangeByte-ShuffleBytes-PersAutoDict-PersAutoDict-CMP-ChangeBit-ShuffleBytes-PersAutoDict-ChangeBinInt-EraseBytes-EraseBytes-ShuffleBytes-ChangeByte-ShuffleBytes-ChangeBit-EraseBytes-CMP-ShuffleBytes-ChangeByte-ChangeBinInt-EraseBytes-ChangeBinInt-ChangeByte-EraseBytes-ChangeByte-CrossOver-ShuffleBytes-EraseBytes-EraseBytes-ShuffleBytes-ChangeBit-EraseBytes-CopyPart-ShuffleBytes-ShuffleBytes-CrossOver-CopyPart-ChangeBinInt-ShuffleBytes-CrossOver-InsertByte-InsertByte-ChangeBinInt-ChangeBinInt-CopyPart-EraseBytes-ShuffleBytes-ChangeBit-ChangeBit-EraseBytes-ChangeByte-ChangeByte-ChangeBinInt-CrossOver-ChangeBinInt-ChangeBinInt-ShuffleBytes-ShuffleBytes-ChangeByte-ChangeByte-ChangeBinInt-ShuffleBytes-CrossOver-EraseBytes-CopyPart-CopyPart-CopyPart-ChangeBit-ShuffleBytes-ChangeByte-EraseBytes-ChangeByte-InsertRepeatedBytes-InsertByte-InsertRepeatedBytes-PersAutoDict-EraseBytes-ShuffleBytes-ChangeByte-ShuffleBytes-ChangeBinInt-ShuffleBytes-ChangeBinInt-ChangeBit-CrossOver-CrossOver-ShuffleBytes-CrossOver-CopyPart-CrossOver-CrossOver-CopyPart-ChangeByte-ChangeByte-CrossOver-ChangeBit-ChangeBinInt-EraseBytes-ShuffleBytes-EraseBytes-CMP-PersAutoDict-PersAutoDict-InsertByte-ChangeBit-ChangeByte-CopyPart-CrossOver-ChangeByte-ChangeBit-ChangeByte-CopyPart-ChangeBinInt-EraseBytes-CrossOver-ChangeBit-CrossOver-PersAutoDict-CrossOver-ChangeByte-CrossOver-ChangeByte-ChangeByte-CrossOver-ShuffleBytes-CopyPart-CopyPart-ShuffleBytes-ChangeByte-ChangeByte-ChangeBinInt-ChangeBinInt-ChangeBinInt-ChangeBinInt-ShuffleBytes-CrossOver-ChangeBinInt-ShuffleBytes-ChangeBit-PersAutoDict-ChangeBinInt-ShuffleBytes-ChangeBinInt-ChangeByte-CrossOver-ChangeBit-CopyPart-ChangeBit-ChangeBit-CopyPart-ChangeByte-PersAutoDict-ChangeBit-ShuffleBytes-ChangeByte-ChangeBit-CrossOver-ChangeByte-CrossOver-ChangeByte-CrossOver-ChangeBit-ChangeByte-ChangeBinInt-PersAutoDict-CopyPart-ChangeBinInt-ChangeBit-CrossOver-ChangeBit-PersAutoDict-ShuffleBytes-EraseBytes-CrossOver-ChangeByte-ChangeBinInt-ShuffleBytes-ChangeBinInt-InsertRepeatedBytes-PersAutoDict-CrossOver-ChangeByte-Custom-PersAutoDict-CopyPart-CopyPart-ChangeBinInt-ShuffleBytes-ChangeBinInt-ChangeBit-ShuffleBytes-CrossOver-CMP-ChangeByte-CopyPart-ShuffleBytes-CopyPart-CopyPart-CrossOver-CrossOver-CrossOver-ShuffleBytes-ChangeByte-ChangeBinInt-ChangeBit-ChangeBit-ChangeBit-ChangeByte-EraseBytes-ChangeByte-ChangeBit-ChangeByte-ChangeByte-CopyPart-PersAutoDict-ChangeBinInt-PersAutoDict-PersAutoDict-PersAutoDict-CopyPart-CopyPart-CrossOver-ChangeByte-ChangeBinInt-ShuffleBytes-ChangeBit-CopyPart-EraseBytes-CopyPart-CopyPart-CrossOver-ChangeByte-EraseBytes-ShuffleBytes-ChangeByte-CopyPart-EraseBytes-CopyPart-CrossOver-ChangeBinInt-ChangeBinInt-InsertByte-ChangeBinInt-ChangeBit-ChangeByte-CopyPart-ChangeByte-EraseBytes-ChangeByte-ChangeBit-ChangeByte-ShuffleBytes-CopyPart-ChangeBinInt-EraseBytes-CrossOver-ChangeBit-ChangeBit-CrossOver-EraseBytes-ChangeBinInt-CopyPart-CopyPart-ChangeBinInt-ChangeBit-EraseBytes-InsertRepeatedBytes-EraseBytes-ChangeBit-CrossOver-CrossOver-EraseBytes-EraseBytes-ChangeByte-CopyPart-CopyPart-ShuffleBytes-ChangeByte-ChangeBit-ChangeByte-EraseBytes-ChangeBit-ChangeByte-ChangeByte-CrossOver-CopyPart-EraseBytes-ChangeByte-EraseBytes-ChangeByte-ShuffleBytes-ShuffleBytes-ChangeByte-CopyPart-ChangeByte-ChangeByte-ChangeBit-CopyPart-ChangeBit-ChangeBinInt-CopyPart-ShuffleBytes-ChangeBit-ChangeBinInt-ChangeBit-EraseBytes-CMP-CrossOver-CopyPart-ChangeBinInt-CrossOver-CrossOver-CopyPart-CrossOver-CrossOver-InsertByte-InsertByte-CopyPart-Custom- DE: "warn"-"\x00\x00\x00\x80"-"\xfe\xff\xff\xfb"-"\xff\xff"-"\x10\x00\x00\x00"-"\xfe\xff\xff\xff"-"\xff\xff\xff\xf6"-"U\x01\x00\x00\x00\x00\x00\x00"-"\xd9\xff\xff\xff"-"\xfe\xff\xff\xea"-"\xf0\xff\xff\xff"-"\xfc\xff\xff\xff"-"warn"-"\xff\xff\xff\xff"-"\xfe\xff\xff\xfb"-"\x00\x00\x00\x80"-"\xfe\xff\xff\xf1"-"\xfe\xff\xff\xea"-"\x00\x00\x00\x00\x00\x00\x012"-"\xe2\x00"-"\xfb\xff\xff\xff"-"\x00\x00\x00\x00"-"\xe9\xff\xff\xff"-"\xff\xff"-"\x00\x00\x00\x80"-"\x01\x00\x04\xc9"-"\xf0\xff\xff\xff"-"\xf9\xff\xff\xff"-"\xff\xff\xff\xff\xff\xff\xff\x12"-"\xe2\x00"-"\xfe\xff\xff\xff"-"\xfe\xff\xff\xea"-"\xff\xff\xff\xff"-"\xf4\xff\xff\xff"-"\xe9\xff\xff\xff"-"\xf1\xff\xff\xff"- tensorflow#48 NEW cov: 4502 ft: 9151 corp: 27/750Kb lim: 64000 exec/s: 2 rss: 458Mb L: 50772/50772 MS: 259 ChangeByte-ShuffleBytes-ChangeBinInt-ChangeByte-ChangeByte-ChangeByte-ChangeByte-ChangeBit-CopyPart-CrossOver-CopyPart-ChangeByte-CrossOver-CopyPart-ChangeBit-ChangeByte-EraseBytes-ChangeByte-CopyPart-CopyPart-CopyPart-ChangeBit-EraseBytes-ChangeBinInt-CrossOver-CopyPart-CrossOver-CopyPart-ChangeBit-ChangeByte-ChangeBit-InsertByte-CrossOver-InsertRepeatedBytes-InsertRepeatedBytes-InsertRepeatedBytes-ChangeBinInt-EraseBytes-InsertRepeatedBytes-InsertByte-ChangeBit-ShuffleBytes-ChangeBit-ChangeBit-CopyPart-ChangeBit-ChangeByte-CrossOver-ChangeBinInt-ChangeByte-CrossOver-CMP-ChangeByte-CrossOver-ChangeByte-ShuffleBytes-ShuffleBytes-ChangeByte-ChangeBinInt-CopyPart-EraseBytes-CrossOver-ChangeBit-ChangeBinInt-InsertByte-ChangeBit-CopyPart-ChangeBinInt-ChangeByte-CrossOver-ChangeBit-EraseBytes-CopyPart-ChangeBinInt-ChangeBit-ChangeBit-ChangeByte-CopyPart-ChangeBinInt-CrossOver-PersAutoDict-ChangeByte-ChangeBit-ChangeByte-ChangeBinInt-ChangeBinInt-EraseBytes-CopyPart-CopyPart-ChangeByte-ChangeByte-EraseBytes-PersAutoDict-CopyPart-ChangeByte-ChangeByte-EraseBytes-CrossOver-CopyPart-CopyPart-CopyPart-ChangeByte-ChangeBit-CMP-CopyPart-ChangeBinInt-ChangeBinInt-CrossOver-ChangeBit-ChangeBit-EraseBytes-ChangeByte-ShuffleBytes-ChangeBit-ChangeBinInt-CMP-InsertRepeatedBytes-CopyPart-Custom-ChangeByte-CrossOver-EraseBytes-ChangeBit-CopyPart-CrossOver-CMP-ShuffleBytes-EraseBytes-CrossOver-PersAutoDict-ChangeByte-CrossOver-CopyPart-CrossOver-CrossOver-ShuffleBytes-ChangeBinInt-CrossOver-ChangeBinInt-ShuffleBytes-PersAutoDict-ChangeByte-EraseBytes-ChangeBit-CrossOver-EraseBytes-CrossOver-ChangeBit-ChangeBinInt-EraseBytes-InsertByte-InsertRepeatedBytes-InsertByte-InsertByte-ChangeByte-ChangeBinInt-ChangeBit-CrossOver-ChangeByte-CrossOver-EraseBytes-ChangeByte-ShuffleBytes-ChangeBit-ChangeBit-ShuffleBytes-CopyPart-ChangeByte-PersAutoDict-ChangeBit-ChangeByte-InsertRepeatedBytes-CMP-CrossOver-ChangeByte-EraseBytes-ShuffleBytes-CrossOver-ShuffleBytes-ChangeBinInt-ChangeBinInt-CopyPart-PersAutoDict-ShuffleBytes-ChangeBit-CopyPart-ShuffleBytes-CopyPart-EraseBytes-ChangeByte-ChangeBit-ChangeBit-ChangeBinInt-ChangeByte-CopyPart-EraseBytes-ChangeBinInt-EraseBytes-EraseBytes-PersAutoDict-CMP-PersAutoDict-CrossOver-CrossOver-ChangeBit-CrossOver-PersAutoDict-CrossOver-CopyPart-ChangeByte-EraseBytes-ChangeByte-ShuffleBytes-ChangeByte-ChangeByte-CrossOver-ChangeBit-EraseBytes-ChangeByte-EraseBytes-ChangeBinInt-CrossOver-CrossOver-EraseBytes-ChangeBinInt-CrossOver-ChangeBit-ShuffleBytes-ChangeBit-ChangeByte-EraseBytes-ChangeBit-CrossOver-CrossOver-CrossOver-ChangeByte-ChangeBit-ShuffleBytes-ChangeBit-ChangeBit-EraseBytes-CrossOver-CrossOver-CopyPart-ShuffleBytes-ChangeByte-ChangeByte-CopyPart-CrossOver-CopyPart-CrossOver-CrossOver-EraseBytes-EraseBytes-ShuffleBytes-InsertRepeatedBytes-ChangeBit-CopyPart-Custom- DE: "\xfe\xff\xff\xfc"-"\x00\x00\x00\x00"-"F\x00"-"\xf3\xff\xff\xff"-"St9exception"-"_\x00\x00\x00"-"\xf6\xff\xff\xff"-"\xfe\xff\xff\xff"-"\x00\x00\x00\x00"-"p\x02\x00\x00\x00\x00\x00\x00"-"\xfe\xff\xff\xfb"-"\xff\xff"-"\xff\xff\xff\xff"-"\x01\x00\x00\x07"-"\xfe\xff\xff\xfe"- These are prohibitively large and of limited value in the default case (when someone is running the fuzzer, not debugging it), in my opinion. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D86658
…uences When using a custom mutator (e.g. thrift mutator, similar to LPM) that calls back into libfuzzer's mutations via `LLVMFuzzerMutate`, the mutation sequences needed to achieve new coverage can get prohibitively large. Printing these large sequences has two downsides: 1) It makes the logs hard to understand for a human. 2) The performance cost slows down fuzzing. In this patch I change the `PrintMutationSequence` function to take a max number of entries, to achieve this goal. I also update `PrintStatusForNewUnit` to default to printing only 10 entries, in the default verbosity level (1), requiring the user to set verbosity to 2 if they want the full mutation sequence. For our use case, turning off verbosity is not an option, as that would also disable `PrintStats()` which is very useful for infrastructure that analyzes the logs in realtime. I imagine most users of libfuzzer always want those logs in the default. I built a fuzzer locally with this patch applied to libfuzzer. When running with the default verbosity, I see logs like this: tensorflow#65 NEW cov: 4799 ft: 10443 corp: 41/1447Kb lim: 64000 exec/s: 1 rss: 575Mb L: 28658/62542 MS: 196 Custom-CrossOver-ChangeBit-EraseBytes-ChangeBit-ChangeBit-ChangeBit-CrossOver-ChangeBit-CrossOver- DE: "\xff\xff\xff\x0e"-"\xfe\xff\xff\x7f"-"\xfe\xff\xff\x7f"-"\x17\x00\x00\x00\x00\x00\x00\x00"-"\x00\x00\x00\xf9"-"\xff\xff\xff\xff"-"\xfa\xff\xff\xff"-"\xf7\xff\xff\xff"-"@\xff\xff\xff\xff\xff\xff\xff"-"E\x00"- tensorflow#67 NEW cov: 4810 ft: 10462 corp: 42/1486Kb lim: 64000 exec/s: 1 rss: 577Mb L: 39823/62542 MS: 135 Custom-CopyPart-ShuffleBytes-ShuffleBytes-ChangeBit-ChangeBinInt-EraseBytes-ChangeBit-ChangeBinInt-ChangeBit- DE: "\x01\x00\x00\x00\x00\x00\x01\xf1"-"\x00\x00\x00\x07"-"\x00\x0d"-"\xfd\xff\xff\xff"-"\xfe\xff\xff\xf4"-"\xe3\xff\xff\xff"-"\xff\xff\xff\xf1"-"\xea\xff\xff\xff"-"\x00\x00\x00\xfd"-"\x01\x00\x00\x05"- Staring hard at the logs it's clear that the cap of 10 is applied. When running with verbosity level 2, the logs look like the below: tensorflow#66 NEW cov: 4700 ft: 10188 corp: 37/1186Kb lim: 64000 exec/s: 2 rss: 509Mb L: 47616/61231 MS: 520 Custom-CopyPart-ChangeBinInt-ChangeBit-ChangeByte-EraseBytes-PersAutoDict-CopyPart-ShuffleBytes-ChangeBit-ShuffleBytes-CopyPart-EraseBytes-CopyPart-ChangeBinInt-CopyPart-ChangeByte-ShuffleBytes-ChangeBinInt-ShuffleBytes-ChangeBit-CMP-ShuffleBytes-ChangeBit-CrossOver-ChangeBinInt-ChangeByte-ShuffleBytes-CrossOver-EraseBytes-ChangeBinInt-InsertRepeatedBytes-PersAutoDict-InsertRepeatedBytes-InsertRepeatedBytes-CrossOver-ChangeByte-ShuffleBytes-CopyPart-ShuffleBytes-CopyPart-CrossOver-ChangeBit-ShuffleBytes-CrossOver-PersAutoDict-ChangeByte-ChangeBit-ShuffleBytes-CrossOver-ChangeByte-EraseBytes-CopyPart-ChangeBinInt-PersAutoDict-CrossOver-ShuffleBytes-CrossOver-CrossOver-EraseBytes-CrossOver-EraseBytes-CrossOver-ChangeBit-ChangeBinInt-ChangeByte-EraseBytes-ShuffleBytes-ShuffleBytes-ChangeBit-EraseBytes-ChangeBinInt-ChangeBit-ChangeBinInt-CopyPart-EraseBytes-PersAutoDict-EraseBytes-CopyPart-ChangeBinInt-ChangeByte-CrossOver-ChangeBinInt-ShuffleBytes-PersAutoDict-PersAutoDict-ChangeBinInt-CopyPart-ChangeBinInt-CrossOver-ChangeBit-ChangeBinInt-CopyPart-ChangeByte-ChangeBit-CopyPart-CrossOver-ChangeByte-ChangeBit-ChangeByte-ShuffleBytes-CMP-ChangeBit-CopyPart-ChangeBit-ChangeByte-ChangeBinInt-PersAutoDict-ChangeBinInt-CrossOver-ChangeBinInt-ChangeBit-ChangeBinInt-ChangeBinInt-PersAutoDict-ChangeBinInt-ChangeBinInt-ChangeByte-CopyPart-ShuffleBytes-ChangeByte-ChangeBit-ChangeByte-ChangeByte-EraseBytes-CrossOver-ChangeByte-ChangeByte-EraseBytes-EraseBytes-InsertRepeatedBytes-ShuffleBytes-CopyPart-CopyPart-ChangeBit-ShuffleBytes-PersAutoDict-ShuffleBytes-ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte-ChangeBinInt-CrossOver-ChangeBinInt-ChangeBit-EraseBytes-CopyPart-ChangeByte-CrossOver-EraseBytes-CrossOver-ChangeByte-ShuffleBytes-ChangeByte-ChangeBinInt-CrossOver-ChangeByte-InsertRepeatedBytes-InsertByte-ShuffleBytes-PersAutoDict-ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ShuffleBytes-CopyPart-ShuffleBytes-EraseBytes-ShuffleBytes-ShuffleBytes-CrossOver-ChangeBinInt-CopyPart-CopyPart-CopyPart-EraseBytes-EraseBytes-ChangeByte-ChangeBinInt-ShuffleBytes-CMP-InsertByte-EraseBytes-ShuffleBytes-CopyPart-ChangeBit-CrossOver-CopyPart-CopyPart-ShuffleBytes-ChangeByte-ChangeByte-ChangeBinInt-EraseBytes-ChangeByte-ChangeBinInt-ChangeBit-ChangeBit-ChangeByte-ShuffleBytes-PersAutoDict-PersAutoDict-CMP-ChangeBit-ShuffleBytes-PersAutoDict-ChangeBinInt-EraseBytes-EraseBytes-ShuffleBytes-ChangeByte-ShuffleBytes-ChangeBit-EraseBytes-CMP-ShuffleBytes-ChangeByte-ChangeBinInt-EraseBytes-ChangeBinInt-ChangeByte-EraseBytes-ChangeByte-CrossOver-ShuffleBytes-EraseBytes-EraseBytes-ShuffleBytes-ChangeBit-EraseBytes-CopyPart-ShuffleBytes-ShuffleBytes-CrossOver-CopyPart-ChangeBinInt-ShuffleBytes-CrossOver-InsertByte-InsertByte-ChangeBinInt-ChangeBinInt-CopyPart-EraseBytes-ShuffleBytes-ChangeBit-ChangeBit-EraseBytes-ChangeByte-ChangeByte-ChangeBinInt-CrossOver-ChangeBinInt-ChangeBinInt-ShuffleBytes-ShuffleBytes-ChangeByte-ChangeByte-ChangeBinInt-ShuffleBytes-CrossOver-EraseBytes-CopyPart-CopyPart-CopyPart-ChangeBit-ShuffleBytes-ChangeByte-EraseBytes-ChangeByte-InsertRepeatedBytes-InsertByte-InsertRepeatedBytes-PersAutoDict-EraseBytes-ShuffleBytes-ChangeByte-ShuffleBytes-ChangeBinInt-ShuffleBytes-ChangeBinInt-ChangeBit-CrossOver-CrossOver-ShuffleBytes-CrossOver-CopyPart-CrossOver-CrossOver-CopyPart-ChangeByte-ChangeByte-CrossOver-ChangeBit-ChangeBinInt-EraseBytes-ShuffleBytes-EraseBytes-CMP-PersAutoDict-PersAutoDict-InsertByte-ChangeBit-ChangeByte-CopyPart-CrossOver-ChangeByte-ChangeBit-ChangeByte-CopyPart-ChangeBinInt-EraseBytes-CrossOver-ChangeBit-CrossOver-PersAutoDict-CrossOver-ChangeByte-CrossOver-ChangeByte-ChangeByte-CrossOver-ShuffleBytes-CopyPart-CopyPart-ShuffleBytes-ChangeByte-ChangeByte-ChangeBinInt-ChangeBinInt-ChangeBinInt-ChangeBinInt-ShuffleBytes-CrossOver-ChangeBinInt-ShuffleBytes-ChangeBit-PersAutoDict-ChangeBinInt-ShuffleBytes-ChangeBinInt-ChangeByte-CrossOver-ChangeBit-CopyPart-ChangeBit-ChangeBit-CopyPart-ChangeByte-PersAutoDict-ChangeBit-ShuffleBytes-ChangeByte-ChangeBit-CrossOver-ChangeByte-CrossOver-ChangeByte-CrossOver-ChangeBit-ChangeByte-ChangeBinInt-PersAutoDict-CopyPart-ChangeBinInt-ChangeBit-CrossOver-ChangeBit-PersAutoDict-ShuffleBytes-EraseBytes-CrossOver-ChangeByte-ChangeBinInt-ShuffleBytes-ChangeBinInt-InsertRepeatedBytes-PersAutoDict-CrossOver-ChangeByte-Custom-PersAutoDict-CopyPart-CopyPart-ChangeBinInt-ShuffleBytes-ChangeBinInt-ChangeBit-ShuffleBytes-CrossOver-CMP-ChangeByte-CopyPart-ShuffleBytes-CopyPart-CopyPart-CrossOver-CrossOver-CrossOver-ShuffleBytes-ChangeByte-ChangeBinInt-ChangeBit-ChangeBit-ChangeBit-ChangeByte-EraseBytes-ChangeByte-ChangeBit-ChangeByte-ChangeByte-CopyPart-PersAutoDict-ChangeBinInt-PersAutoDict-PersAutoDict-PersAutoDict-CopyPart-CopyPart-CrossOver-ChangeByte-ChangeBinInt-ShuffleBytes-ChangeBit-CopyPart-EraseBytes-CopyPart-CopyPart-CrossOver-ChangeByte-EraseBytes-ShuffleBytes-ChangeByte-CopyPart-EraseBytes-CopyPart-CrossOver-ChangeBinInt-ChangeBinInt-InsertByte-ChangeBinInt-ChangeBit-ChangeByte-CopyPart-ChangeByte-EraseBytes-ChangeByte-ChangeBit-ChangeByte-ShuffleBytes-CopyPart-ChangeBinInt-EraseBytes-CrossOver-ChangeBit-ChangeBit-CrossOver-EraseBytes-ChangeBinInt-CopyPart-CopyPart-ChangeBinInt-ChangeBit-EraseBytes-InsertRepeatedBytes-EraseBytes-ChangeBit-CrossOver-CrossOver-EraseBytes-EraseBytes-ChangeByte-CopyPart-CopyPart-ShuffleBytes-ChangeByte-ChangeBit-ChangeByte-EraseBytes-ChangeBit-ChangeByte-ChangeByte-CrossOver-CopyPart-EraseBytes-ChangeByte-EraseBytes-ChangeByte-ShuffleBytes-ShuffleBytes-ChangeByte-CopyPart-ChangeByte-ChangeByte-ChangeBit-CopyPart-ChangeBit-ChangeBinInt-CopyPart-ShuffleBytes-ChangeBit-ChangeBinInt-ChangeBit-EraseBytes-CMP-CrossOver-CopyPart-ChangeBinInt-CrossOver-CrossOver-CopyPart-CrossOver-CrossOver-InsertByte-InsertByte-CopyPart-Custom- DE: "warn"-"\x00\x00\x00\x80"-"\xfe\xff\xff\xfb"-"\xff\xff"-"\x10\x00\x00\x00"-"\xfe\xff\xff\xff"-"\xff\xff\xff\xf6"-"U\x01\x00\x00\x00\x00\x00\x00"-"\xd9\xff\xff\xff"-"\xfe\xff\xff\xea"-"\xf0\xff\xff\xff"-"\xfc\xff\xff\xff"-"warn"-"\xff\xff\xff\xff"-"\xfe\xff\xff\xfb"-"\x00\x00\x00\x80"-"\xfe\xff\xff\xf1"-"\xfe\xff\xff\xea"-"\x00\x00\x00\x00\x00\x00\x012"-"\xe2\x00"-"\xfb\xff\xff\xff"-"\x00\x00\x00\x00"-"\xe9\xff\xff\xff"-"\xff\xff"-"\x00\x00\x00\x80"-"\x01\x00\x04\xc9"-"\xf0\xff\xff\xff"-"\xf9\xff\xff\xff"-"\xff\xff\xff\xff\xff\xff\xff\x12"-"\xe2\x00"-"\xfe\xff\xff\xff"-"\xfe\xff\xff\xea"-"\xff\xff\xff\xff"-"\xf4\xff\xff\xff"-"\xe9\xff\xff\xff"-"\xf1\xff\xff\xff"- tensorflow#48 NEW cov: 4502 ft: 9151 corp: 27/750Kb lim: 64000 exec/s: 2 rss: 458Mb L: 50772/50772 MS: 259 ChangeByte-ShuffleBytes-ChangeBinInt-ChangeByte-ChangeByte-ChangeByte-ChangeByte-ChangeBit-CopyPart-CrossOver-CopyPart-ChangeByte-CrossOver-CopyPart-ChangeBit-ChangeByte-EraseBytes-ChangeByte-CopyPart-CopyPart-CopyPart-ChangeBit-EraseBytes-ChangeBinInt-CrossOver-CopyPart-CrossOver-CopyPart-ChangeBit-ChangeByte-ChangeBit-InsertByte-CrossOver-InsertRepeatedBytes-InsertRepeatedBytes-InsertRepeatedBytes-ChangeBinInt-EraseBytes-InsertRepeatedBytes-InsertByte-ChangeBit-ShuffleBytes-ChangeBit-ChangeBit-CopyPart-ChangeBit-ChangeByte-CrossOver-ChangeBinInt-ChangeByte-CrossOver-CMP-ChangeByte-CrossOver-ChangeByte-ShuffleBytes-ShuffleBytes-ChangeByte-ChangeBinInt-CopyPart-EraseBytes-CrossOver-ChangeBit-ChangeBinInt-InsertByte-ChangeBit-CopyPart-ChangeBinInt-ChangeByte-CrossOver-ChangeBit-EraseBytes-CopyPart-ChangeBinInt-ChangeBit-ChangeBit-ChangeByte-CopyPart-ChangeBinInt-CrossOver-PersAutoDict-ChangeByte-ChangeBit-ChangeByte-ChangeBinInt-ChangeBinInt-EraseBytes-CopyPart-CopyPart-ChangeByte-ChangeByte-EraseBytes-PersAutoDict-CopyPart-ChangeByte-ChangeByte-EraseBytes-CrossOver-CopyPart-CopyPart-CopyPart-ChangeByte-ChangeBit-CMP-CopyPart-ChangeBinInt-ChangeBinInt-CrossOver-ChangeBit-ChangeBit-EraseBytes-ChangeByte-ShuffleBytes-ChangeBit-ChangeBinInt-CMP-InsertRepeatedBytes-CopyPart-Custom-ChangeByte-CrossOver-EraseBytes-ChangeBit-CopyPart-CrossOver-CMP-ShuffleBytes-EraseBytes-CrossOver-PersAutoDict-ChangeByte-CrossOver-CopyPart-CrossOver-CrossOver-ShuffleBytes-ChangeBinInt-CrossOver-ChangeBinInt-ShuffleBytes-PersAutoDict-ChangeByte-EraseBytes-ChangeBit-CrossOver-EraseBytes-CrossOver-ChangeBit-ChangeBinInt-EraseBytes-InsertByte-InsertRepeatedBytes-InsertByte-InsertByte-ChangeByte-ChangeBinInt-ChangeBit-CrossOver-ChangeByte-CrossOver-EraseBytes-ChangeByte-ShuffleBytes-ChangeBit-ChangeBit-ShuffleBytes-CopyPart-ChangeByte-PersAutoDict-ChangeBit-ChangeByte-InsertRepeatedBytes-CMP-CrossOver-ChangeByte-EraseBytes-ShuffleBytes-CrossOver-ShuffleBytes-ChangeBinInt-ChangeBinInt-CopyPart-PersAutoDict-ShuffleBytes-ChangeBit-CopyPart-ShuffleBytes-CopyPart-EraseBytes-ChangeByte-ChangeBit-ChangeBit-ChangeBinInt-ChangeByte-CopyPart-EraseBytes-ChangeBinInt-EraseBytes-EraseBytes-PersAutoDict-CMP-PersAutoDict-CrossOver-CrossOver-ChangeBit-CrossOver-PersAutoDict-CrossOver-CopyPart-ChangeByte-EraseBytes-ChangeByte-ShuffleBytes-ChangeByte-ChangeByte-CrossOver-ChangeBit-EraseBytes-ChangeByte-EraseBytes-ChangeBinInt-CrossOver-CrossOver-EraseBytes-ChangeBinInt-CrossOver-ChangeBit-ShuffleBytes-ChangeBit-ChangeByte-EraseBytes-ChangeBit-CrossOver-CrossOver-CrossOver-ChangeByte-ChangeBit-ShuffleBytes-ChangeBit-ChangeBit-EraseBytes-CrossOver-CrossOver-CopyPart-ShuffleBytes-ChangeByte-ChangeByte-CopyPart-CrossOver-CopyPart-CrossOver-CrossOver-EraseBytes-EraseBytes-ShuffleBytes-InsertRepeatedBytes-ChangeBit-CopyPart-Custom- DE: "\xfe\xff\xff\xfc"-"\x00\x00\x00\x00"-"F\x00"-"\xf3\xff\xff\xff"-"St9exception"-"_\x00\x00\x00"-"\xf6\xff\xff\xff"-"\xfe\xff\xff\xff"-"\x00\x00\x00\x00"-"p\x02\x00\x00\x00\x00\x00\x00"-"\xfe\xff\xff\xfb"-"\xff\xff"-"\xff\xff\xff\xff"-"\x01\x00\x00\x07"-"\xfe\xff\xff\xfe"- These are prohibitively large and of limited value in the default case (when someone is running the fuzzer, not debugging it), in my opinion. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D86658
One way to hoist loop invariant memref load/store's and replace such subscripted accesses by scalars is by introducing 0-d memref's (load/store from the main memref to those 0-d memref's, then access the latter inside the loop), and rely on LLVM's mem2reg to eliminate the lowered counterparts of those 0-d memref's away, substituting (scalar) SSA values instead. However, currently, the lowering of MLIR's alloc to LLVM's malloc (in the -lower-to-llvm pass) gets in the way, but is trivially fixable as shown below.
Here's simple MLIR obtained after hoisting a loop invariant load (assume that the 0-d memref %s was automatically introduced by a pass/utility that performed the hosting for a loop invariant load on the original %A):
The current lowering to LLVM yields:
(-lower-affine -lower-to-llvm | mlir-translate -mlir-to-llvmir)
'mem2reg' on this doesn't yield the desired replacement (expected?).
However, using an alloca for the single element memref makes the whole thing work:
One option here would be to have AllocOpLowering use 'alloca' for 0-d memrefs (they have a single element by definition) instead of a malloc + bitcast,
mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
Line 432 in 10d0c61
but this would require the lowering to check for lifetimes - a concern that shouldn't be part of the lowering. Instead, one would want to have a pass/utility to replace alloc's with llvm.alloca when appropriate, but this isn't "lowering path neutral" and thus not reusable infra. Should one have a new standard op alloca for stack allocation so that a pass/utility could replace with when appropriate or in the first place generate such alloca's in place of alloc? The current alloc op doesn't say where memory gets allocated from.
(On a side note, another way to accomplish such scalar replacement of accesses is by leveraging support for loop live-out scalars when the IR supports it, but there are still good reasons for doing this via single element memref's and then relying on mem2reg, and also being able to go from one form to the other in MLIR itself.)
The text was updated successfully, but these errors were encountered: