Avoid excessive heap utilisation due to in-memory creation of md5s #222
DanielThomas wants to merge 2 commits into tcurdt:master from
Conversation
Thanks for the contribution! I am a little puzzled though - why (even with 100k files) was this a problem? So I assume 100k files, times (random guess) 100 chars per line - that's 10,000,000 chars. That's probably around 20MB of RAM needed. Is that already what you meant by excessive? How much memory usage did you see? I am just wondering if this really was the problem.
I'm going to set a breakpoint to catch the length, grab a heap dump, and tell you exactly what the utilisation is. It's certainly in the hundreds of megabytes, due to the length of the paths.
Awesome - thanks! Hundreds of megabytes? That sounds quite fishy.
The final md5sums file is 33M. The StringBuilder will retain double that of course, thanks to Java's 2-byte representation of strings, and then two more copies of the same bytes are made on top of that. So a little over 220MB I guess. Background is here, incidentally (we've got our fair share of heap issues in our Gradle plugin!):
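A back-of-the-envelope sketch of where those copies come from (the class and method names are illustrative, not jdeb code; the 2x StringBuilder capacity slack is an assumed worst case from its growth-by-doubling strategy):

```java
public class Md5SumsHeapSketch {
    // Rough worst-case estimate of the transient heap needed to build the
    // md5sums content entirely in memory, given the final file size in bytes.
    static long estimateTotalBytes(long fileBytes) {
        long chars = fileBytes;          // roughly one char per byte of output
        long builder = chars * 2 * 2;    // 2 bytes per char, up to 2x capacity slack from growth doubling (assumed)
        long string = chars * 2;         // toString() copies the char array
        long bytes = fileBytes;          // getBytes() encodes yet another byte[] copy
        return builder + string + bytes;
    }

    public static void main(String[] args) {
        // The 33 MB figure is the observed md5sums file size from this thread.
        long total = estimateTotalBytes(33L * 1024 * 1024);
        System.out.println(total / (1024 * 1024) + " MB"); // prints "231 MB"
    }
}
```

Under those assumptions the total lands in the same ballpark as the hundreds of megabytes seen in the heap dump.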
Thanks for digging into this. I guess the two copies are where the problem turns excessive. I am wondering if we could dial back the crazy by getting rid of those copies. On the other hand, in-memory will always hurt scalability. I am not so eager to use temp files - but on first look the PR looks reasonable. I need to poke around a bit more, but I am inclined to accept it. Thanks for your work! (I so need to get started on jdeb2)
We noticed in an application with > 100K files that we ran into problems while generating the checksums. This change writes the checksums to a file and streams from that file to the output stream, avoiding heap utilisation during that phase.
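A minimal sketch of the approach this description outlines - write the checksum lines to a temporary file, then stream that file to the destination in chunks. Class and method names here are hypothetical, not jdeb's actual API:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ChecksumStreamer {
    // Write checksum lines to a temp file instead of accumulating them
    // in a StringBuilder on the heap.
    static Path writeChecksums(Iterable<String> lines) throws IOException {
        Path tmp = Files.createTempFile("md5sums", null);
        try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        }
        return tmp;
    }

    // Stream the temp file to the destination in chunks, then clean up;
    // Files.copy never holds the whole content in memory at once.
    static void streamTo(Path tmp, OutputStream out) throws IOException {
        Files.copy(tmp, out);
        Files.delete(tmp);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = writeChecksums(Arrays.asList(
                "d41d8cd98f00b204e9800998ecf8427e  ./usr/share/doc/foo/README"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamTo(tmp, out);
        System.out.print(out.toString("UTF-8"));
    }
}
```

The trade-off is the extra disk I/O and temp-file lifecycle, in exchange for peak heap usage that no longer scales with the number of files.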