Avoid excessive heap utilisation due to in-memory creation of md5s #222
DanielThomas wants to merge 2 commits into tcurdt:master from
Conversation
Thanks for the contribution! I am a little puzzled though - why (even with 100k files) was this a problem? So I assume 100k files, times (random guess) 100 chars per line - that's 10,000,000 chars. That's probably around 20MB of RAM needed. Is that already what you meant by excessive? How much memory usage did you see? I am just wondering if this really was the problem.
I'm going to set a breakpoint to catch the length, grab a heap dump, and tell you exactly what the utilisation is. It's certainly in the hundreds of megabytes, due to the length of the paths.
Awesome - thanks! Hundreds of megabytes? That sounds quite fishy.
The final md5sums file is 33M. The StringBuilder will retain double that of course, thanks to Java's 2-byte representation of strings, and then two more copies of the same bytes are made on top of that. So a little over 220MB I guess. Background is here, incidentally (we've got our fair share of heap issues in our Gradle plugin!):
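A back-of-the-envelope sketch of where those copies come from (the class and method names are illustrative, not jdeb code; the 2x StringBuilder capacity slack is an assumed worst case from its growth-by-doubling strategy):

```java
public class Md5SumsHeapSketch {
    // Rough worst-case estimate of the transient heap needed to build the
    // md5sums content entirely in memory, given the final file size in bytes.
    static long estimateTotalBytes(long fileBytes) {
        long chars = fileBytes;          // roughly one char per byte of output
        long builder = chars * 2 * 2;    // 2 bytes per char, up to 2x capacity slack from growth doubling (assumed)
        long string = chars * 2;         // toString() copies the char array
        long bytes = fileBytes;          // getBytes() encodes yet another byte[] copy
        return builder + string + bytes;
    }

    public static void main(String[] args) {
        // The 33 MB figure is the observed md5sums file size from this thread.
        long total = estimateTotalBytes(33L * 1024 * 1024);
        System.out.println(total / (1024 * 1024) + " MB"); // prints "231 MB"
    }
}
```

Under those assumptions the total lands in the same ballpark as the hundreds of megabytes seen in the heap dump.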
Thanks for digging into this. I guess the two copies are where the problem turns excessive. I am wondering if we could dial back the crazy by getting rid of those copies. On the other hand, in-memory will always hurt scalability. I am not so eager to use temp files - but on first look the PR looks reasonable. I need to poke around a bit more, but I am inclined to accept it. Thanks for your work! (I so need to get started on jdeb2)
We noticed in an application with > 100K files that we ran into problems while generating the checksums. This change writes the checksums to a file and streams from that file to the output stream, avoiding heap utilisation during that phase.
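A minimal sketch of the approach this description outlines - write the checksum lines to a temporary file, then stream that file to the destination in chunks. Class and method names here are hypothetical, not jdeb's actual API:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ChecksumStreamer {
    // Write checksum lines to a temp file instead of accumulating them
    // in a StringBuilder on the heap.
    static Path writeChecksums(Iterable<String> lines) throws IOException {
        Path tmp = Files.createTempFile("md5sums", null);
        try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        }
        return tmp;
    }

    // Stream the temp file to the destination in chunks, then clean up;
    // Files.copy never holds the whole content in memory at once.
    static void streamTo(Path tmp, OutputStream out) throws IOException {
        Files.copy(tmp, out);
        Files.delete(tmp);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = writeChecksums(Arrays.asList(
                "d41d8cd98f00b204e9800998ecf8427e  ./usr/share/doc/foo/README"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamTo(tmp, out);
        System.out.print(out.toString("UTF-8"));
    }
}
```

The trade-off is the extra disk I/O and temp-file lifecycle, in exchange for peak heap usage that no longer scales with the number of files.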