Shared memory performance #126
Comments
That's interesting. Have you repeated the test, always getting similar results? Some time ago I implemented a simpler approach, one that directly accesses the reference with mmap without a POSIX shared memory object (PR #40 which, incidentally, I still use). I remember that to ensure good performance I had to make sure that reference files were loaded all at once (with the |
I have repeated the test with several threading settings and always get the same answer. |
FWIW, I've tried adjusting the various flags (e.g., MAP_POPULATE) to make I also tried running #40, but I saw some segfaults when running that branch so I was never able to get a comparable benchmarking against I'm definitely not an expert on unix memory management, however, so it's still possible that someone with more experience could uncover some performance gains here. |
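For readers unfamiliar with the flag mentioned above, here is a minimal sketch of mapping a file read-only with MAP_POPULATE, which asks the kernel to prefault the pages up front instead of on first access. This is an illustration only, not the code bwa uses: the helper name `map_readonly` is hypothetical, and it assumes a Unix system (MAP_POPULATE itself is Linux-only and is exposed by Python 3.10+, so the sketch falls back to a plain mapping elsewhere).

```python
import mmap
import os


def map_readonly(path, populate=True):
    """Map a whole file read-only; optionally ask the kernel to prefault its pages.

    Hypothetical helper for illustration. The file must be non-empty.
    """
    flags = mmap.MAP_SHARED
    if populate:
        # MAP_POPULATE is Linux-only; where Python does not expose it,
        # fall back to 0, which leaves the flags unchanged.
        flags |= getattr(mmap, "MAP_POPULATE", 0)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Length 0 maps the entire file.
        return mmap.mmap(fd, 0, flags=flags, prot=mmap.PROT_READ)
    finally:
        # The mapping remains valid after the descriptor is closed.
        os.close(fd)
```

With `populate=False` the pages are faulted in lazily on first touch, which is one plausible source of the difference being discussed, though the thread does not establish that it is the cause.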
What happens if you run a second bwa mem after the first (ie, |
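One way to check whether a second run behaves differently from the first is to time the same command several times back to back. The sketch below is a generic harness, not the benchmark used in this thread; the function name `time_runs` is hypothetical.

```python
import subprocess
import time


def time_runs(cmd, repeats=2):
    """Run cmd `repeats` times in sequence; return wall-clock seconds for each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard stdout so that writing the output does not dominate the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `time_runs(["bwa", "mem", "ref.fa", "reads.fq"], repeats=2)` would time two consecutive alignments, where `ref.fa` and `reads.fq` are placeholder file names. If the first run is much slower than the second, page cache warm-up is a likely factor.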
@kyleabeauchamp you say that you can reproduce this on OSX, so it probably isn't the issue, but if you're running bwa on multi-socket servers, you may want to see if you're getting hit by NUMA issues: https://www.systutorials.com/docs/linux/man/8-numactl/ |
PS: I never answered some of the follow-up questions on this thread, so here it goes. I can confirm that the slower performance of |
Does anyone (e.g., @ilveroluca or @avilella) have any thoughts on the performance of bwa mem when run with a shared memory index (bwa shm)? I've found there to be a 24% performance penalty when using a pre-loaded index, which to my naive mind indicates something to do with either increased cache misses or suboptimal virtual memory paging (possibly related to mmap flags). Ideally, I would love for the pre-loaded index to improve performance due to the overall decreased RAM usage, reduced time spent on IO, and increased flexibility for threading / multiplexing. Does this number seem "reasonable" to others who have thought more carefully about memory management?
My benchmark code is below. FWIW, I've observed similar behavior on both OS X and Linux.