-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fix race condition in NodeProvideOutOfProcTaskHost #12708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves thread safety in NodeProviderOutOfProcTaskHost by converting two dictionary fields from IDictionary<int, T> to ConcurrentDictionary<int, T>.
- Changed field type declarations for
_nodeIdToPacketFactoryand_nodeIdToPacketHandlerto useConcurrentDictionary - Updated initialization from
DictionarytoConcurrentDictionary - Replaced
Remove()calls withTryRemove()to use the concurrent dictionary API
src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcTaskHost.cs
Outdated
Show resolved
Hide resolved
rainersigwald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see how this solves the problem. Can you elaborate please?
oh, that's a mystery here. This fix solves the race condition in the prototype branch - I can attest to that fact. But neither me nor @surayya-MS (please correct me if you figured it out since our last sync) figured yet why it solves it, more precisely, why it runs into a race condition in the first place. We need to investigate this bug a bit more |
|
There is no race condition, it's "just" non-concurrent directory corruption by concurrent write by multiple threads (of different ids - the current design assigns different taskhost node id to each thread node), the PR correctly fixes that. The message in the assert is a red herring. |
src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcTaskHost.cs
Outdated
Show resolved
Hide resolved
|
lgtm, but I want another reviewer |
|
Ok, now I see. The writes to the dictionaries are behind some locks. However, those locks are from different tasks objects and do not actually protect from concurrent writes when we have more than one task host task running (i.e. when we have many nodes in one process). The fix makes sense now. |
|
note: |
Fixes #12660
Context
The following assert fails:
/mt /msometimes crashes with error:Changes Made
Changed
_nodeIdToPacketFactoryand_nodeIdToPacketHandlerfromDictionarytoConcurrentDictionary.Testing
Manually tested to make sure the error is not there.
Notes