Rerouting aggregator #4647
Draft
scottwittenburg wants to merge 37 commits into ornladios:master from scottwittenburg:rerouting_aggregator
7d09261 to
2ec94df
Compare
4dde4a1 to
19ce1c1
Compare
19ce1c1 to
54b7f28
Compare
Break out the parts of InitTransports that generate the filename and open the file, and put them into a new function that can be called either with or without an MPI communicator. This supports the rerouting aggregator, currently under development, by allowing rerouted ranks to change their target subfile near the end of the writing process, after opening a different file at the beginning.
Just trying to get a threaded communication scheme going, without the actual rerouting to start (every group simply writes to the expected file). Add some API to adiosComm to support Probe/Iprobe, and make sure it works by using it in the test.
The cause of the race condition was that rank 0 would, perhaps due to its extra pre-write duties, arrive late to the party and send its WRITE_SUBMISSION message after the SC had already finished writing and exited the comm loop. Since all ranks generate the complete partitioning information, we can make that available so that SCs can pre-populate their writer queues from it and do away with the WRITE_SUBMISSION message altogether. This reduces the total number of messages needed and avoids the race condition that caused intermittent hangs of the test.
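A minimal sketch of the queue pre-population idea described above, not the code from this PR. The `Partitioning` alias, the `BuildWriterQueue` name, and the convention that the first rank in each group is its SC are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the partitioning that every rank computes:
// groups[g] lists the writer ranks assigned to subfile g. By assumption,
// groups[g][0] is that group's subcoordinator (SC).
using Partitioning = std::vector<std::vector<int>>;

// Build the SC's writer queue for its own group directly from the shared
// partitioning, so no per-writer submission message has to arrive first.
std::deque<int> BuildWriterQueue(const Partitioning &groups, std::size_t groupIndex)
{
    std::deque<int> queue;
    if (groupIndex < groups.size())
    {
        // Writers are queued in the order the partitioning lists them.
        queue.assign(groups[groupIndex].begin(), groups[groupIndex].end());
    }
    return queue;
}
```

Because the queue depends only on data every rank already holds, a late-arriving rank can no longer miss the window in which the SC is still listening for submissions.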
* Before leaving the comm loop, SC must update its m_DataPos from the variable that has been tracking it for its group, or else it shares the wrong file positions with the other ranks. This was causing all tests with more than 1 timestep to fail.
* When SC comm thread first starts up, it needs to set the current file position variable from m_DataPos, which was updated correctly in append mode. That was causing tests that append to fail.
Avoid the situation with non-blocking sends where the buffer containing the data to be sent is destroyed before the send completes.
- One certain issue was storing globalState using subcoordinator rank IDs rather than the index of the groups they coordinate. Fix that by adding a mapping from rank ID to global state index.
- When rerouting happens on time step zero, opening without append could result in a zero block appearing where the rerouted rank wrote. Fix that by allowing forceAppend to be specified when rerouted ranks open their files.
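The rank-to-index mapping from the first fix could look roughly like the sketch below. `GroupState`, `GlobalState`, and their members are hypothetical names chosen for this example; the actual types in the PR may differ.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical per-group bookkeeping kept by the global coordinator.
struct GroupState
{
    std::size_t currentFilePos = 0;
    bool closed = false;
};

// State is stored densely by group index; subcoordinator rank ids
// (which are generally sparse, e.g. 0, 4, 8) are translated through a map.
class GlobalState
{
public:
    explicit GlobalState(const std::vector<int> &subcoordinatorRanks)
    : m_States(subcoordinatorRanks.size())
    {
        for (std::size_t i = 0; i < subcoordinatorRanks.size(); ++i)
        {
            m_RankToIndex[subcoordinatorRanks[i]] = i;
        }
    }

    // Throws std::out_of_range if scRank is not a known subcoordinator.
    std::size_t IndexOf(int scRank) const { return m_RankToIndex.at(scRank); }

    GroupState &ForRank(int scRank) { return m_States.at(IndexOf(scRank)); }

private:
    std::vector<GroupState> m_States;
    std::unordered_map<int, std::size_t> m_RankToIndex;
};
```

Indexing the state vector with a raw rank id would read or write the wrong slot (or run out of bounds) whenever subcoordinator ranks are not 0, 1, 2, and so on, which is the kind of error the mapping avoids.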
This could happen when the global coordinator rerouted another rank to its own subfile. Because the update of m_DataPos from the local tracking variable, currentFilePos, was happening upon receipt of the GROUP_CLOSE message, and the global coordinator does not send itself that message, it wasn't capturing the actual file offset to be shared with other ranks later in EndStep(). This moves the variable update to the end of the thread method, just before returning, so that all subcoordinators do it, regardless of whether they are also the global coordinator.
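A simplified sketch of the file-position handling described in the commit messages above, with the MPI messaging replaced by a plain list of events. Only m_DataPos, currentFilePos, and GROUP_CLOSE come from the commit text; the `Subcoordinator` struct, the `Event` type, and the `WriteCompletion` message are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical message kinds; only GROUP_CLOSE is named in the commits.
enum class Msg
{
    WriteCompletion,
    GroupClose
};

struct Event
{
    Msg type;
    std::size_t bytesWritten; // meaningful only for WriteCompletion
};

struct Subcoordinator
{
    std::size_t m_DataPos = 0; // offset later shared with other ranks

    // Stand-in for the comm thread method; events replace received messages.
    void CommLoop(const std::vector<Event> &events)
    {
        // Start from m_DataPos so that append mode continues at the right offset.
        std::size_t currentFilePos = m_DataPos;

        for (const Event &e : events)
        {
            if (e.type == Msg::GroupClose)
            {
                break; // the global coordinator never receives this message
            }
            currentFilePos += e.bytesWritten;
        }

        // Update on the single exit path, so it happens whether or not a
        // GroupClose message was ever received.
        m_DataPos = currentFilePos;
    }
};
```

Placing the write-back after the loop rather than inside the GroupClose branch means a subcoordinator that leaves the loop for any other reason still publishes the correct offset.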
54b7f28 to
4c7a645
Compare
Implement aggregation method inspired by the paper "Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems?" (link).
Currently only works with DataSizeBased aggregation. Does not yet do any actual re-routing, only re-implements writing using a new multithreaded messaging scheme based on the paper above.
Adds some missing functionality to adiosComm (probe API and support for MPI_ANY_SOURCE).