Segmentation fault when launching with POSTPONE:<color> #25
Comments
I adjusted the Configuration class so that POSTPONE is now handled properly. However, I would like to know how to start MPI so that MUSIC does not treat all MPI nodes as members of the same simulation group, and so that two NEST simulation groups can also communicate with each other.
Sorry for not returning to the POSTPONE issue for such a long time. I'm not sure what happened to this code, which was correct before. If you think that your fix is the right one, you are of course free to submit a pull request. In either case, I need to look at this and will do so ASAP. Could you please clarify what you mean by your comment about simulation groups? Each MUSIC-aware application gets its own intracommunicator, associated with its own MPI process group, to be used for its internal communication. What happens above this depends on the communication algorithm selected. For pairwise communication, intercommunicators are created for use by MUSIC ports (these are not available through the MUSIC API). For collective communication, a communicator covering all MPI processes is used internally instead. Given this, you can have two instances of NEST running as separate MUSIC-aware applications, each with its own MPI process group, and these can communicate with each other through MUSIC ports. This sounds similar to what you request, but probably isn't. What is it that you request?
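To make the "one process group per application" idea concrete, here is a minimal sketch that mimics the partitioning a color-based communicator split performs, without calling MPI at all. It assumes that the color in POSTPONE:<color> plays the same role as the color argument of a communicator split; `groupRanksByColor` is a hypothetical name and is not part of MUSIC or NEST.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Illustration only: no MPI calls are made here.
// colors[r] is the color chosen by world rank r (an assumption about how
// applications are told apart). The result maps each color to the world
// ranks that would end up in the same process group; the position of a
// rank inside its vector corresponds to its rank within that group.
std::map<int, std::vector<int>>
groupRanksByColor (const std::vector<int>& colors)
{
  std::map<int, std::vector<int>> groups;
  for (std::size_t rank = 0; rank < colors.size (); ++rank)
    groups[colors[rank]].push_back (static_cast<int> (rank));
  return groups;
}
```

With real MPI, the same partitioning would typically be obtained by having every rank call `MPI_Comm_split (MPI_COMM_WORLD, color, rank, &comm)` and then using `comm` instead of `MPI_COMM_WORLD` for application-internal communication. Whether NEST does this in the Python launch path described in this thread is exactly the open question.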
This answer is really what I was looking for. Here is my current understanding (feel free to correct me where I am wrong): Using MUSIC via the music binary (mpiexec .... music
If I launch two different NEST simulations that are coupled via MUSIC (this is what I meant, imprecisely speaking, by groups), it all works fine. When launching mpiexec python <pynn_music_script.py>, MUSIC no longer complains after the small bugfix, but NEST does. In more detail: it says that the random number generators are not in synchrony. All of the MPI ranks seem to be in the same group with respect to NEST. I have not looked closely at NEST's MPI management, but I think it is simply using COMM_WORLD. There presumably is (actually must be) some use of intracommunicators when MUSIC is enabled, but it does not seem to work the way I am doing it right now. I probably need to read through some NEST code to get a better understanding, or do you already know? Back to the pull-request story:
Your understanding is correct. Can you provide a simple test case demonstrating your Python problem so that I can reproduce it on my machine? I will then debug it. I will get back to you regarding POSTPONE.
Use-case: I am trying to adapt the PyNN MUSIC branch to PyNN 0.8.1 and NEST 2.10. While the PyNN part was no problem, a segmentation fault occurs within MUSIC. This happens independently of PyNN-MUSIC.
The error can be reproduced when launching MUSIC with _MUSIC_CONFIG=POSTPONE:0, either with 'python <scriptname.py>' or 'mpiexec -np 1 <scriptname.py>' (launching MUSIC as a single process).
@mdjurfeldt
From what I can observe, the old Python-config API sets POSTPONE:, but the configuration parser actually expects an ApplicationMap section within the ENV. Maybe I have missed a step, but in the current state it looks like either music-config/config.py must provide a full application map in the ENV (i.e. I need to change the way PyNN-Music/multisim.py assembles this) or the MUSIC C++ code must be adapted.
Another question is what the runtime actually does when postpone is true. It calls maybePostponedSetup, but I don't see where the updated ENVs are actually parsed. Maybe I am missing something?
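As a sketch of the kind of check that would let a parser tell a POSTPONE:<color> value apart from a full configuration, the hypothetical helper below recognises the prefix and extracts the color. This is not MUSIC's actual Configuration code; the function name and the rule that the color must be a non-negative integer are assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of MUSIC.
// Returns true and stores the color if value has the form
// "POSTPONE:<non-negative integer>"; returns false otherwise, in which
// case a caller would fall back to parsing a full configuration.
bool
parsePostponeColor (const std::string& value, int& color)
{
  const std::string prefix = "POSTPONE:";
  if (value.compare (0, prefix.size (), prefix) != 0)
    return false; // not a postponed configuration
  const std::string rest = value.substr (prefix.size ());
  if (rest.empty ())
    return false; // "POSTPONE:" without a color
  try
    {
      std::size_t used = 0;
      const int parsed = std::stoi (rest, &used);
      if (used != rest.size () || parsed < 0)
        return false; // trailing characters or a negative color
      color = parsed;
      return true;
    }
  catch (const std::exception&)
    {
      return false; // not a number, or out of range for int
    }
}
```

Checking for the prefix before looking for an application map avoids dereferencing sections that a postponed configuration never contains, which is one plausible way a crash like the one reported here could arise.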
`void
Setup::maybePostponedSetup ()
{
  if (postponeSetup_)
    {
      delete config_;
      config_ = new Configuration ();
      fullInit ();
    }
}

void
Setup::fullInit ()
{
  errorChecks ();
  if (!config ("timebase", &timebase_))
    timebase_ = MUSIC_DEFAULT_TIMEBASE; // default timebase
  string binary;
  config_->lookup ("binary", &binary);
  string args;
  config_->lookup ("args", &args);
  argv_ = parseArgs (binary, args, &argc_);
  temporalNegotiator_ = new TemporalNegotiator (this);
}
`