-
Notifications
You must be signed in to change notification settings - Fork 36
Reproducibility of RMG #275
Comments
It looks like these aren't just reactions appearing in a different order - they actually have different A factors. Can you explain that? One of them must be more erroneous than the other, which makes me nervous about arbitarily picking one based on a linked hash map's order. What if we're consistently wrong from one run to the next? |
The A factors are multiples of each other, so it appears the degeneracy is different for each run (?) |
Thanks Mike. So in all three cases the total is 4 pathways: split either 1 and 3, 3 and 1, or 2 and 2. @keceli, do you have the verbose comments from these rates, to see if that gives further clues as to what's going on? |
This gets various corrections and improvements to the QM Library work. I think this includes all the changes by Nate up to nyee/RMG-Java@c10379a excluding the HashMap / LinkedHashMap change (see ReactionMechanismGenerator#275) Also worth noting - now using the augmented (modified) InChI in the QM library. I suggest you delete any QM libraries you have made thus far, unless you're sure there were no radicals in them.
I understand that it is hard to merge the global change of HashMap vs LinkedHashMap before other changes, but so far it is the only way to force RMG to have reproducible output. Is there a plan to merge it at the end, or is there any reason to avoid it? |
It is the only way we have found so far to force reproducible output, because we do not yet understand why the different pathways above (with different rates) have different degeneracies depending on the whim of a HashSet. |
I wish I had a better understanding of this bug, however, I still think leaving RMG with HashMaps is more dangerous since it is the source of the inconsistent outputs of RMG. The linked version picks the 3rd set of the rate constants listed here and I don't know how to judge if it is the right value, or not. I think with a reproducible output, it would be easier to debug the code. |
Okay, so it looks like the reaction in question is the homolytic bond fission of one of the bridging C-C bonds. The Birad_recombination family is defined in the bond forming reaction, and in that direction there is only one equivalent bond being formed, so I would expect the degeneracy to be one, not four. The first problem is that, as a Birad_recombination, the two atoms in the forming/breaking bond can be labeled as (_1, *2) or as (_2, *1). The ensuing degeneracy should be divided by two to account for this (as it is in the R_recombination family), but I don't see that happening anywhere in the code. Thus I think we are double-counting all of our Birad_recombination reactions. A test of RMG with cyclohexane seems to confirm this. (A degeneracy of one or three would then have to be an error in this case.) The second issue is one of multiple transition states. Since JP-10 is a fused ring system, there are both five-membered ring and six-membered ring transition states that are matched by the template, hence the duplicate reactions in the Chemkin output. Physically there is only one transition state, so I think we probably should not keep both kinetics in this case, but which of the two is the one to keep is not at all obvious. I am still not entirely sure why RMG sometimes arrives at degeneracies of one and three (or three and one), instead of two and two. Although using linked hash maps seems to give the desired result in this case, I think that might just be luck. |
Below is a sample RMG input file which you can observe that RMG produces different outputs (Final_Model.txt) even on the same machine.
This is a very simple input file (no Pdep, QMTP etc) and runs under a few minutes. The final model differs in the kinetics for the first reaction, you can get one of the below:
Tracing back to the root shows that species created by reverse Birad_recombination (Ring_Open) can differ in different runs. Discussions with Josh, Nick, Nathan and Beat all ended up that this is probably related with HashMap's or HashSet's in Java which causes the orders to differ in each run depending on memory allocation.
Replacing all HashMap and HashSet collections with LinkedHashMap and LinkedHashSet in all Java source files seems to be the easy fix for the problem. I have checked some test cases and didn't see any side effect. While Linked collections are known to be slow when you add or remove an element, they are faster in iterations, so I hope that performance of RMG will not be affected much. (http://www.coderanch.com/t/426744/java-programmer-SCJP/certification/Speed-LinkedHashSet-vs-HashSet)
I added a branch named Linked with those changes.
The text was updated successfully, but these errors were encountered: