-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FastCC results differ from respective MATLAB function #1154
Comments
Thanks for the report @NantiaL. Can you please post the exact code that you used to generate these results? |
Hey, here it is: MATLAB:
Python:
In Python I also got different results over multiple runs after setting the zero_cutoff and/or the flux_threshold to 1e-4. |
From a cursory look it seems like there are some bugs in the FastCC coefficient setting. For instance a reaction with a forward flux of 1 and reverse flux of 1 (net flux 0) is considered active on our formulation. |
Apologies for the delayed response. I'll take a look. Thanks for the report! |
Hello, I was wondering if you ever found any leads on what is causing this issue. I tried to diagnose myself but it got quite overwhelming for me to understand. This issue seems pretty important for research since results should be reproducible. Thank you |
Hi @babessell1 ! The problem is most likely in the problem formulation as @cdiener pointed out earlier. I have been quite busy lately and I'm not sure when I can take tackle it. Apologies for that as I had self-assigned it. I'll remove it and let someone else take a stab at it. |
Hi there, I have a more general question as this is apparently still not fixed; the last change to fastCC is in 2021 if I am not mistaken, and at least one bug has been identified in 2022, yet this function still exists in this state. Why is it not removed, or why does it not at least show warnings that it is not to be used. It greatly reduces any type of confidence in this repository if this is the case (I hope I am mistaken and missed the fix/commit, however I too find differences in the amount of reactions found between matlab and python implementations), but the fact that I can use the fastCC function without knowing that there are problems with it, is problematic isn't it? |
I could write quite a bit about the project now, instead, please consider the following question: |
The changes that are necessary to fix and test the function are of course no one's responsibility, however the people accepting PR's or even those that identified issues probably do probably have the responsibility to make people aware of known issues. Now I understand that you might say: this is no one's job and as this is an open source repository, people could make these changes if they want to, however I would counter that allowing known bugs to persist without informing users (for several years), is unbecoming of academics. (this would only require 2 lines of code): import warnings |
@Midnighter Can we keep this issue open (as the PR is going to close it), as the bug still exists. @cdiener Since I will be having a look at this function (and some others) soon, I was wondering if you still see the specific bug you mentioned? I spotted some other discrepancies between this function and the MATLAB implementation, I do not see the coefficient setting that you mention? The abs(flux) part in _find_sparse_mode, won't deal with this abs(-2 + 2) should still be 0 (if you don't remember, no worries, I will take a look later this week/month) |
@dagl1 Bug is still there but it is just a sign error. What's implemented here is LP-7 from the paper. However, cobrapy is using a standard form for the LP problem, so Sorry, originally I wanted to give @synchon a chance to fix it first and then I never checked. And there is already a working function to detect blocked reactions and always enough other urgent things to do, so it wasn't the highest priority for me. |
Ah @cdiener you mean this part right (changed from + var to - var already): const = prob.Constraint( I also spotted a small source of discrepancy in how active fluxes are considered, as the Matlab implementation checks for >= .99*epsilon while cobrapy checks > zero_cutoff (zero_cutoff being the same as the epsilon of the Matlab implementation). However even with these things changed, they still lead to discrepancies, so I will try going over some potential solver setting differences, and investigate the reactions that are actually different. If anyone has other ideas that could lead to differences between the functions, let me know. Even with trying over 50 different settings I cannot get the exact same amount of consistent reactions as the matlab function provides (11681 for human-gem 1.17 with matlab, whereas the closest I got was 11677, which is only 4 off, but still, I would hope to get the same results eventually). |
Yes exactly, your version is correct. Flux threshold shouldn't matter much for the result. Smaller values can make it faster because more fluxes can carry flux requiring less iterations. Epsilon is definitely important. Does the human GEM have blocked reactions? Otherwise, the largest number of consisten reactions is probably the more correct result.... Getting the exact same result may be hard because of floating point accuracy. The > eps comparison is only true within the solver tolerances. So some fluxes might be too close to be called consistently. We use a slightly tighter tolerance than most other psckages (1e-7) so maybe adjusting the matlab settings brings it more in line. The agreement you are seeing looks pretty good already. |
Yes I too would think the more lenient values are correct, and most likely floating point arithmatic will be the root of this, but in general it's nice to be able to see the exact same output (just so that any analysis that would take whole network structure or whatever down the line, won't be different between using something in Python vs Matlab). I will let you know what comes out regarding settings (although so far 4 reactions off is the best I can still do). I am using Human Gem 1.17 (1.19 is the latest), where the original model has ~12700 reactions, so a good ~1000 are filtered away. |
Hello, I would like to raise a question regarding the FastCC implementation of Cobrapy. After multiple runs, FastCC outputs a different number of blocked reactions using the Recon1 model. It might be reasonable because optimization never delivers unique solutions. But, the respective MATLAB function from the COBRAToolbox always returns the same results after several runs. Since both implementations are based on the same algorithm, I guess both should give the same results. So, I think this might be something worths checking.
I noticed that the number of irreversible reactions detected in the FastCC implementation deviates from the number of irreversible reactions computed by the MATLAB function. Reactions that are irreversible in the backward direction are not reported as irreversible in the MATLAB command window.
For Recon1:
MATLAB:
Python:
Used solver: glpk, python 3.8.5
The text was updated successfully, but these errors were encountered: