
FastCC results differ from respective MATLAB function #1154

Open
NantiaL opened this issue Feb 21, 2022 · 14 comments · May be fixed by #1427

@NantiaL

NantiaL commented Feb 21, 2022

Hello, I would like to raise a question regarding the FastCC implementation in cobrapy. Over multiple runs, FastCC reports a different number of blocked reactions for the Recon1 model. That might seem reasonable, since optimization does not deliver unique solutions, but the corresponding MATLAB function from the COBRA Toolbox always returns the same result across several runs. Since both implementations are based on the same algorithm, I would expect them to give the same results, so I think this is something worth checking.

I also noticed that the number of irreversible reactions detected by the FastCC implementation deviates from the number computed by the MATLAB function: reactions that are irreversible in the backward direction are not reported as irreversible in the MATLAB command window.

For Recon1:
MATLAB:

  • 2186 irreversible reactions
  • 1274 blocked reactions

Python:

  • 2192 irreversible reactions
  • blocked reactions differ after each run
    Used solver: glpk, Python 3.8.5
@Midnighter
Member

Thanks for the report @NantiaL. Can you please post the exact code that you used to generate these results?

@NantiaL
Author

NantiaL commented Feb 21, 2022

Hey, here it is:

MATLAB:

% fastcc returns the indices of the flux-consistent reactions; the rest are blocked
is_active = fastcc(model, 1e-4);
inactiveRxns = setdiff(model.rxns, model.rxns(is_active));

Python:

from cobra.io import read_sbml_model
from cobra.flux_analysis import fastcc

model = read_sbml_model('RECON1.xml')
model.solver = 'glpk'
# fastcc returns a new model containing only the flux-consistent reactions
consistent_model = fastcc(model)

In Python I also got different results over multiple runs after setting the zero_cutoff and/or the flux_threshold to 1e-4.
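
For reference, a minimal sketch of that variant with both thresholds passed explicitly; the parameter names match cobra.flux_analysis.fastcc, while the blocked-reaction bookkeeping at the end is only an illustration:

from cobra.io import read_sbml_model
from cobra.flux_analysis import fastcc

model = read_sbml_model('RECON1.xml')
model.solver = 'glpk'
# Both thresholds set to the 1e-4 value used on the MATLAB side.
consistent_model = fastcc(model, flux_threshold=1e-4, zero_cutoff=1e-4)
# Blocked reactions are those dropped from the consistent submodel.
blocked = {r.id for r in model.reactions} - {r.id for r in consistent_model.reactions}
print(len(blocked))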

@NantiaL NantiaL changed the title FastCC results differe from respective MATLAB function FastCC results differ from respective MATLAB function Feb 21, 2022
@cdiener
Member

cdiener commented Feb 21, 2022

From a cursory look, it seems like there are some bugs in the FastCC coefficient setting. For instance, a reaction with a forward flux of 1 and a reverse flux of 1 (net flux 0) is considered active in our formulation.
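
To make that concrete, a purely numeric sketch of the case described above (illustrative only, not the actual constraint code in cobrapy):

# cobrapy splits every flux into non-negative forward and reverse parts: v = v_f - v_b.
v_f, v_b = 1.0, 1.0
net_flux = v_f - v_b             # 0.0: a futile forward/reverse cycle with no net conversion
epsilon = 1e-4                   # activity threshold used by FastCC
print(abs(net_flux) >= epsilon)  # False: such a reaction should not be reported as active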

@Midnighter Midnighter added the bug label Feb 21, 2022
@synchon
Member

synchon commented Feb 23, 2022

Apologies for the delayed response. I'll take a look. Thanks for the report!

@synchon synchon self-assigned this Feb 23, 2022
@babessell1

@synchon

Hello, I was wondering if you ever found any leads on what is causing this issue. I tried to diagnose it myself, but it became quite overwhelming to understand. This issue seems pretty important for research, since results should be reproducible.

Thank you

@synchon
Member

synchon commented Aug 4, 2022

Hi @babessell1! The problem is most likely in the problem formulation, as @cdiener pointed out earlier. I have been quite busy lately and I'm not sure when I can tackle it. Apologies for that, as I had self-assigned it. I'll remove the assignment and let someone else take a stab at it.

@synchon synchon removed their assignment Aug 4, 2022
@synchon synchon added help-wanted An issue that should be easy to implement for anyone in the community. and removed help-wanted An issue that should be easy to implement for anyone in the community. labels Aug 4, 2022
@dagl1

dagl1 commented Feb 19, 2025

Hi there, I have a more general question, since this apparently still has not been fixed. The last change to fastcc was in 2021 if I am not mistaken, and at least one bug was identified in 2022, yet the function still exists in this state. Why has it not been removed, or why does it at least not show a warning that it should not be used? This greatly reduces confidence in the repository (I hope I am mistaken and missed the fix/commit; however, I too find differences in the number of reactions found between the MATLAB and Python implementations). The fact that I can use the fastcc function without knowing that there are problems with it is problematic, isn't it?

@Midnighter
Member

I could write quite a bit about the project now. Instead, please consider the following question:
Whose responsibility is it in your opinion to make those changes?

@dagl1

dagl1 commented Feb 19, 2025

The changes that are necessary to fix and test the function are of course no one's responsibility; however, the people accepting PRs, or even those who identified the issues, probably do have the responsibility to make users aware of known problems. Now, I understand that you might say this is no one's job and that, as this is an open-source repository, people can make these changes if they want to. I would counter, though, that allowing known bugs to persist without informing users (for several years) is unbecoming of academics.

(this would only require two lines of code):

import warnings

# inside fastcc():
warnings.warn(
    "As of 2021, release V..., this function contains a known bug that allows "
    "reactions with zero net flux to be considered active, as well as known "
    "discrepancies between this (the cobrapy) implementation and the MATLAB "
    "implementation in the COBRA Toolbox; see #1154."
)

@dagl1 dagl1 linked a pull request Feb 19, 2025 that will close this issue
@dagl1

dagl1 commented Feb 20, 2025

@Midnighter Can we keep this issue open (as the PR is going to close it), since the bug still exists? @cdiener Since I will be having a look at this function (and some others) soon, I was wondering whether you still see the specific bug you mentioned. I spotted some other discrepancies between this function and the MATLAB implementation, but I do not see the coefficient setting you mention. The abs(flux) part in _find_sparse_mode won't deal with this either, since abs(-2 + 2) would still be 0. (If you don't remember, no worries, I will take a look later this week/month.)

@cdiener
Member

cdiener commented Feb 20, 2025

@dagl1 The bug is still there, but it is just a sign error. What's implemented here is LP-7 from the paper. However, cobrapy uses a standard form for the LP problem, so v = v_f - v_b. For the z from the paper this means v_f - v_b >= z, i.e. v_f - v_b - z >= 0, so there is a sign error in the implementation. The rest looks pretty good. Maybe it would be faster to flip the objective coefficients instead of the constraints, but what is there now is not wrong AFAICT.

Sorry, originally I wanted to give @synchon a chance to fix it first and then I never checked. There is already a working function to detect blocked reactions and always enough other urgent things to do, so this wasn't the highest priority for me.

@dagl1

dagl1 commented Feb 20, 2025

Ah @cdiener, you mean this part, right? (I already changed + var to - var here):

const = prob.Constraint(
    rxn.forward_variable - rxn.reverse_variable - var,  # v_f - v_b - z >= 0, i.e. net flux >= z
    name="constraint_{}".format(rxn.id),
    lb=0.0,
)

I also spotted a small source of discrepancy in how active fluxes are determined: the MATLAB implementation checks flux >= 0.99 * epsilon, while cobrapy checks flux > zero_cutoff (zero_cutoff playing the role of the MATLAB epsilon).
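
A small numeric sketch of the two criteria side by side (illustrative only; the variable names are mine):

epsilon = 1e-4
flux = 0.995e-4                          # just below epsilon, but above 0.99 * epsilon

matlab_active = flux >= 0.99 * epsilon   # COBRA Toolbox criterion: True
cobrapy_active = flux > epsilon          # cobrapy criterion with zero_cutoff == epsilon: False
print(matlab_active, cobrapy_active)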

However, even with these things changed, there are still discrepancies, so I will go over potential differences in solver settings and investigate the reactions that actually differ.
flux_threshold is just a vector of ones in the MATLAB implementation if I see it correctly, so it shouldn't really need to be changed either way, but I will see whether setting it as an int leads to changes.

If anyone has other ideas that could explain differences between the functions, let me know. Even after trying over 50 different settings, I cannot get exactly the same number of consistent reactions as the MATLAB function reports (11681 for Human-GEM 1.17 with MATLAB, whereas the closest I got was 11677, which is only 4 off, but I would still hope to get the same results eventually).

@cdiener
Member

cdiener commented Feb 20, 2025

Yes exactly, your version is correct. The flux threshold shouldn't matter much for the result. Smaller values can make it faster because more reactions can carry the required flux, so fewer iterations are needed. Epsilon is definitely important. Does the human GEM have blocked reactions? Otherwise, the largest number of consistent reactions is probably the more correct result.

Getting the exact same result may be hard because of floating-point accuracy. The > eps comparison is only true within the solver tolerances, so some fluxes might be too close to the threshold to be called consistently. We use a slightly tighter tolerance than most other packages (1e-7), so maybe adjusting the MATLAB settings brings it more in line. The agreement you are seeing looks pretty good already.
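
For what it's worth, a sketch of aligning the tolerance on the cobrapy side; this assumes cobrapy's model.tolerance setter (which forwards to the solver's feasibility/optimality tolerances) and a hypothetical Human-GEM file name:

from cobra.io import read_sbml_model
from cobra.flux_analysis import fastcc

model = read_sbml_model('Human-GEM.xml')  # hypothetical file name for Human-GEM 1.17
model.solver = 'glpk'
model.tolerance = 1e-7                    # set this to match the MATLAB solver's feasibility tolerance
consistent_model = fastcc(model, flux_threshold=1e-4, zero_cutoff=1e-4)
print(len(consistent_model.reactions))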

@dagl1

dagl1 commented Feb 21, 2025

Yes, I too would think the more lenient values are correct, and most likely floating-point arithmetic is at the root of this. In general, though, it would be nice to see the exact same output, just so that any downstream analysis that uses the whole network structure won't differ between the Python and MATLAB versions. I will let you know what comes out regarding settings (although so far, 4 reactions off is the best I can do).

I am using Human-GEM 1.17 (1.19 is the latest), where the original model has ~12700 reactions, so a good ~1000 are filtered away.
Some are trivially easy to spot by just checking for any metabolite that can only be produced or only consumed (taking into account the flipping of reversible reactions). I think I will apply loopless FVA either way to really find all reactions that can carry no flux, regardless of whether I get fastCC to be fully congruent, just to make sure I can reduce the model as much as possible.
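
As a rough sketch of that clean-up step, assuming cobrapy's find_blocked_reactions and flux_variability_analysis helpers and a hypothetical file name:

from cobra.io import read_sbml_model
from cobra.flux_analysis import find_blocked_reactions, flux_variability_analysis

model = read_sbml_model('Human-GEM.xml')  # hypothetical file name

# Reactions whose FVA range is numerically zero in both directions.
blocked = find_blocked_reactions(model, open_exchanges=True)
print(len(blocked), 'blocked reactions')

# Loopless FVA additionally discounts flux carried only by thermodynamically
# infeasible loops, as mentioned above.
fva = flux_variability_analysis(model, loopless=True, fraction_of_optimum=0.0)
loopless_blocked = fva[(fva['minimum'].abs() < 1e-9) & (fva['maximum'].abs() < 1e-9)].index
print(len(loopless_blocked), 'reactions with no loopless flux')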
