-
Notifications
You must be signed in to change notification settings - Fork 38
FEAT - add fit_intercept support for LBFGS #326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT - add fit_intercept support for LBFGS #326
Conversation
skglm/solvers/lbfgs.py
Outdated
| Xw = X @ w[:-1] + w[-1] | ||
| datafit_grad = datafit.gradient(X, y, Xw) | ||
| penalty_grad = penalty.gradient(w[:-1]) | ||
| intercept_grad = datafit.intercept_update_step(y, Xw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there may be an issue here, because intercept_update_step does not compute the gradient, but a scaled version of it (it's mulliplied by the stepsize).
The safest way to do it would be to call raw_grad().sum(), which is equivalent to np.ones(n_features) @ raw_grad(), e.g. the gradient with respect to a feature full of ones.
Sounds good @Badr-MOUFAD ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @mathurinm,
I have a little concern because it seems that in some part of the code (Logostic datafit) intercept_update_step account for the stepsize, whereas in other parts (Quadratic, Huber, Poisson, ...) intercept_update_step evaluate the gradient
That being said, I agree that the safest option is to sum(raw_grad(y, Xw)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Quadratic (hand Huber I guess) it's because the stepsize is 1 (lc is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right @mathurinm, thanks
What should we do for Poisson and Gamma datafits ? they implement intercept_update_step with the stepsize convention, but actually since they are non-quadratic it does't not make sense for them
|
@Badr-MOUFAD I tried the refactor (also considering your comment @mathurinm ) |
|
Btw, looking at the initial issue #320 that initiated this PR, the intercept fitting improved the sklearn speed difference, but it is still slower. --- Fitting Time Comparison ---
I guess in this PR, we only focus on the intercept, but maybe we should open a new issue investigating other causes for this. If you approve the refactor, I can delete the debug script (issue320) and we can merge. |
|
Thanks @floriankozikowski for the timing comparison. Can you pls add the intercept to the unittest and update the whats'new page |
|
@Badr-MOUFAD thanks for the feedback! It should be complete now and I'd say we can merge. I won't have access to my laptop the next two weeks, so if anything comes up, I will look at it mid-August again. |
Context of the PR
Closes #325
Contributions of the PR
Checks before merging PR