
Conversation

@floriankozikowski
Contributor

Context of the PR

Closes #325

Contributions of the PR

Checks before merging PR

  • added documentation for any new feature
  • added unit tests
  • edited the what's new (if applicable)

Xw = X @ w[:-1] + w[-1]  # last entry of w holds the intercept
datafit_grad = datafit.gradient(X, y, Xw)
penalty_grad = penalty.gradient(w[:-1])
intercept_grad = datafit.intercept_update_step(y, Xw)  # see review comments below
@mathurinm
Collaborator

There may be an issue here, because intercept_update_step does not compute the gradient but a scaled version of it (it's multiplied by the stepsize).
The safest way would be to call raw_grad(y, Xw).sum(), which is equivalent to np.ones(n_samples) @ raw_grad(y, Xw), i.e. the gradient with respect to a feature full of ones.

Sounds good @Badr-MOUFAD?
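
A minimal sketch of that convention, reusing the names from the snippet above and assuming (as for skglm's smooth datafits) that raw_grad(y, Xw) returns the per-sample gradient of the datafit:

# the intercept behaves like a feature whose column is all ones, so its
# gradient is the plain sum of the per-sample raw gradient (no stepsize scaling)
raw = datafit.raw_grad(y, Xw)   # shape (n_samples,)
intercept_grad = raw.sum()      # same as np.ones(len(y)) @ raw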

@Badr-MOUFAD Badr-MOUFAD Jul 31, 2025
Collaborator

Good catch @mathurinm,

I have a small concern because it seems that in some parts of the code (the Logistic datafit) intercept_update_step accounts for the stepsize, whereas in other parts (Quadratic, Huber, Poisson, ...) intercept_update_step evaluates the gradient.

That being said, I agree that the safest option is raw_grad(y, Xw).sum().

@mathurinm
Collaborator

For Quadratic (and Huber I guess) it's because the stepsize is 1 (lc is $\|X_j\|^2 / n = 1$ for the intercept column of ones).
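
As a quick check, assuming the coordinatewise Lipschitz constant is $lc_j = \|X_j\|^2 / n$ and the intercept is handled as a feature column of ones:

$$lc_{\text{intercept}} = \frac{\|\mathbf{1}_n\|^2}{n} = \frac{n}{n} = 1,$$

so scaling the gradient by the stepsize $1 / lc$ changes nothing in that case.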

@Badr-MOUFAD Badr-MOUFAD Jul 31, 2025
Collaborator

You are right @mathurinm, thanks.

What should we do for the Poisson and Gamma datafits? They implement intercept_update_step with the stepsize convention, but since they are non-quadratic it doesn't actually make sense for them.

@floriankozikowski
Contributor Author

@Badr-MOUFAD I tried the refactor (also taking your comment into account, @mathurinm).
Let me know what you think. I didn't find a way to make the refactor any shorter.
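
For reference, a rough sketch (not the actual diff) of how the L-BFGS objective/gradient callback could assemble the intercept gradient with the raw_grad convention discussed above; the helper name obj_and_grad and the exact signatures are assumptions:

import numpy as np

def obj_and_grad(w, X, y, datafit, penalty):
    # last entry of w is the intercept
    Xw = X @ w[:-1] + w[-1]
    datafit_grad = datafit.gradient(X, y, Xw)
    penalty_grad = penalty.gradient(w[:-1])
    # intercept gradient = gradient along a column of ones, no stepsize scaling
    intercept_grad = datafit.raw_grad(y, Xw).sum()
    grad = np.concatenate([datafit_grad + penalty_grad, [intercept_grad]])
    value = datafit.value(y, w[:-1], Xw) + penalty.value(w[:-1])
    return value, grad

With scipy, such a callback can be passed as scipy.optimize.minimize(obj_and_grad, w_init, args=(X, y, datafit, penalty), method="L-BFGS-B", jac=True).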

@floriankozikowski
Contributor Author

Btw, looking at the initial issue #320 that initiated this PR: fitting the intercept reduced the speed gap with sklearn, but skglm is still slower.

--- Fitting Time Comparison ---
skglm (LBFGS): 8.0444 seconds
sklearn (L-BFGS): 2.4578 seconds

sklearn was 3.27x faster.

I guess this PR only focuses on the intercept, but maybe we should open a new issue to investigate other causes.

If you approve the refactor, I can delete the debug script (issue320) and we can merge.
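
The issue320 debug script itself isn't shown in the thread; below is only a hypothetical sketch of how such a side-by-side timing could be set up (the dataset, regularization values, and the skglm estimator wiring are assumptions, not the actual script):

import time
import numpy as np
from sklearn.linear_model import LogisticRegression
from skglm import GeneralizedLinearEstimator
from skglm.datafits import Logistic
from skglm.penalties import L2
from skglm.solvers import LBFGS

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 200))
y = np.where(X @ rng.standard_normal(200) > 0, 1.0, -1.0)  # labels in {-1, 1}

t0 = time.perf_counter()
GeneralizedLinearEstimator(Logistic(), L2(alpha=1e-2), LBFGS()).fit(X, y)
print(f"skglm (LBFGS):    {time.perf_counter() - t0:.4f} seconds")

t0 = time.perf_counter()
LogisticRegression(solver="lbfgs").fit(X, y)  # regularization not matched exactly
print(f"sklearn (L-BFGS): {time.perf_counter() - t0:.4f} seconds")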

@Badr-MOUFAD
Collaborator

Thanks @floriankozikowski for the timing comparison.
Weird that sklearn is 3x faster, but I agree to tackle this in a separate PR.

Can you please add the intercept to the unit tests and update the what's new page?

@floriankozikowski
Contributor Author

@Badr-MOUFAD thanks for the feedback! It should be complete now and I'd say we can merge. I won't have access to my laptop for the next two weeks, so if anything comes up, I will look at it again in mid-August.

@mathurinm mathurinm merged commit 29d67fa into scikit-learn-contrib:main Aug 2, 2025
4 checks passed
Successfully merging this pull request may close these issues.

FEAT - Add fit_intercept support to LBFGS solver
