This project provides a professional and mathematically rigorous explanation of:
- Stochastic Average Gradient (SAG) algorithm
- SAGA optimization algorithm
- Variance reduction methods
- Large-scale convex optimization
- Sparse machine learning solvers
- Regularized regression optimization
This repository is designed for:
- Data Scientists
- Machine Learning Engineers
- Optimization Researchers
- Students studying empirical risk minimization
Keywords: stochastic average gradient, SAG algorithm, SAGA optimization, variance reduction methods, convex optimization solver, sparse machine learning optimization, large-scale optimization, ridge regression solver, lasso regression solver, empirical risk minimization.
We solve the optimization problem:

$$\min_{\theta \in \mathbb{R}^d} \; F(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_i(\theta)$$

Where:

- $$\theta$$ – model parameters
- $$n$$ – number of samples
- $$f_i(\theta)$$ – loss of sample $$i$$
Example (Ridge regression):

$$f_i(\theta) = \frac{1}{2}\left(x_i^\top \theta - y_i\right)^2 + \frac{\lambda}{2}\|\theta\|_2^2, \qquad \nabla f_i(\theta) = x_i\left(x_i^\top \theta - y_i\right) + \lambda\,\theta$$
Standard SGD update:

$$\theta_{k+1} = \theta_k - \eta_k \nabla f_{i_k}(\theta_k), \qquad i_k \sim \mathrm{Uniform}\{1,\dots,n\}$$

Issues:

- High gradient variance
- Slow convergence
- Sublinear rate
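To make the update concrete, here is a minimal sketch of one such SGD step for the least-squares loss used throughout this README (the function name `sgd_step` and the decaying step-size schedule are illustrative, not part of src/):

```python
import numpy as np

def sgd_step(theta, X, y, k, eta0=0.1):
    """One plain SGD step on the least-squares objective (illustrative sketch)."""
    n = X.shape[0]
    i = np.random.randint(n)                  # sample a single index uniformly
    grad_i = X[i] * (X[i] @ theta - y[i])     # gradient of f_i at the current theta
    eta = eta0 / (1 + k)                      # decaying step size to tame the gradient noise
    return theta - eta * grad_i
```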
SAG stores gradients for all samples.

Let:

$$g_j^{k} \approx \nabla f_j(\theta)$$ denote the most recently computed (stored) gradient of sample $$j$$ at iteration $$k$$.

Update rule:

$$\theta_{k+1} = \theta_k - \frac{\eta}{n}\sum_{j=1}^{n} g_j^{k}, \qquad g_{i_k}^{k} = \nabla f_{i_k}(\theta_k), \quad g_j^{k} = g_j^{k-1} \;\; (j \neq i_k)$$
At each iteration:

- Sample an index $$i_k$$
- Compute the new gradient $$\nabla f_{i_k}(\theta_k)$$
- Replace the stored gradient $$g_{i_k}$$
Key property: the variance of the update direction decreases over time, as the stored gradients converge to the true per-sample gradients at the optimum.
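A minimal sketch of one SAG step under these definitions, for the least-squares loss (the names `sag_step`, `grad_memory`, and `grad_sum` are illustrative and not taken from src/sag.py):

```python
import numpy as np

def sag_step(theta, X, y, grad_memory, grad_sum, eta=0.01):
    """One SAG step (illustrative sketch): refresh one stored gradient, move along the average."""
    n = X.shape[0]
    i = np.random.randint(n)
    new_grad = X[i] * (X[i] @ theta - y[i])    # gradient of f_i at the current theta
    grad_sum += new_grad - grad_memory[i]      # maintain the running sum of stored gradients
    grad_memory[i] = new_grad                  # replace the stored gradient for sample i
    theta = theta - eta * grad_sum / n         # step along the average of all stored gradients
    return theta, grad_memory, grad_sum
```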
SAGA improves SAG by removing bias:

$$\theta_{k+1} = \theta_k - \eta\left(\nabla f_{i_k}(\theta_k) - g_{i_k}^{k-1} + \frac{1}{n}\sum_{j=1}^{n} g_j^{k-1}\right)$$
Properties:
- Unbiased gradient estimator
- Supports composite objectives
- Works with L1 regularization
- Linear convergence for strongly convex problems
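The unbiasedness claim can be checked in one line: conditioning on the current memory $$g_1, \dots, g_n$$ and averaging over the uniformly sampled index $$i_k$$,

$$\mathbb{E}_{i_k}\!\left[\nabla f_{i_k}(\theta_k) - g_{i_k} + \frac{1}{n}\sum_{j=1}^{n} g_j\right] = \frac{1}{n}\sum_{j=1}^{n}\nabla f_j(\theta_k) - \frac{1}{n}\sum_{j=1}^{n} g_j + \frac{1}{n}\sum_{j=1}^{n} g_j = \nabla F(\theta_k)$$

whereas the SAG direction $$\frac{1}{n}\sum_j g_j$$ is a biased estimate of $$\nabla F(\theta_k)$$ whenever the memory is stale.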
Assume:

- $$F(\theta)$$ is $$\mu$$-strongly convex
- Each $$f_i$$ has an $$L$$-Lipschitz continuous gradient
Condition number:

$$\kappa = \frac{L}{\mu}$$
Then SAG / SAGA achieve a linear (geometric) convergence rate:

$$\mathbb{E}\big[F(\theta_k) - F(\theta^\star)\big] \le C\,\rho^k, \qquad 0 < \rho < 1,$$

i.e. on the order of $$O\big((n + \kappa)\log(1/\varepsilon)\big)$$ gradient evaluations suffice to reach accuracy $$\varepsilon$$.

Compared to SGD, which under the same assumptions only attains a sublinear rate:

$$\mathbb{E}\big[F(\theta_k) - F(\theta^\star)\big] = O(1/k)$$
This explains why variance reduction methods dominate in large-scale convex optimization.
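As a rough worked comparison (a back-of-the-envelope illustration with all constants omitted), the decisive difference is the $$\log(1/\varepsilon)$$ factor versus the $$1/\varepsilon$$ factor:

$$\varepsilon = 10^{-6}: \qquad \log(1/\varepsilon) \approx 14 \qquad \text{vs.} \qquad 1/\varepsilon = 10^{6}$$

For a high-accuracy solution, SAG/SAGA pay only a small logarithmic factor on top of the $$n + \kappa$$ term, while the cost of plain SGD grows in direct proportion to the required accuracy.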
SGD gradient variance:

$$\mathbb{E}_i\big\|\nabla f_i(\theta) - \nabla F(\theta)\big\|^2$$

does not vanish at the optimum: even when $$\nabla F(\theta^\star) = 0$$, the individual $$\nabla f_i(\theta^\star)$$ are generally nonzero, which forces a decaying step size.

SAG/SAGA reduce variance because the correction term $$-\,g_i + \frac{1}{n}\sum_j g_j$$ cancels the per-sample noise: as $$\theta_k \to \theta^\star$$ the stored gradients converge to $$\nabla f_i(\theta^\star)$$, the variance of the update direction goes to zero, and a constant step size can be used.

This yields a linear convergence rate.
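A small self-contained numerical check of this argument (illustrative, not part of src/): near the least-squares optimum, with the SAGA memory already holding the per-sample gradients at the optimum, the SAGA estimator's deviation from the full gradient collapses while the plain SGD estimator's does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Exact least-squares optimum and a point close to it.
theta_star = np.linalg.lstsq(X, y, rcond=None)[0]
theta = theta_star + 0.01 * rng.normal(size=d)

full_grad = X.T @ (X @ theta - y) / n                 # exact gradient of F at theta

# SAGA memory in the state it converges to: per-sample gradients at the optimum.
memory = X * (X @ theta_star - y)[:, None]
memory_avg = memory.mean(axis=0)

per_sample = X * (X @ theta - y)[:, None]             # gradient of each f_i at theta
sgd_est = per_sample                                   # plain SGD estimate for index i
saga_est = per_sample - memory + memory_avg           # SAGA estimate for index i

print("SGD  estimator variance:", np.mean(np.sum((sgd_est - full_grad) ** 2, axis=1)))
print("SAGA estimator variance:", np.mean(np.sum((saga_est - full_grad) ** 2, axis=1)))
```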
```
stochastic-average-gradient-sag-solver-course/
│
├── README.md
├── LICENSE
├── requirements.txt
│
├── src/
│   ├── sag.py
│   ├── saga.py
│   └── loss_functions.py
│
├── examples/
│   └── demo.py
│
├── docs/
│   ├── theory.md
│   └── convergence.md
│
└── index.html
```
Clean structure improves:
- Academic credibility
- Search visibility
- Portfolio quality
```python
import numpy as np

class SAGA:
    """SAGA solver for the least-squares objective f_i(theta) = 0.5 * (x_i^T theta - y_i)^2."""

    def __init__(self, X, y, lr=0.01):
        self.X = X
        self.y = y
        self.lr = lr
        self.n, self.d = X.shape
        self.theta = np.zeros(self.d)
        # One stored gradient per sample, plus their running average.
        self.grad_memory = np.zeros((self.n, self.d))
        self.grad_avg = np.zeros(self.d)

    def step(self):
        i = np.random.randint(0, self.n)
        grad = self.compute_gradient(i)
        # SAGA direction: new gradient - stored gradient + average of stored gradients.
        self.theta -= self.lr * (grad - self.grad_memory[i] + self.grad_avg)
        # Update the running average before overwriting the memory slot.
        self.grad_avg += (grad - self.grad_memory[i]) / self.n
        self.grad_memory[i] = grad

    def compute_gradient(self, i):
        xi = self.X[i]
        yi = self.y[i]
        # Gradient of the per-sample least-squares loss.
        return xi * (xi @ self.theta - yi)
```
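A short usage sketch of the class above on synthetic data (the data, learning rate, and epoch count are illustrative; the repository's actual demo lives in examples/demo.py):

```python
import numpy as np

# Synthetic least-squares problem (illustrative data, not from the repository).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X @ rng.normal(size=20) + 0.05 * rng.normal(size=1000)

solver = SAGA(X, y, lr=0.001)
for epoch in range(30):
    for _ in range(solver.n):          # one pass over the data = n stochastic steps
        solver.step()
    if epoch % 10 == 0:
        obj = 0.5 * np.mean((X @ solver.theta - y) ** 2)
        print(f"epoch {epoch:3d}  objective {obj:.6f}")
```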
Install the dependencies:

```bash
pip install -r requirements.txt
```

Run the demo:

```bash
python examples/demo.py
```

Applications:

- Logistic regression optimization
- Ridge regression solver
- Lasso regression solver
- Sparse machine learning models
- Large-scale convex optimization
- Industrial ML systems
Related topics:

- Stochastic Gradient Descent (SGD)
- SVRG
- L-BFGS
- Conjugate Gradient
- Variance Reduction Methods
- Convex Optimization
If you are studying stochastic optimization algorithms for machine learning, this repository provides both theoretical foundations and practical implementation of SAG and SAGA solvers.
⭐ Star the repository if it helps your learning or research.