Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
150 changes: 150 additions & 0 deletions .github/issues/rosenbrock_dae_algebraic_error.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
# Rosenbrock DAE Error Estimator: Algebraic Variable Overshoot

## Summary

The Rosenbrock DAE4 error estimator produces near-zero error values for algebraic
variables, causing the step-acceptance criterion to ignore large errors in
constraint-derived quantities. This allows algebraic "balance" variables (e.g.,
SO₂(g) in a total-sulfur conservation constraint) to overshoot through zero
without triggering step rejection.

## Reproducer

See `python/test/integration/test_dae_algebraic_overshoot.py` for a minimal
chemistry system that demonstrates the issue.

**System**: H₂O₂-pathway sulfate production in cloud droplets.
- 10 species: 3 differential (SO₄²⁻, SO₂OOH⁻, H₂O), 7 algebraic (SO₂, H₂O₂,
SO₂(aq), H₂O₂(aq), HSO₃⁻, OH⁻, H⁺)
- SO₂(g) is algebraic via `LinearConstraint`: SO₂ = C − SO₂(aq) − HSO₃⁻ − SO₂OOH⁻ − SO₄²⁻
- Fast reactions convert S(IV) → S(VI) until SO₂(g) should approach zero

**Observed**: With a single 30 s solve and default tolerances (atol=1e-3 for gas,
1e-14 for aqueous, rtol=1e-6):
- SO₂(g) = −6.5e-8 (should be ≥ 0)
- 47 internal steps, 0 rejections
- Total sulfur is conserved (constraint works correctly)
- The solver never rejects a step despite SO₂(g) changing sign

**Key evidence**: Changing `atol` for SO₂ from 1e-3 to 1e-12 produces **identical
results** (same 47 steps, same SO₂ value). This proves the error vector entry for
SO₂ is effectively zero — the tolerance is irrelevant.

## Root Cause Analysis

### The error vector is ~0 for algebraic variables

The DAE4 error estimate is `Yerror = K[3]` (only `e[3] = 1.0`, all others zero).

For algebraic variable `i` (mass matrix diagonal `M_ii = 0`):

1. **Stage coupling is zeroed**: In the RHS accumulation:
```
K[stage][i] += (c_jk/H) * M_ii * K[j][i]
```
Since `M_ii = 0`, the coupling terms from previous stages are zero for algebraic rows.

2. **K[3] is determined by the linear solve**: After coupling, the system
`(M/(γH) − J) · K[3] = RHS` is solved. For algebraic row `i`:
```
−J_i · K[3] = f_i(Y + 2·K[0] + K[2])
```
where `f_i` is the constraint forcing term.

3. **The constraint forcing at the evaluation point is small**: The constraint
initialization step ensures `f_i ≈ 0` at the initial point. The evaluation
point `Y + 2·K[0] + K[2]` may deviate, but the Jacobian coupling between the
constraint row and all other variables means the linear solve distributes the
residual across many variables, resulting in a small `K[3]` for any single
algebraic variable.

4. **The error norm contribution is negligible**: Since `Yerror[i] ≈ 0`:
```
errors_over_scale = Yerror[i] / (atol[i] + rtol * ymax[i]) ≈ 0
```
No matter what `atol` is set to.

### Why tight differential tolerances help indirectly

With `atol = 1e-16` and `rtol = 1e-12` for **all** species, the solver takes 690k
steps. This works not because the algebraic error is checked, but because:
- SO₄²⁻ (differential) has a tighter tolerance
- The solver must resolve the SO₄²⁻ trajectory with high accuracy
- This forces small enough steps that SO₄²⁻ never overshoots the total-S budget
- SO₂(g) stays positive as a consequence

This is impractical for real applications — 690k steps for 30 s of chemistry.

## What the current "fix" (commit c20507ad) does

The fix removes the `if (upper_left_identity_diagonal_[i] > 0.0)` skip from
`NormalizedError()`, and replaces `num_ode_variables` with `Y.NumRows() * Y.NumColumns()`.

This is correct in principle but has **no practical effect** because `Yerror[i] ≈ 0`
for algebraic variables. The skip was a symptom, not the cause.

## Proposed fix options

### Option A: Compute algebraic error from the constraint residual

After computing `Ynew`, evaluate the constraint residual `g(Ynew)` and use it as
the error for algebraic variables:

```cpp
// After: Ynew.Copy(Y); for (...) Ynew.Axpy(m_[stage], K[stage]);
// Compute constraint residual for error estimation
if (has_constraints) {
DenseMatrixPolicy constraint_residual;
constraint_residual.Fill(0);
constraints_.AddForcingTerms(Ynew, state.custom_rate_parameters_, constraint_residual);
// Replace Yerror entries for algebraic variables with constraint residual
for (i_cell...) for (i_var...) {
if (state.upper_left_identity_diagonal_[i_var] == 0.0) {
Yerror[i_cell][i_var] = constraint_residual[i_cell][i_var];
}
}
}
```

**Pro**: Directly measures what we care about — how well the constraint is satisfied.
**Con**: The residual may not scale correctly compared to the Rosenbrock error estimate.

### Option B: Use the "state constraint" error estimate

For DAEs, the error in algebraic variables `y_a` is related to the error in
differential variables `y_d` through the constraint Jacobian:

```
δy_a ≈ −(∂g/∂y_a)^{-1} · (∂g/∂y_d) · δy_d
```

This could be computed from the Jacobian blocks. For linear constraints like
total-S conservation, this is exact: `δSO₂ = −δSO₄²⁻ − δSO₂OOH⁻`.

**Pro**: Theoretically sound, relates algebraic error to differential error.
**Con**: Requires extracting Jacobian blocks, more complex implementation.

### Option C: Solve the "over-determined" error system

Replace the Rosenbrock error computation for algebraic rows with a secondary
solve using the constraint Jacobian, as described in Hairer & Wanner (1996),
Section VI.6.

### Option D: Rate damping (user-level workaround)

MIAM's `max_halflife` parameter caps reaction rates so that no reactant is
consumed faster than a specified half-life. This prevents the differential
variable (SO₄²⁻) from overshooting, which indirectly keeps SO₂(g) positive.

**Pro**: Already implemented, user can apply immediately.
**Con**: Workaround, not a solver-level fix. Slightly perturbs the kinetics.

## Recommended approach

**Option A** is the most practical and directly addresses the root cause. The
constraint residual at `Ynew` is already available (just call `AddForcingTerms`
again) and provides a meaningful error measure for algebraic variables.

For the constraint residual to work as an error estimate, it should be scaled
appropriately. One approach: use the L2 norm of the residual per algebraic variable,
normalized by `atol + rtol * |y_a|`.
89 changes: 89 additions & 0 deletions .github/prompts/cam_cloud_chemistry_status.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# CAM Cloud Chemistry Tutorial (Tutorial 14): Status & Next Steps

## Current state

Tutorial 14 (`tutorials/14. cam_cloud_chemistry.ipynb`) uses a **total-sulfur
constraint** (including SO₄²⁻) with rate damping (`max_halflife`) and
per-species tolerances.

**MICM commit `7e46f9b2`** implements step-change error estimation for
algebraic variables (`Yerror[a] = Ynew[a] - Y[a]`), making algebraic
tolerances control step acceptance. This replaces the previous near-zero
embedded error for algebraic rows.

### What works today

- Full SO₂/H₂O₂ cloud chemistry system (gas + aqueous species)
- Aqueous oxidation reactions (H₂O₂ pathway R1a/R1b)
- Algebraic constraints (Henry's Law, dissociation, conservation, charge balance)
- Total-sulfur conservation constraint (SO₂ + SO₂_aq + HSO₃⁻ + SO₂OOH⁻ + SO₄²⁻)
- Dummy `emitted_SO₂` tracker for SO₂ source term in mass budget
- Rate damping via `max_halflife` parameter
- Solvent damping via `solvent_damping_epsilon` (default 1e-20)
- Per-species absolute tolerances (1e-14 aqueous, 1e-9 gas algebraic)
- Validation cells for total-S conservation, charge balance, Henry's Law
- Step-change error estimate for algebraic variables (MICM fix)

### Tolerance strategy

With the step-change error fix, algebraic tolerances now matter:

| Algebraic atol | SO₂ sign | Steps/30s | Practical? |
|---|---|---|---|
| 1e-3 to 1e-8 | negative | 13–48 | fast but overshoots |
| 1e-10 | negative | ~3,500 | moderate |
| 1e-12 | **positive** | ~131k | expensive |

For the tutorial, the combination of moderate algebraic tolerances (1e-8 to
1e-10) plus rate damping (`max_halflife`) provides a practical balance: the
rate damping limits per-step overshoot and the step-change error controls
step acceptance.

### Solver parameters

```python
RosenbrockSolverParameters(
absolute_tolerances=[1e-9 (gas algebraic), 1e-14 (aqueous)],
constraint_init_max_iterations=100,
constraint_init_tolerance=1e-8,
max_number_of_steps=5000,
)
```

---

## Remaining work

### 1. Tutorial tolerance tuning

With the step-change error fix, re-evaluate whether the tutorial's per-species
tolerances and `max_halflife` settings give the best balance of accuracy and
performance. Consider testing:
- `alg_atol=1e-10` with `max_halflife=10` — may prevent overshoot at ~3.5k steps
- `alg_atol=1e-8` with `max_halflife=5` — fewer steps, rate damping does the work

### 2. Run the full test suite

```bash
pytest python/test/integration/test_miam_cloud_chemistry.py -v # 31 tests
pytest python/test/integration/test_dae_algebraic_overshoot.py -v # 3 tests
```

### 3. Tutorial notebook validation

Run the tutorial end-to-end to verify all cells produce correct output with
the new MICM.

---

## File inventory

| File | Purpose |
|---|---|
| `tutorials/14. cam_cloud_chemistry.ipynb` | The tutorial notebook |
| `python/test/integration/test_miam_cloud_chemistry.py` | Integration tests (31) |
| `python/test/integration/test_dae_algebraic_overshoot.py` | Overshoot tests (3) — sensitivity, loose, tight |
| `python/test/integration/diagnose_dae_error.py` | Diagnostic script (historical) |
| `.github/issues/rosenbrock_dae_algebraic_error.md` | Issue doc (historical — fix landed) |
| `.github/prompts/dae_algebraic_overshoot.md` | Step-change error description |
| `.github/prompts/cam_cloud_chemistry_status.md` | This file |
77 changes: 77 additions & 0 deletions .github/prompts/dae_algebraic_overshoot.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# DAE Rosenbrock Error Estimator: Algebraic Variable Step-Change Error

## Summary

MICM's Rosenbrock DAE solver now uses **step-change error estimation** for
algebraic variables. After computing the embedded error, the solver replaces
each algebraic entry with `Yerror[a] = Ynew[a] - Y[a]`. This means algebraic
variable tolerances now control step acceptance.

**MICM commit:** `7e46f9b2f9b968b56cebb6f1272005409b30df3d`
**MIAM commit:** `002eadfeac2deb7b5bae84c1addf2d09a00f7544`

## Background

Previously, the embedded error formula (`Yerror = Σ e_i·K_i`) produced
near-zero entries for algebraic rows because the mass matrix diagonal
`M_ii = 0` zeroed out inter-stage coupling terms. The solver was completely
insensitive to algebraic tolerances. The fix replaces the near-zero embedded
error with the actual step change for algebraic variables.

## Behavior

With the step-change error, algebraic tolerances control how much the variable
can change per internal step before the solver rejects and retries smaller:

| Algebraic atol | SO₂(g) after 30 s | Steps | Notes |
|---|---|---|---|
| 1e-3 (loose) | −6.50e-8 | 13 | Overshoot — too few steps |
| 1e-8 | −6.50e-8 | 48 | More steps, still overshoots |
| 1e-10 | −6.25e-8 | 3,470 | Significant step reduction |
| 1e-12 (tight) | +1.73e-8 | 130,766 | No overshoot — SO₂ positive |
| 1e-14 | +1.19e-7 | 391,987 | Very tight — expensive |

**Key insight:** The guide recommends `atol = 1e-8 to 1e-10` for algebraic
variables. At these tolerances, SO₂ still goes slightly negative (within atol)
in this stiff system. For guaranteed non-negativity, `atol ≤ 1e-12` is needed
but costs ~130k steps per 30 s timestep.

## Practical tolerance strategy

For cloud chemistry with conservation constraints:

```python
ordering = micm.create_state().get_species_ordering()
abs_tols = [1e-14] * len(ordering) # tight default for differential species

# Per the guide: 1e-8 to 1e-10 for algebraic species
algebraic_species = {"SO2", "H2O2", "CLOUD.AQUEOUS.SO2_aq", ...}
for name, idx in ordering.items():
if name in algebraic_species:
abs_tols[idx] = 1e-10 # or 1e-8 for speed
```

If the balance variable going slightly negative (within atol) is unacceptable,
combine with rate damping (`max_halflife`) to limit per-step overshoot.

## Test case

**File:** `python/test/integration/test_dae_algebraic_overshoot.py`

Three tests:

- `test_algebraic_tolerance_sensitivity`: Proves that changing atol for
algebraic species from 1e-3 to 1e-10 produces >10× more steps
- `test_negative_so2_with_loose_algebraic_tolerances`: SO₂ goes negative
with loose algebraic atol (1e-3), confirming the overshoot mechanism
- `test_positive_so2_with_tight_algebraic_tolerances`: SO₂ stays positive
with tight algebraic atol (1e-12), confirming the fix prevents overshoot

```bash
pytest python/test/integration/test_dae_algebraic_overshoot.py -v
```

## Reference

See the MICM developer guide at `micm/.github/prompts/dae_algebraic_tolerance_guide.md`
for the full specification and C++ examples.
Loading