Changelog

All notable changes to this project will be documented in this file.

[0.6.0dev3] - 2026-03-15 – Kikku package, solver refactor, generic EGM operators

Architecture

  • Kikku package (bright-forest/kikku): new standalone package for stage-composition and period-graph tools.
    • period_graphs: period_to_graph, backward_paths (wavefront partition), forward_order. Handles branching stages with branch-keyed poststates. DAG acyclicity check.
    • pipeline: load_syntax (I/O boundary), instantiate_period (dolo-plus pipeline, lazy import).
    • nest: load_inter_connector (inter-period connector loading with custom YAML tag handling).
    • asva subpackage: make_egm_1d — generic 1D EGM operator factory. Takes four sub-equation callables (InvEuler, Bellman, cntn_to_dcsn, Concavity) + params array → returns @njit EGM step function.
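
As a rough illustration of the period-graph ideas listed above, the sketch below derives a forward order from a stage DAG, rejects cyclic graphs, and groups stages into backward waves. The function names, the dict-of-successors input, and the wave rule (a stage joins a wave once all of its successors are solved) are assumptions made for this example, not kikku's actual API.

```python
from collections import deque


def forward_order(successors):
    """Return stages in topological (forward) order; raise on a cycle."""
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)
    indegree = {node: 0 for node in nodes}
    for targets in successors.values():
        for target in targets:
            indegree[target] += 1
    # Start from stages that have no predecessor (sorted for determinism).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in successors.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    if len(order) != len(nodes):
        raise ValueError("period graph contains a cycle")
    return order


def backward_waves(successors):
    """Group stages into waves; wave 0 holds the terminal stages."""
    order = forward_order(successors)  # also validates acyclicity
    depth = {}
    for node in reversed(order):
        targets = successors.get(node, ())
        depth[node] = 1 + max((depth[t] for t in targets), default=-1)
    n_waves = max(depth.values(), default=-1) + 1
    waves = [[] for _ in range(n_waves)]
    for node in order:
        waves[depth[node]].append(node)
    return waves


# Hypothetical branching period: one choice stage feeding two branches.
example_graph = {
    "work_choice": ["worker", "retiree"],
    "worker": ["next"],
    "retiree": ["next"],
    "next": [],
}
```

Stages within the same wave do not depend on each other, so a backward solver could process each wave as a batch.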

Retirement solver refactor

  • solve.py: thin functional combinator. All I/O in load_syntax, pure transforms downstream. solve_backward is a standalone combinator. solve_nest returns (nest, model, ops, waves) for full reuse.
  • operators.py: stage operators extracted from model.py. Grids passed as arguments (reusable across grid sizes without JIT recompilation). Both worker and retiree stages use make_egm_1d from kikku. EGM sub-equation callables have standardized (pointwise, fixed_state, params) signatures.
  • model.py: thin rho output. RetirementModel delegates params to stage .calibration/.settings via __getattr__. Stores only grids + @njit callables. EGM recipe callables (WORKER_EGM_FNS, RETIREE_EGM_FNS) defined here.
  • UE method bound at methodization time (patched in stage sources before pipeline runs), not in the solve loop.
  • Period graph auto-plotted in notebook via networkx.
  • Scaling plot includes O(√N) reference line.

Installation

  • kikku added to examples and dev optional dependencies (pyproject.toml), pointing to git+https://github.com/bright-forest/kikku.git.
  • Setup script (setup/setup_venv.sh) verifies kikku import.

[0.6.0dev2] - 2026-03-09 – Docs overhaul, PEP 8 formatting, citation fixes

Documentation

  • Algorithm page (docs/algorithm/fues-algorithm.md): narrowed scope to 1D EGM, aligned notation with paper (\hat{x}'_i), softened left-turn rule to "tentatively retain", rewrote jump-threshold section, added forward/backward scan purpose statement, replaced big-O complexity table with requirements/mechanism comparison table, added new \bar{M} accuracy tip, spelling and grammar fixes.
  • Comparison table now includes linked references, complexity descriptions, and a new LTM row (Druedahl & Jørgensen 2017).
  • Removed all --- horizontal rules from docs and notebook markdown cells.
  • Added FUES scan diagram (fues-scan.svg) to the retirement notebook for MkDocs rendering.
  • Fixed install instructions: removed @release-prep and -b release-prep from docs/getting-started/installation.md.
  • Trimmed docs/examples/housing.md and docs/examples/housing-renting.md stubs.

Citation standardisation

  • All references to the FUES paper across docs, docstrings, pyproject.toml, CHANGELOG.md, README.md, and replication/README.md now consistently cite Dobrescu & Shanker (2022) with the correct SSRN link.
  • Fixed author order in pyproject.toml description (was "Shanker & Dobrescu").

Code formatting

  • fues_v0_2dev.py: PEP 8 formatted to 78-character line width — all code lines, docstrings, and comments wrapped. Docstrings trimmed for conciseness. No logic changes; all tests pass.

Notebook

  • Retirement notebook: updated paper reference to (2022), fixed "dolo-plusz" typo, re-ran all cells with latest outputs.

[0.5.0dev5] - 2026-03-05 – Versioned FUES & Scaling Optimisations

FUES versioning

  • fues.py is now a thin re-export from fues_v0_2dev (the default).
  • fues_v0_1dev.py: release-prep baseline (v0.1dev).
  • fues_v0_2dev.py: optimised version (v0.2dev) — see below.
  • UE engine registry: FUES (default=v0.2dev), FUES_V0_1DEV, FUES_V0_2DEV.

FUES v0.2dev optimisations

  • np.empty for intersection buffer (was np.full(..., np.nan)).
  • Conditional m_bar: skip max(abs,abs) when endog_mbar=False.
  • assume_sorted param + auto-detection to skip argsort + 5 fancy-index copies.
  • Linear O(K+J) merge via _merge_sorted_with_few (was O(N log N) argsort).
  • cache=True on all scan helpers (faster Numba cold starts).
  • Pre-computed abs(del_a) array (avoid per-iteration np.abs).
  • Dead code removal: commented-out variables, redundant de_1, simplified dispatch.
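
To make the linear-merge item above concrete, here is a minimal sketch of merging two already-sorted sequences in O(K+J) steps rather than concatenating and re-sorting. It is written with plain Python lists for readability; the function name and signature are illustrative and are not the project's `_merge_sorted_with_few`.

```python
def merge_sorted(base, extra):
    """Merge two ascending sequences into one ascending list in O(K + J)."""
    merged = []
    i = j = 0
    while i < len(base) and j < len(extra):
        # On ties the element from `base` goes first, so the merge is stable.
        if extra[j] < base[i]:
            merged.append(extra[j])
            j += 1
        else:
            merged.append(base[i])
            i += 1
    # At most one of these tails is non-empty.
    merged.extend(base[i:])
    merged.extend(extra[j:])
    return merged
```

Each element is visited once, which is where the saving over an argsort of the concatenated array comes from when only a few extra points (such as intersections) are added to a long sorted grid.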

Cleanup

  • Removed unused correct_jumps_vf_pol from math_funcs.py.
  • Moved examples/durables/plot.py and helpers/metrics_fast.py to superseeded/.
  • Deleted stray examples/durables/Untitled.
  • Trimmed experiments/retirement/README.md.

[0.5.0dev4] - 2026-03-04 – Retirement Pipeline-First Restructure

Pipeline restructure (matsya-reviewed)

  • solve_nest (was solve_canonical) is the single entry point — three-functor override mechanism (calib_overrides, config_overrides)
  • _accrete_nest (was _build_and_solve_nest) — backward induction loop, named per DDSL accretion convention
  • instantiate_period (was build_period) — dolo-plus pipeline: parse, methodize, configure, calibrate
  • RetirementModel: all params required (no defaults), added with_test_defaults() classmethod
  • from_period reads from both .calibration and .settings via _get() helper
  • Inter-period twister loaded from syntax/nest.yaml (was hardcoded)

Cleanup

  • Removed stale src/dc_smm/ duplicate package; all imports updated to dcsmm
  • Flattened syntax/syntax/ to syntax/
  • calibration.yaml + settings.yaml are the single source of truth for all parameters
  • Deleted backward_induction() bypass pattern
  • Removed examples/retirement/params/ (replaced by syntax/ + experiments/)
  • Removed examples/retirement/code/ (empty stale directory)
  • Renamed run_experiment.py to run.py

New CLI

  • run.py: --calib-override key=value, --config-override key=value, --override-file path.yml, --method
  • benchmark.py: uses solve_nest throughout

Notebook

  • Added docs/notebooks/retirement_fues.ipynb — interactive walkthrough with plotly zoom, Nord theme, scaling sweep 1k-15k
  • mkdocs.yml: added mkdocs-jupyter plugin + Notebooks nav section

Experiment overrides

  • experiments/retirement/params/*.yml rewritten as sparse key-value overrides (only values differing from syntax/ defaults)
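
The sparse-override idea above, together with the `key=value` form accepted by `--calib-override`, can be sketched as follows. This is an assumed illustration only: the helper names, the numeric coercion, and the choice to reject unknown keys are not taken from `run.py`.

```python
def parse_override(text):
    """Split 'key=value' and coerce the value to int or float when possible."""
    key, sep, raw = text.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {text!r}")
    for cast in (int, float):
        try:
            return key, cast(raw)
        except ValueError:
            continue
    return key, raw  # leave non-numeric values as strings


def apply_overrides(defaults, overrides):
    """Return a new dict: defaults with the sparse overrides layered on top."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Assumption: a typo in an override key should fail loudly.
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged
```

Because only differing values are stored, an experiment file stays short and the defaults remain the single source of truth.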

[0.5.0dev3] - 2025-12-16 – EGM Loop Memory and Compute Optimizations

Memory Optimizations

  • [2025-12-16] Removed policy_a storage in horses_c.py:

    • Asset policy array (policy_a) was being stored but never used downstream by metrics or plotting
    • Removed allocation, loop assignments, and return values from both EGM and VFI paths
    • Savings: ~11MB per stage × 5 stages × 20 periods = ~1.1GB per model
  • [2025-12-16] Conditional vlu_dcsn allocation in horses_c.py:

    • When delta == 1 (standard exponential discounting), vlu_dcsn = Q_dcsn mathematically
    • Skip allocating separate vlu_dcsn array; return Q_dcsn directly for both
    • Savings: ~11MB per stage for standard discounting models
  • [2025-12-16] EGM grids return optimization in horses_c.py:

    • Changed _solve_egm_loop to return None instead of empty dictionaries when store_egm_grids=False
    • Added explicit del unrefined_grids, refined_grids cleanup
    • Downstream code checks egm_grids is not None before access

Compute Optimizations

  • [2025-12-16] Skip expensive computations when delta == 1 in horses_c.py:
    • compute_gradient() - expensive gradient calculation for c_prime
    • u_func() call - utility function evaluation for vlu_dcsn transformation
    • Simplified: lambda_dcsn = uc_today, vlu_dcsn = Q_dcsn (direct assignment)
    • Savings: ~34,000 function calls skipped per model (343 iterations × 5 stages × 20 periods)

[0.5.0dev2] - 2025-12-15 – Memory Optimizations and Lazy Compilation

Note: Memory performance improvements are ongoing. Current optimizations reduce overhead but peak memory usage during sweep runs may still be high.

Memory Optimizations

  • [2025-12-15] FUES array copy optimization in fues.py:

    • Added _ensure_f64() helper to avoid unnecessary array copies when inputs are already float64 and C-contiguous
    • Before: np.asarray() always created copies; After: direct pass-through when dtype/layout matches
    • Savings: ~18MB per FUES call for typical grid sizes (5 arrays × 4000 × 8 bytes × 7 × 16)
  • [2025-12-15] DCEGM memory fixes in dcegm.py:

    • Changed np.zeros_like() + np.nan to np.full(n, np.nan) (creates 1 array instead of 2)
    • Removed creation of a LinearInterp object that was never used
    • Savings: ~130KB per DCEGM call + object overhead
  • [2025-12-15] Housing solver meshgrid optimization in horses_h.py:

    • Moved np.meshgrid and resources_liquid_3d from inside operator (per-solve) to factory closure (once per model)
    • Added del a_mesh, y_mesh to immediately free intermediate arrays
    • Savings: Avoids repeated allocation of large 3D arrays per income state per period
  • [2025-12-15] Lazy compilation in whisperer.py (activated via --low-memory):

    • New _compile_period_stages(period) compiles stages just before solving each period
    • New _clear_period_numerics(period) clears numerical grids after solving (except kept periods)
    • Memory footprint: O(1) periods instead of O(N_periods) for numerical grids
    • Timing accuracy: Period times now exclude compilation and cleanup overhead (documented in docstring)
  • [2025-12-15] Complete model cleanup in solve_runner.py:

    • New cleanup_model_complete(model) aggressively frees all solutions, models, operators, and period lists
    • Called after each configuration in sweep mode to prevent memory accumulation across MPI ranks
    • Ensures each new configuration starts with clean memory
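
The lazy-compilation pattern described above can be sketched as a backward loop that compiles each period just before solving it and drops the continuation period once it is no longer needed. The callbacks and function name here are hypothetical stand-ins, and the exact point at which the project clears a period may differ.

```python
def solve_backward_lazy(n_periods, compile_period, solve_period,
                        periods_to_keep=(0, 1)):
    """Solve periods last-to-first while keeping few periods in memory.

    compile_period(t) -> numerics for period t (hypothetical callback)
    solve_period(t, numerics, next_solution) -> solution (hypothetical)
    """
    live = {}      # period index -> numerical data currently held
    peak_live = 0  # largest number of periods held at once
    for t in reversed(range(n_periods)):
        # Compile just before solving instead of compiling all periods first.
        numerics = compile_period(t)
        live[t] = solve_period(t, numerics, live.get(t + 1))
        peak_live = max(peak_live, len(live))
        # Period t + 1 was only needed as the continuation for period t.
        if t + 1 in live and (t + 1) not in periods_to_keep:
            del live[t + 1]
    return live, peak_live
```

With this structure the number of periods held at once stays bounded regardless of the horizon, while the kept periods (0 and 1 by default here) survive for later Euler error calculation.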

New CLI Options

  • [2025-12-15] --skip-bundle-save flag (cli.py, solve_runner.py):
    • Skips saving solution bundles to disk entirely
    • Useful for timing runs where only metrics are needed
    • Reduces I/O overhead and disk space usage
    • Added to run_sweep_noPB_test.sh for faster sweep runs

Bug Fixes

  • [2025-12-15] Euler error NaN fix: Periods 0 and 1 (in periods_to_keep) no longer have their numerical grids cleared, preserving data needed for Euler error calculation

Technical Details

  • Lazy compilation is mutually exclusive with dynx grid sharing (documented in devspec)
  • All optimizations verified: arrays in FUES/DCEGM are read-only, so pass-through is safe
  • PBS scripts updated to use --skip-bundle-save and --low-memory flags

[0.5.0dev1] - 2025-11-29 – Retirement Example Refactoring and Configuration

  • [2025-12-14 23:45 AEST] Fixed VFI marginal utility calculation in horses_c.py:

    • Change: CPU VFI solvers (_solve_vfi_numerical and _solve_vfi_block) now use uc = alpha/c instead of 1/c for marginal utility, matching the GPU version and Cobb-Douglas utility.
    • Note: This only affects lambda_dcsn output, NOT the VFI policy optimization itself (VFI doesn't use λ - it directly maximizes the Bellman equation).
    • Added alpha parameter to VFI kernel function signatures for consistency.
  • [2025-12-14 23:30 AEST] VFI/EGM policy discrepancy was due to grid lower bounds:

    • Issue: VFI and EGM policies appeared different due to minimum grid values (a_grid[0], w_grid[0]) being set too high.
    • This caused the optimization bounds in VFI to be overly constrained at low wealth levels.
    • Fix: Adjusted grid lower bounds to appropriate values.
    • Note: NOT caused by uc formula since VFI optimization doesn't use marginal utility - it directly maximizes u(c,H) + βδV(a',H',y').
  • [2025-12-14 22:30 AEST] Fixed VFI JIT recompilation issue in horses_c.py:

    • Bug: _solve_vfi_loop was calling build_njit_utility() directly, creating a NEW Numba-compiled function on every call. This triggered ~1.8s JIT compilation overhead for every VFI stage solve, making timing appear constant regardless of grid size.
    • Fix: Changed to use get_u_func() which uses @lru_cache to return the same cached function object for identical parameters.
    • Result: After JIT warmup, VFI kernel execution now properly scales with grid size (~30ms for 1000 grid → ~60ms for 2000 grid).
    • Note: The Numba cache is cleared in PBS scripts (rm -rf $NUMBA_CACHE_DIR). For production runs, consider keeping the cache to avoid first-call JIT overhead.
  • [2025-12-14 21:00 AEST] Added avg_ownc_time_per_period metric to sweep results:

    • Computed in whisperer.py from stage_timings before they're excluded from CSV
    • Added to timing tables in generate_paper_tables.py to show VFI computational scaling
  • [2025-12-04 14:00 AEST] Added PCHIP-style monotone gradient method (piecewise_gradient_pchip) in gradients.py - uses weighted harmonic mean to preserve MPC bounds (0, 1]

  • [2025-12-04 14:15 AEST] Added q_diff > 0 condition to jump detection in _egm_preprocess_core - only add constraint points where value function is increasing

  • [2025-12-04 14:30 AEST] Modified solve_runner model factory to double a_points and a_nxt_points at runtime for improved accuracy while keeping w_points unchanged

  • [2025-12-06 10:00 AEST] Changed constraint point spacing at jumps from fixed count to mean e_diff based - points now spaced at input grid density via use_mean_spacing parameter in _egm_preprocess_core
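
The JIT recompilation fix described above relies on a cached factory returning the same function object for identical parameters. The sketch below shows that pattern with the standard library only; `build_utility` and `get_utility` are illustrative stand-ins (no Numba involved), and the log-utility form is an assumption chosen so that marginal utility is alpha/c.

```python
import math
from functools import lru_cache


def build_utility(alpha):
    """Stand-in for an expensive build step: a new function object per call."""
    def u(c):
        return alpha * math.log(c)
    return u


@lru_cache(maxsize=None)
def get_utility(alpha):
    """Cached factory: identical parameters return the identical function object."""
    return build_utility(alpha)
```

A JIT compiler that caches compiled code per function object has to recompile every time it sees a fresh closure, so calling the uncached builder inside a solve loop pays the compile cost on every call, whereas the cached factory pays it once per parameter set.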

Changed

  • Retirement example restructured into modular components:

    • plots.py: Plotting functions (EGM grids, consumption policy, DC-EGM comparison)
    • tables.py: Markdown and LaTeX table generation with parameter captions
    • benchmarks.py: Timing sweep and performance comparisons
    • run_experiment.py: CLI runner with argparse for grid size, plot age, and sweep settings
  • YAML-based model configuration for retirement experiments:

    • Added experiments/retirement/params/ with baseline, high_beta, low_delta, and long_horizon configurations
    • Model parameters (m_bar, beta, delta, etc.) now loaded from YAML and passed through the solver chain
    • m_bar (FUES jump threshold) configurable from YAML → RetirementModel → Operator_Factory → EGM_UE → FUES
  • Benchmark tables output both .md and .tex formats with parameter captions

  • Experiments reorganized: Moved run_housing_single_core.sh to experiments/housing_renting/ with job configs

  • PBS scripts updated with PBS_O_WORKDIR handling for correct path resolution; logs output to logs/

  • FUES defaults aligned between fues.py and upperenvelope.py: m_bar=1.0, lb=4

  • Upper envelope interface: Added include_intersections parameter to EGM_UE and _fues_engine

Fixed

  • Removed deprecated FUES_sep_intersect import; replaced with fues_alg(..., return_intersections_separately=True)
  • Fixed hardcoded m_bar values in worker_solver and iter_bell to use model parameter
  • PBS path resolution for scripts submitted via qsub

Added

  • scripts/setup_public_venv.sh: Creates virtual environment with dynx from GitHub

  • experiments/retirement/retirement_timings.sh: Configurable bash wrapper with sweep settings

  • logs/ directory for HPC job output; added temp/ and archive/ to .gitignore

  • Consumption deviation benchmarking (PR #9):

    • Added consumption_deviation() function in retirement.py to compare solutions against high-resolution "true" reference (default: 20k grid DCEGM)
    • Metric uses same format as Euler error: log₁₀(|c - c_true| / c_true)
    • Configurable true_grid_size and true_method parameters in YAML benchmark section
    • Restructured benchmark tables into two cleaner formats:
      • Timing table with UE/Tot sub-columns per method
      • Accuracy table with Euler/Dev sub-columns per method
    • Grid-grouped rows with delta as sub-rows for improved readability
    • Renamed l2 variables to cdev for clarity (consumption deviation, not L2 norm)
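
For reference, the consumption deviation metric above can be sketched as below. Averaging the per-point log₁₀ values and flooring zero deviations are assumptions made for this example; the project's `consumption_deviation()` may aggregate differently.

```python
import math


def consumption_deviation_sketch(c, c_true, floor=1e-16):
    """Mean of log10(|c - c_true| / c_true) over matching grid points."""
    logs = []
    for ci, ti in zip(c, c_true):
        rel = abs(ci - ti) / ti
        # Floor the relative deviation so an exact match does not hit log10(0).
        logs.append(math.log10(max(rel, floor)))
    return sum(logs) / len(logs)
```

More negative values indicate a solution closer to the high-resolution reference, which matches how Euler errors are usually read.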

[0.5.0dev0] - 2025-08-12 – Multi-GPU Support and FUES Algorithm Cleanup

  • [2025-08-16 10:00 AEST] Major refactoring: Removed MPI support from horses_c.py, removed unused F_ownc_cntn_to_dcsn factory, standardized terminology
  • [2025-08-17 17:00 AEST] Added DGX A100 support with specialized PBS scripts, GPU kernel optimizations, and log management utilities
  • [2025-08-23 11:21 AEST] Fixed numerical stability issues in FUES algorithm for delta != 1 case
  • [2025-10-25 16:30 AEST] Fixed ZeroDivisionError in piecewise_gradient_3rd_filtered by adding zero-division protection throughout gradient computation
  • [2025-10-25 17:00 AEST] Enhanced _egm_preprocess_core to only add jump constraints for segments with at least 4 points, improving numerical stability
  • [2025-10-25 17:30 AEST] Added asset policy monotonicity filter in horses_c.py to remove points where refined asset policy is decreasing
  • [2025-10-25 18:00 AEST] Fixed boolean operator error in _egm_preprocess_core (changed 'or' to '|' for array operations)
  • [2025-10-25 18:15 AEST] Fixed array allocation in _egm_preprocess_core to correctly account for 2 segments per jump
  • [2025-10-27 15:30 AEST] Added skip_egm_plots flag to conditionally skip EGM CSV exports (plot_csv_export.py, solve_runner.py)
  • [2025-10-27 16:00 AEST] Implemented conditional EGM grid storage: skip saving EGM grids to Solution object when --skip-egm-plots enabled, reducing memory usage and pickle sizes (horses_c.py, solve_runner.py)
  • [2025-10-27 16:35 AEST] Fixed AttributeError in make_housing_model: corrected mc.periods to mc.periods_list for flag injection (solve_runner.py)
  • [2025-10-27 16:45 AEST] Added optional asset policy gradient filtering: filter_a_jumps setting removes refined grid points where da/dm exceeds max_a_gradient threshold (horses_c.py, master.yml)
  • [2025-11-08 12:00 AEST] Implemented first-order condition (FOC) checks in _egm_preprocess_core to filter constraint points based on economic optimality: only add points that satisfy Kuhn-Tucker conditions with scaled lambda values (horses_common.py, horses_c.py)
  • [2025-11-08 12:15 AEST] Modified image saving to always use timestamped directories (images_YYYYMMDD_HHMMSS) to preserve all previous runs instead of overwriting old image files (execution_settings.py, solve_runner.py)
  • [2025-11-08 12:30 AEST] Added uc_test function in horses_common.py as a simple test marginal utility for FOC verification, imported into horses_c.py for testing purposes
  • [2025-11-08 12:45 AEST] Fixed incorrect double method directory creation: removed redundant method subdirectory since we're already in bundles/hash/METHOD/images_TIMESTAMP/ structure (plots.py, plot_csv_export.py)
  • [2025-11-08 16:00 AEST] Added disable_jump_checks parameter to FUES algorithm to control manual jump check overrides: when True, forces keep_i1=False (right turn) and keep_j=True (left turn); default is False to enable checks (fues.py)
  • [2025-11-08 16:30 AEST] Optimized _egm_preprocess_core for speed: vectorized FOC checks, optimized jump detection with single mask computation, eliminated redundant array operations, replaced np.empty+fill with np.full (horses_common.py)

Added

  • Multi-GPU MPI parallelization for housing model

    • Single-node support for up to 4 GPUs with MPI
    • Multi-node support for scaling across Gadi nodes (8+ GPUs)
    • NUMA-aware CPU binding with 12 cores per MPI rank
    • MPI dispatcher in horses_h.py with GPU detection and fallback
    • MPI driver in horses_c_gpu.py with Allgatherv collectives
    • Shared cache for grid.
  • PBS scripts for GPU scaling

    • run_housing_gpu_mpi.pbs: Single-node 4 GPU execution
    • run_housing_gpu_multi_node.pbs: Multi-node 8 GPU execution
    • benchmark_gpu_scaling.pbs: Automated 1, 2, 4 GPU performance comparison
    • Scripts use same options as single-GPU version for consistency
    • run_housing_dgxa100_single.pbs: DGX A100 single GPU job (512GB RAM, 80GB GPU)
    • run_housing_dgxa100_parallel.pbs: DGX A100 4-GPU parallel execution
    • submit_dgxa100_config.sh: Submit helper accepting multiple configurations
    • move_logs_to_scratch.sh: Utility to move all logs to scratch storage

Changed

  • FUES algorithm cleanup and configurability

    • Removed 4 unused functions (uniqueEG, linear_interp, seg_intersect, line_intersect_unbounded)
    • Made epsilon parameters configurable (eps_d, eps_sep, eps_fwd_back, parallel_guard)
    • Consolidated duplicated intersection logic into _forced_intersection_twopoint() helper
    • Moved helper functions to src/dc_smm/fues/helpers/math_funcs.py
    • Merged FUES_sep_intersect into main FUES function with return_intersections_separately flag
  • GPU implementation modifications

    • Added initialize_vfh_from_config() function for MPI initialization
    • Modified V_out calculation in kernel to use formula: V = (Q - (1-delta)*u(c,h)) / delta
    • Implemented C-contiguous array handling for MPI operations
    • Convergence check performed before array swap using allreduce(MAX)
    • Policy gathering made conditional via policy_every parameter
    • Two-pass grid search in vfi_gpu_kernel: coarse then fine search (~25% fewer evaluations)
    • Pre-computed log_H_term for housing utility (avoids redundant calculations)
    • Branchless operations using max() instead of if-statements
    • Immediate memory cleanup after GPU transfers for large arrays
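
Rearranging the V_out formula above shows the relation it inverts. Assuming δ ≠ 0, and writing u for u(c,h):

```latex
V = \frac{Q - (1-\delta)\,u}{\delta}
\quad\Longleftrightarrow\quad
Q = (1-\delta)\,u + \delta\,V ,
\qquad\text{so that } \delta = 1 \;\Rightarrow\; V = Q .
```

The δ = 1 case is the standard exponential discounting case, in which the two value arrays coincide.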

Architecture

  • MPI implementation structure
    • MPI logic isolated in solver layer (horses_h.py, horses_c_gpu.py)
    • Whisperer module unchanged - no modifications required
    • 1 MPI rank mapped to 1 GPU
    • Precomputed Allgatherv counts and displacements

[0.4.0dev11] - 2025-08-11 – FUES Algorithm Stability Improvements

Enhanced

  • FUES intersection calculations

    • Rewrote intersection logic to handle near-parallel segments robustly
    • Added forced intersection points that guarantee envelope continuity
    • Intersection coordinates now strictly bounded within valid intervals
    • Averaging technique reduces numerical drift at segment boundaries
  • Branch detection and continuity

    • New branch detection checks both gradient thresholds and point proximity
    • Safe extrapolation finds suitable points when direct neighbors unavailable
    • Circular buffer for backward scanning improves memory efficiency
    • Forward scan validates jumps using combined value and gradient criteria
  • Consecutive jump handling

    • Prevents numerical instabilities from multiple policy jumps in sequence
    • Drops previous jump point when consecutive jump detected
    • Maintains index consistency by removing associated intersections
    • Rule only enforced when current jump passes validation

Fixed

  • Numerical stability issues
    • Adjusted epsilon constants for better numerical behavior (EPS_D from 1e-200 to 1e-20)
    • Added parallel line guard (1e-12) for degenerate geometry detection
    • Intersection capacity increased to 2*(N-1) preventing silent truncation
    • Eliminated spurious Euler residuals at policy kinks
    • [2025-08-23] Improved float64 numerical stability for delta != 1 case:
      • Changed EPS_D from 1e-50 to 1e-14 (safe for float64 precision)
      • Increased PARALLEL_GUARD to 1e-10 for better parallel line detection
      • Added explicit float64 dtype enforcement in FUES and egm_preprocess
      • Enhanced uniqueEG() to handle near-duplicate points with tolerance
      • Fixed consumption lower bounds (1e-100 to 1e-10) in horses_common.py

Performance

  • Memory optimizations
    • Pre-allocated arrays reduce allocation overhead in hot loops
    • Circular buffer implementation minimizes memory churn
    • Uniform index bookkeeping simplifies maintenance and debugging

[0.4.0dev10] - 2025-08-03 – GPU Performance Optimizations

  • [2025-08-02 18:38 AEST] Improved CLAUDE.md documentation with better organization, version management discipline, and incorporated feedback from o3pro.
  • [2025-08-03 17:30 AEST] Fixed GPU scaling issue by implementing memory freeing during solve to prevent 193GB+ memory accumulation
  • [2025-08-03 18:15 AEST] Enhanced memory management to completely free periods 2+ while preserving periods 0,1 for Euler error calculation
  • [2025-08-03 18:45 AEST] Fixed Euler error GPU bottleneck by implementing sampling-based calculation for large grids to prevent 100GB+ memory transfers
  • [2025-08-03 15:30 AEST] Created multi-job PBS submission system for running multiple GPU configurations in parallel
  • [2025-08-03 16:00 AEST] Added income process generation script using Fella (2014) parameters for housing model
  • [2025-08-03 16:30 AEST] Verified that the memory-freeing implementation is complete, but jobs crash before the benefits become visible
  • [2025-08-03 19:00 AEST] Identified issue with FUES algorithm dropping points after policy function jumps - needs intersection fallback when scans fail
  • [2025-08-08 13:37 AEST] Fixed left/right branch assignment in FUES intersection calculation - new branch should be on right (higher e_grid values), old branch on left
  • [2025-08-08 14:15 AEST] Implemented extrapolated segment intersections (extrap_segments_05_08082025_v1) - adds fallback extrapolation when forward/backward scans fail to find bracketing points, ensuring continuous piecewise-linear envelope
  • [2025-08-08 15:34 AEST] Simplified solve_runner.py Phase 1 - extracted configuration management into ConfigurationManager class, reducing main() complexity while maintaining full PBS compatibility
  • [2025-08-12 17:15 AEST] Cleaned up fues.py - removed 4 unused functions (uniqueEG, linear_interp, seg_intersect, line_intersect_unbounded) and made epsilon parameters (eps_d, eps_sep, eps_fwd_back, parallel_guard) configurable as optional function arguments while maintaining backward compatibility
  • [2025-08-12 18:30 AEST] Consolidated duplicated intersection geometry logic in fues.py - created _forced_intersection_twopoint helper function to eliminate ~150 lines of duplicate code across Cases A, C.1, and C.2, while ensuring all epsilon parameters are properly passed through the function hierarchy
  • [2025-08-12 19:00 AEST] Merged FUES and FUES_sep_intersect functions in fues.py - consolidated into a single FUES function with a return_intersections_separately flag for simplified API and improved maintainability.
  • [2025-08-12 19:15 AEST] Cleaned up fues.py formatting - removed redundant comments, excessive blank lines, and obvious inline comments to improve code readability while maintaining functionality
  • [2025-08-12 19:30 AEST] Simplified function signatures in fues.py - refactored _forced_intersection_twopoint and add_intersection_from_pairs_with_sep to accept L and R as tuples instead of 20 individual parameters, improving code clarity
  • [2025-08-12 19:45 AEST] Fixed Numba compilation issue - removed @njit decorator from FUES wrapper function as it's unnecessary (only _scan needs JIT compilation) and was causing return type inconsistency errors
  • [2025-08-12 20:00 AEST] Refactored FUES helpers - moved intersection and circular buffer utilities from fues.py to helpers/math_funcs.py for better code organization and reusability.
  • [2025-08-12 20:15 AEST] Applied PEP8 formatting to fues.py - cleaned up whitespace, fixed spacing around operators, improved line breaks for better readability
  • [2025-08-12 20:30 AEST] Fixed constants handling - moved EPS_D, EPS_SEP, and PARALLEL_GUARD constants from math_funcs.py back to fues.py where they belong, removed default parameter values that used these constants
  • [2025-08-08 15:45 AEST] Renamed ConfigurationManager to ExecutionSettings to distinguish PBS execution settings from model configuration YAML
  • [2025-08-08 16:15 AEST] Implemented clean left/no jump logic (clean_left_no_jump_logic_05_08082025.md) - allows consecutive no-jump left turns while preventing consecutive jumps via demotion, adds jump_now condition to intersection logic, ensures uniform index bookkeeping across all cases
  • [2025-08-08 16:25 AEST] Fixed NameError in solve_runner.py - corrected missed variable rename from cfg_container to model_config in CircuitRunner initialization
  • [2025-08-08 16:40 AEST] Fixed critical FUES implementation errors causing all points to be dropped:
    • Fixed undefined variable 'left_turn' -> 'left_turn_any' in backward_scan_combined call
    • Fixed uninitialized variable 'j' in first iteration (i=0)
    • Added missing 'not_allow_2lefts' parameter to both _scan function calls in FUES and FUES_sep_intersect wrappers
  • [2025-08-08 16:50 AEST] Applied additional FUES fixes from 05pro_fues_dev1_fixes.md:
    • Fixed index update logic in Case C.1 when j is dropped - prev_j now correctly points to k (current tail) instead of dropped j
    • Fixed value-fall state flags - last_turn_left now correctly set to False (value fall is not a geometric turn)
    • Increased intersection capacity from N//2 to 2*(N-1) to prevent silent truncation in pathological cases
  • [2025-08-08 17:05 AEST] Implemented _scan_v2 from right_as_left_no2jumps.md - cleaner, more compact FUES implementation:
    • Single case_id encoding (turn<<1)|jump for simpler branching logic (4 cases: RTNJ, RTJ, LTNJ, LTJ)
    • Different consecutive jump handling: keeps second jump but drops previously jumped-to point j and undoes its intersections
    • Uniform index updates across all cases for better maintainability
    • Intersections only added on jump iterations with robust extrapolation fallback
    • Updated both FUES and FUES_sep_intersect wrappers to use _scan_v2
  • [2025-08-08 17:20 AEST] Applied no_two_jumps.md refinement - only enforce consecutive jump rule when current jump is kept:
    • Removed early unconditional consecutive jump enforcement block
    • RTJ case: only drops previous j when keep_i1 is True (current jump is validated and kept)
    • LTJ case: enforces rule at start since i+1 is always kept by construction
    • Ensures "no two jumps" rule only applies when we're actually accepting the current jump
  • [2025-08-08 18:00 AEST] Implemented strict bracket enforcement for FUES intersections - ensures intersections always lie within (e_j, e_{i+1}):
    • Replaced loose e_min/e_max window check with strict _between_open(intr_x, e_grid[j], e_grid[i+1], EPS_SEP) validation
    • Added _clip_open to clamp intersection x-coordinate into valid interval with safety margin
    • Recompute intersection y-coordinate at clamped x using both line equations and average
    • Applied to all three intersection cases: Case A (right-turn jump), Case C.1 (left turn, j dropped), Case C.2 (left turn, j kept)
    • Prevents spurious off-interval intersections that corrupt envelope geometry on next iteration
  • [2025-08-09 10:00 AEST] Implemented forced intersection points for all kept jumps - eliminates Euler equation residual gaps:
    • Added _force_crossing_inside() function to guarantee valid intersections even for near-parallel lines
    • Implemented adaptive separation min(EPS_SEP, 0.25*interval_length) to handle small intervals
    • Modified all three cases (RTJ, LTJ j-dropped, LTJ j-kept) to use forced intersections
    • Ensures piecewise-linear envelope with explicit kinks at all discrete choice switches
  • [2025-08-09 11:00 AEST] Added comprehensive debug printing for intersection analysis:
    • Added debug parameters to _scan and FUES functions with specific region filtering
    • Prints intersection details including flag, point, indices, liquid savings values, and policies
    • Default debug region set to e: [31.3, 32], v: [6.71, 6.75] for targeted analysis
  • [2025-08-09 11:30 AEST] Replaced complex interpolation with cleaner implementation for non-ConSav methods:
    • Added interp_clean() function with simpler, more robust extrapolation logic
    • Modified horses_c.py to use interp_clean for Q_dcsn and policy interpolation when method != "CONSAV"
    • Addresses suspected interpolation issues causing FUES instabilities
  • [2025-08-09 12:00 AEST] Enhanced forward scan logic in Case A (right-turn jump):
    • Added jump verification when g_1 > g_f_vf_at_idx condition is met
    • Now checks if gradient from i+1 to idx_f exceeds jump threshold (m_bar)
    • Only sets keep_i1=True when both value condition AND jump are confirmed
  • [2025-08-10 10:15 AEST] Refactored e_grid to x_dcsn_hat in fues.py for improved clarity and consistency with paper notation.
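The bracket enforcement and adaptive separation described in the entries above can be sketched as follows. This is a minimal illustration, not the code in fues.py: the helper names echo `_between_open` and `_clip_open`, but their signatures, the value `EPS_SEP = 1e-8`, and the `clamp_intersection` wrapper are assumptions made for the example.

```python
EPS_SEP = 1e-8  # assumed value; the changelog names EPS_SEP without stating it here


def between_open(x, lo, hi, eps=EPS_SEP):
    """True if x lies strictly inside (lo, hi) with a margin of eps on each side."""
    return lo + eps < x < hi - eps


def clip_open(x, lo, hi, eps=EPS_SEP):
    """Clamp x into (lo, hi), shrinking the margin on very short intervals."""
    margin = min(eps, 0.25 * (hi - lo))  # adaptive separation for small intervals
    return min(max(x, lo + margin), hi - margin)


def line_value(x, x0, y0, slope):
    """Value at x of the line through (x0, y0) with the given slope."""
    return y0 + slope * (x - x0)


def clamp_intersection(x_int, left, right, lo, hi):
    """Clamp a crossing into (lo, hi) and recompute y as the mean of both lines.

    left and right are hypothetical (x0, y0, slope) triples for the two segments.
    """
    x = x_int if between_open(x_int, lo, hi) else clip_open(x_int, lo, hi)
    y = 0.5 * (line_value(x, *left) + line_value(x, *right))
    return x, y
```

Averaging the two line values matters only when the crossing was clamped; at a true intersection both lines already agree.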

Fixed

  • CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
    • Reduced thread block sizes from 1024 to 256-512 threads to avoid GPU resource exhaustion
    • VFI kernel: (8,8,8)=512, Housing Owner: (8,8,4)=256, Housing Renter: (16,16)=256
    • All GPU kernels now launch successfully with HIGH_RES_SETTINGS
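The thread counts quoted above follow from multiplying the block dimensions. The sketch below checks that arithmetic and shows the usual ceil-division for the number of blocks; the dictionary keys and the (250, 250, 64) problem shape are illustrative choices, not names taken from the code base.

```python
from math import prod

MAX_THREADS_PER_BLOCK = 1024  # per-block thread cap on current NVIDIA GPUs

# Block shapes quoted in the entry above (keys are illustrative labels)
BLOCKS = {
    "vfi": (8, 8, 8),
    "housing_owner": (8, 8, 4),
    "housing_renter": (16, 16),
}


def threads_per_block(block):
    """Total threads in one block is the product of its dimensions."""
    return prod(block)


def blocks_per_grid(shape, block):
    """Ceil-divide each problem dimension by the matching block dimension."""
    return tuple((n + b - 1) // b for n, b in zip(shape, block))


for name, block in BLOCKS.items():
    print(name, threads_per_block(block), "threads per block")
```

Smaller blocks need fewer registers per block, at the cost of launching more blocks to cover the same problem.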

Added

  • GPU-accelerated shock integration

    • New shock_integration_kernel replaces CPU-based np.einsum
    • Automatic GPU dispatch when compute="GPU" and problem size > 1000
    • Expected 5-10x speedup for shock integration operations
  • Development specifications

    • multi_gpu_parallel_architecture_03082025.md: Full 4-GPU + 48-CPU architecture
    • vfi_hdgrid_gpu_parallel_03082025.md: Focused VFI-only 4-GPU parallelization

Changed

  • GPU kernel optimizations
    • VFI kernel restored to proper 3D parallelization (was 2D with loop)
    • Fixed serialization bottleneck in wealth dimension
    • All kernels now use balanced thread configurations for stability
    • Expected 3-5x speedup from proper parallelization

Technical Details

  • Resource management:
    • Complex kernels require fewer threads due to register pressure
    • Trade-off: more kernel launches but successful execution
    • GPU memory usage: ~400MB for test grids, scales linearly

[0.4.0dev9] - 2025-08-02 – GPU Underutilization Fix

Fixed

  • GPU underutilization warning for small grids
    • Fixed "Grid size 1 will likely result in GPU under-utilization" warning
    • Added adaptive thread block sizing when grid dimensions are very small
    • Ensures minimum GPU occupancy by adjusting thread configuration dynamically
    • Affects: horses_h.py (owner/renter choice) and horses_c_gpu.py (VFI solver)
    • Small test grids now launch with better GPU utilization

Technical Details

  • Adaptive kernel configuration:
    • Detects when total blocks would be ≤ 2 and reduces thread block size
    • Maintains correctness while improving GPU occupancy for test configurations
    • Example: 1×1 grid now uses 1×1 threads instead of 16×16, avoiding warnings
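One possible form of the adaptive rule is sketched below. The changelog does not give the actual shrink rule used in horses_h.py and horses_c_gpu.py, so the function name, the default block, and the choice to cap each axis at the number of points are assumptions.

```python
def ceil_div(n, b):
    """Smallest integer k with k * b >= n."""
    return (n + b - 1) // b


def adaptive_block(shape, default_block=(16, 16), min_blocks=2):
    """Shrink the thread block when the default would launch min_blocks or fewer blocks."""
    n_blocks = 1
    for n, b in zip(shape, default_block):
        n_blocks *= ceil_div(n, b)
    if n_blocks > min_blocks:
        return default_block
    # Assumed rule: never use more threads along an axis than there are points on it
    return tuple(min(n, b) for n, b in zip(shape, default_block))


print(adaptive_block((1, 1)))      # tiny test grid
print(adaptive_block((250, 250)))  # production-sized grid keeps the default
```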

[0.4.0dev8] - 2025-08-02 – GPU Kernel Fix and FUES Algorithm Reorganization

Fixed

  • GPU VFI kernel launch failure
    • Fixed missing @cuda.jit decorator on calculate_continuation_values_gpu_kernel function
    • Resolved CUDA_ERROR_INVALID_VALUE by converting 3D grid to 2D grid with internal loop
    • Changed from cuda.grid(3) to cuda.grid(2) to avoid CUDA's Z-dimension limit (65535 blocks)
    • Reduced thread block configuration from (16,16,4) to (16,16) for better compatibility
    • GPU kernel now handles 4+ million grid points without exceeding CUDA limits
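The change from a 3D launch to a 2D launch with an internal loop can be emulated on the CPU to confirm that every grid point is still visited. The sketch below is plain Python rather than a Numba CUDA kernel; the function names are hypothetical, and the limits are the standard CUDA maximum block counts per grid axis.

```python
CUDA_MAX_GRID = (2**31 - 1, 65535, 65535)  # maximum blocks along x, y, z


def fits_grid(blocks):
    """True if a tuple of block counts respects the per-axis CUDA limits."""
    return all(b <= cap for b, cap in zip(blocks, CUDA_MAX_GRID))


def visit_2d_with_inner_loop(n_h, n_y, n_w):
    """Emulate a 2D launch: one 'thread' per (h, y) pair, looping over w inside."""
    visited = set()
    for h in range(n_h):          # stands in for the first index from cuda.grid(2)
        for y in range(n_y):      # stands in for the second index from cuda.grid(2)
            for w in range(n_w):  # internal loop replacing the third grid axis
                visited.add((h, y, w))
    return visited
```

The trade-off is that each thread does `n_w` units of work serially, which is the serialization later releases revisit.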

Changed

  • FUES algorithm version reorganization

    • Renamed fues_2dev5.py → fues.py as current production version
    • Renamed original fues.py → fues_v0dev.py (October 2024 paper version)
    • Moved all experimental versions (fues_2dev1-8) to src/dc_smm/fues/experimental/
    • Updated all method references: FUES2DEV5 → FUES, FUES2DEV* → FUES
    • Upper envelope registry updated: @register("FUES") for production, @register("FUES_V0DEV") for paper version
  • Repository cleanup for public release

    • Enhanced .gitignore to exclude HPC output files, backup directories, working notes
    • Added examples/README_OUTPUTS.md explaining output directory structure
    • Excluded all generated images/results from version control (best practice)
    • Python build artifacts (*.egg-info) now properly ignored

Technical Details

  • GPU fix details:

    • Problem: 3D grid with dimensions (250, 250, 64) = 4M points exceeded CUDA Z limit
    • Solution: 2D grid (n_H, n_Y) with internal loop over n_W dimension
    • Maintains same computation pattern while respecting CUDA architecture limits
  • Files reorganized:

    • src/dc_smm/fues/__init__.py - Updated imports
    • src/dc_smm/uenvelope/upperenvelope.py - Updated engine registrations
    • 11 example/test files updated with new method references
    • Fixed all legacy import paths (dc_smm.fues.legacy.* no longer exists)
  • Repository structure:

    src/dc_smm/fues/
    ├── fues.py              # Current production (was fues_2dev5)
    ├── fues_v0dev.py        # Original paper version
    └── experimental/        # All experimental versions
    

Performance Impact

  • GPU kernel now successfully launches for high-resolution grids
  • Expected 3-5x speedup for VFI GPU solver vs CPU
  • Eliminates memory transfer bottleneck by keeping computation on device

[0.4.0dev7] - 2025-07-31 – Walltime Optimization and Selective Model Loading

Added

  • Comparison metrics filtering for baseline-only runs

    • Added --comparison-metrics parameter to specify which metrics require baseline loading
    • Automatically skips comparison metrics when running only baseline method to prevent self-comparisons
    • Saves ~45 minutes of unnecessary computation on baseline-only GPU runs
  • Selective model loading for memory efficiency

    • Added --load-periods parameter to load only specific period indices
    • Added --load-stages parameter for fine-grained stage filtering per period
    • Reduces loading from 75 to 18 pickle files (76% reduction) for Euler error calculations
    • Integrated with DynX's enhanced load_circuit() function

Changed

  • Smart metric execution based on method selection

    • Baseline method now temporarily excludes comparison metrics during its own execution
    • Comparison metrics (dev_c_L2, plot_c_comparison, plot_v_comparison) only run for fast methods
    • Prevents meaningless baseline vs baseline comparisons that always return 0
    • Improves walltime efficiency for GPU baseline computations
  • Updated single-core loading script

    • Modified run_housing_single_core.sh to use selective loading for existing models
    • Added explanatory comments about loading requirements for Euler error
    • Maintains backward compatibility when parameters not specified

Fixed

  • GPU walltime exceeded errors
    • Identified that metrics calculation phase was pushing baseline runs over 10-hour limit
    • Baseline solving completed at 9h 13m, but metrics added >47m causing walltime kill
    • Solution: skip unnecessary comparison metrics on baseline-only runs
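The filtering rule described in this release amounts to dropping comparison metrics whenever the method being run is the baseline itself. A minimal sketch, assuming a hypothetical `metrics_for` helper (the metric names are the ones listed above; the real logic lives in solve_runner.py and may differ):

```python
COMPARISON_METRICS = {"dev_c_L2", "plot_c_comparison", "plot_v_comparison"}


def metrics_for(method, requested, baseline_method, comparison=COMPARISON_METRICS):
    """Drop comparison metrics when the method being run is the baseline itself."""
    if method == baseline_method:
        return [m for m in requested if m not in comparison]
    return list(requested)


requested = ["euler_error", "dev_c_L2", "plot_c_comparison"]
print(metrics_for("VFI_HDGRID_GPU", requested, "VFI_HDGRID_GPU"))
print(metrics_for("FUES", requested, "VFI_HDGRID_GPU"))
```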

Technical Details

  • Files modified:
    • examples/housing_renting/solve_runner.py - Added comparison metrics filtering and loading options
    • scripts/pbs/run_housing_single_core.sh - Added selective loading parameters
    • examples/housing_renting/helpers/euler_error.py - Added precompilation function
  • Euler error requirements: Only needs Period 0 (OWNC stage) and Period 1 (all stages)
  • Performance impact: Prevents walltime exceeded errors, reduces I/O by 76% when loading models
  • Integration: Works with DynX v1.7.0 selective loading features

Performance

  • Euler error precompilation

    • Added precompile_euler_error_cpu() function to warm up Numba JIT cache
    • Eliminates ~30-60 second compilation overhead on first Euler error calculation
    • Automatically runs during initialization when euler_error metric is requested
    • Uses minimal dummy data for fast compilation
    • Fixed utility function expressions to match standard CRRA housing model
  • Metric-specific selective loading for comparison metrics

    • Comparison metrics now load only Period 0, OWNC stage from baseline (instead of all 5 periods)
    • Reduces baseline loading from 75 to 3 pickle files per comparison (96% reduction)
    • Each fast method saves ~42 seconds on baseline loading for comparisons
    • Total time saved for 4 fast methods: ~168 seconds
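The file counts and percentages quoted in this release can be reproduced with a little arithmetic. The sketch assumes 5 periods, 5 stages per period, and 3 pickle files per stage; these figures are inferred from the counts above (75 files in total, 3 files for Period 0 OWNC) rather than stated in the changelog, and every stage name other than OWNC and RNTC is a placeholder.

```python
FILES_PER_STAGE = 3  # assumption: "Period 0, OWNC stage" corresponds to 3 files
STAGES = ["OWNC", "RNTC", "stage_3", "stage_4", "stage_5"]  # last three are placeholders
N_PERIODS = 5


def files_to_load(selection):
    """selection maps period index -> list of stage names, or None for all stages."""
    total = 0
    for stages in selection.values():
        n_stages = len(STAGES) if stages is None else len(stages)
        total += n_stages * FILES_PER_STAGE
    return total


def reduction_pct(loaded, full):
    """Percentage reduction in files loaded, rounded to the nearest integer."""
    return round(100 * (1 - loaded / full))


full = files_to_load({p: None for p in range(N_PERIODS)})
euler = files_to_load({0: ["OWNC"], 1: None})   # Euler error requirement
comparison = files_to_load({0: ["OWNC"]})       # comparison metrics requirement
print(full, euler, comparison)
```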

[0.4.0dev6] - 2025-07-26 – FUES Code Cleanup and Optimization

Changed

  • Refactored FUES scan logic for better code organization

    • Extracted forward scan logic into dedicated forward_scan_case_a() function
    • Combined backward scan and find_backward_same_branch into unified backward_scan_combined() function
    • Eliminated nested loops in favor of cleaner function calls while maintaining exact algorithm behavior
    • Replaced redundant pre-allocated arrays (g_f_vf, g_f_a, g_m_vf, g_m_a) with on-the-fly computation
    • Memory savings of 4*N floats per scan operation
  • Fixed circular buffer iteration order

    • Discovered that fues_2dev1 had incorrect backward scan order (oldest to newest instead of newest to oldest)
    • fues_2dev4 correctly implements the intended behavior: selecting the closest (most recent) point
    • Both versions kept for comparison purposes with documented behavioral differences
  • Improved numerical stability for intersection points

    • Changed intersection point separation from 1e-50 to 1e-8
    • Prevents divide-by-zero errors in numpy gradient calculations
    • Maintains accuracy while avoiding numerical precision issues
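The reason a separation of 1e-50 fails is that double-precision arithmetic absorbs it entirely at typical grid values, so the two points share the same abscissa and any slope between them divides by zero. A short demonstration, with an illustrative grid value and a hypothetical `slope` helper:

```python
x = 31.5            # a representative grid value
tiny = x + 1e-50    # the separation is lost to floating-point rounding
safe = x + 1e-8     # the separation survives


def slope(x0, y0, x1, y1):
    """Finite-difference slope; fails when the two abscissae coincide."""
    dx = x1 - x0
    if dx == 0.0:
        raise ZeroDivisionError("duplicate abscissa")
    return (y1 - y0) / dx


print(tiny == x)  # the 1e-50 offset leaves x unchanged
print(safe > x)   # the 1e-8 offset produces a distinct point
```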

Fixed

  • Index consistency bug

    • Fixed idx_f being used as both loop counter and grid index
    • Now correctly stores actual grid index: idx_f = i+2+f
    • Ensures correct segment selection for intersection calculations
  • Missing circular buffer updates

    • Added missing m_head = circ_put(m_buf, m_head, j) when j is dropped
    • Fixed consecutive left turn handling to properly maintain buffer state
    • Added prev_j tracking for correct j restoration
  • Spurious intersection handling

    • Added added_intersection_last_iter flag to track intersection creation
    • Remove last intersection on consecutive left turns to avoid spurious points
    • Improved intersection point management for discrete choice switches
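The circular-buffer behaviour discussed in this release (write at the head, then scan from newest to oldest) can be sketched as below. The call shape `head = circ_put(buf, head, j)` follows the entry above; the body of `circ_put` and the `newest_to_oldest` helper are assumptions, not the implementation in fues_2dev4.py.

```python
def circ_put(buf, head, value):
    """Write value at head and return the advanced head, wrapping around."""
    buf[head] = value
    return (head + 1) % len(buf)


def newest_to_oldest(buf, head, count):
    """Yield up to `count` stored values, most recent first."""
    n = len(buf)
    for k in range(1, min(count, n) + 1):
        yield buf[(head - k) % n]


m_buf = [-1] * 3
m_head = 0
for j in (10, 11, 12, 13):      # the fourth write overwrites the oldest entry
    m_head = circ_put(m_buf, m_head, j)
print(list(newest_to_oldest(m_buf, m_head, 3)))
```

Scanning in this order means the first match found is the closest (most recent) point, which is the behaviour the entry describes as intended.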

Technical Details

  • Files modified:
    • src/dc_smm/fues/fues_2dev4.py - Refactored version with correct backward scan
    • src/dc_smm/fues/fues_2dev1.py - Original version with backward scan bug (kept for comparison)
    • src/dc_smm/fues/fues_2dev1_working_backup_dev1.py - Backup of working version
  • Performance impact: Reduced memory allocation and improved cache locality
  • Backward compatibility: Both versions produce valid upper envelopes, just with different point selection in edge cases

[0.4.0dev5] - 2025-01-28 – Enhanced FUES with Intersection Points

Added

  • Intersection point tracking in FUES algorithm

    • Implemented intersection point detection as described in Dobrescu & Shanker (2022) Section 2.1.3
    • Added add_intersections parameter to FUES() function (default: True) for enhanced accuracy around crossing points
    • Forward scan intersection detection during right-turn jumps identifies where choice-specific value functions cross
    • Backward scan intersection storage during left-turn elimination captures suboptimal point intersections
    • Intersection points include interpolated policy values at crossing locations for complete solution representation
  • Memory-efficient intersection storage

    • Pre-allocated intersection arrays (10% of grid size) to maintain O(n) complexity
    • Automatic merging of original EGM points with intersection points, sorted by endogenous grid values
    • Configurable intersection tracking with backward compatibility when disabled

Changed

  • Enhanced _scan function with intersection tracking
    • Added track_intersections and policy_2 parameters for comprehensive intersection detection
    • Consistent return format for all code paths to maintain Numba compatibility
    • Improved boundary checking in forward scan to prevent array index errors

Technical Details

  • Intersection detection algorithm:
    # Forward scan: detect crossings when jumping to new value function branch
    inter_point = seg_intersect(p1, p2, p3, p4)  # Line-line intersection
    # Interpolate policies at intersection point
    t = (inter_point[0] - e_grid[i+1]) / (e_grid[b_idx] - e_grid[i+1])
    inter_p1[n_inter] = (1-t) * a_prime[i+1] + t * a_prime[b_idx]
  • Files modified: src/dc_smm/fues/fues_2dev1.py
  • Performance impact: Minimal overhead when intersections disabled; ~10% memory increase when enabled
  • Accuracy improvement: Better representation of value function upper envelope around choice-specific crossings
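For readers unfamiliar with the two steps in the snippet above, here is a self-contained sketch of a line-line intersection followed by linear interpolation of a policy at the crossing. The name `seg_intersect` matches the snippet, but this body is a textbook determinant formula written for illustration and is not taken from fues_2dev1.py; `interp_policy` is a hypothetical helper.

```python
def seg_intersect(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4.

    Returns None when the lines are parallel.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0.0:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / denom
    y = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return x, y


def interp_policy(x, x_left, x_right, pol_left, pol_right):
    """Linear interpolation of a policy between two grid points."""
    t = (x - x_left) / (x_right - x_left)
    return (1 - t) * pol_left + t * pol_right


point = seg_intersect((0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0))
print(point, interp_policy(point[0], 0.0, 1.0, 2.0, 4.0))
```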

[0.4.0dev4] - 2025-01-28 – EGM Plotting Fix & Memory Management Enhancements

Fixed

  • EGM plots generation for all EGM-based methods
    • Fixed key prefix mismatch in plot_egm_grids() function where plotting code was looking for unprefixed keys (e.g., "0-7") but EGM data was stored with prefixed keys (e.g., "e_0-7", "Q_0-7")
    • EGM plots now generate correctly for FUES2DEV, CONSAV, DCEGM, and other EGM-based methods
    • Updated both unrefined and refined grid access to use proper prefixed key formats
    • Added proper error handling for missing EGM data components

Changed

  • Plot metrics configuration logic
    • Plot metrics are now only included in computation when explicitly requested in --metrics list
    • Removed incorrect behavior where --plots flag would automatically include plot metrics in computation
    • Improved separation between traditional plot generation (--plots) and plot-based metric computation (--metrics plot_c_comparison)

Added

  • Comprehensive debugging for EGM data flow
    • Added targeted debugging in plot_egm_grids() to verify EGM data availability and key formats
    • Enhanced error messages for missing or malformed EGM grid data
    • Created systematic approach for debugging data flow from solution storage to plotting

Technical Details

  • Key format changes in plots.py:
    # Before (incorrect):
    e_grid_unrefined = egm_data["unrefined"][grid_key]  # Looking for "0-7"
    
    # After (correct):
    prefixed_e_key = f"e_{grid_key}"  # Looking for "e_0-7"
    e_grid_unrefined = unrefined_dict.get(prefixed_e_key)
  • Files modified: examples/housing_renting/helpers/plots.py, examples/housing_renting/solve_runner.py
  • Impact: Visual validation of endogenous grid method upper envelope refinement process now available for all EGM-based methods

[0.4.0dev3] - 2025-07-20 – Streamlined Method Configuration & CONSAV Fix

Added

  • Dynamic baseline method selection via --baseline-method flag with auto-detection based on --gpu flag
  • Configurable fast methods via --fast-methods flag (default: FUES2DEV,CONSAV)
  • Automatic baseline inclusion via --include-baseline flag for cleaner single-core workflows
  • Enhanced module docstring with comprehensive examples for GPU, MPI, and baseline loading workflows
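A rough sketch of how flags like these could be wired with argparse follows. The flag names and the `FUES2DEV,CONSAV` default come from the entries above; the mapping from `--gpu` to a baseline name (`VFI_HDGRID_GPU` versus `VFI_HDGRID`) and the `resolve_methods` helper are assumptions about the auto-detection, not the parser in solve_runner.py.

```python
import argparse


def build_parser():
    """Parser exposing the method-selection flags described above."""
    p = argparse.ArgumentParser()
    p.add_argument("--gpu", action="store_true")
    p.add_argument("--baseline-method", default=None)
    p.add_argument("--fast-methods", default="FUES2DEV,CONSAV")
    p.add_argument("--include-baseline", action="store_true")
    return p


def resolve_methods(argv):
    """Return (baseline, methods_to_run) for a list of command-line tokens."""
    args = build_parser().parse_args(argv)
    # Assumed auto-detection: pick the GPU baseline only when --gpu is set
    baseline = args.baseline_method or ("VFI_HDGRID_GPU" if args.gpu else "VFI_HDGRID")
    fast = [m.strip() for m in args.fast_methods.split(",") if m.strip()]
    methods = ([baseline] if args.include_baseline else []) + fast
    return baseline, methods


print(resolve_methods(["--gpu", "--include-baseline", "--fast-methods", "FUES"]))
```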

Changed

  • Method configuration no longer requires editing source code - all methods configurable via command-line flags
  • Backward compatibility maintained - existing scripts work without modification

Fixed

  • CONSAV engine argument handling - fixed AttributeError when u_func["args"] expects dictionary format
  • Method selection logic streamlined to eliminate hardcoded baseline/fast method lists

[0.4.0dev2] - 2025-07-20 – Metric Accuracy

Added

  • Professional, publication-quality comparison plots for policy and value functions via plot_comparison_factory in helpers/metrics.py.
  • Plots now use proper interpolation: Both fast and baseline methods are compared on a common grid, matching the logic used in L2 error metrics.
  • Plots are saved in the bundle directory for each parameter/method, keeping results organized and reproducible.
  • X-axis uses real economic grid values (e.g., wealth, housing) instead of indices, for interpretability.
  • Error plot features: Zero reference line, error bars, statistics box (max/mean error), and improved styling for publication-quality output.
  • Docstrings for all metrics and plotting functions updated to explain interpolation, grid handling, and scientific accuracy.
  • Example usage and configuration included in docstrings for both plotting and L2 metrics.

Changed

  • L2 error and plotting metrics now always compare on a common grid, ensuring scientifically accurate, like-for-like comparisons regardless of discretization.
  • Improved error handling and warnings for grid mismatches, shape incompatibilities, and extraction failures.
  • All changes are fully integrated with CircuitRunner and its bundle management system, so plots and metrics are always associated with the correct parameter set.
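The common-grid comparison described in this release can be illustrated with a small pure-Python sketch: both solutions are interpolated onto a shared grid over the overlap of their domains before the deviation is computed. The function names, the uniform comparison grid, and the root-mean-square normalization are assumptions; the metric in helpers/metrics.py may normalize differently.

```python
from bisect import bisect_right
from math import sqrt


def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation on sorted xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1 - t) * ys[i] + t * ys[i + 1]


def l2_on_common_grid(fast, base, n=50):
    """Root-mean-square deviation; fast and base are (xs, ys) on different grids."""
    lo = max(fast[0][0], base[0][0])
    hi = min(fast[0][-1], base[0][-1])
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    sq = [(interp_linear(x, *fast) - interp_linear(x, *base)) ** 2 for x in grid]
    return sqrt(sum(sq) / n)


fast = ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])            # y = 2x on a coarse grid
base = ([0.0, 0.5, 1.5, 2.0], [0.0, 1.0, 3.0, 4.0])  # y = 2x on a different grid
print(l2_on_common_grid(fast, base))
```

Comparing index by index would be meaningless here because the two grids have different lengths and node locations.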

Fixed

  • Bugfixes to Euler error metric: Improved threshold handling and interpolation logic to avoid NaN results and ensure robust error calculation for all methods.
  • Plotting function scope issues: Fixed closure variable capture and array indexing errors in plotting configuration.
  • Value function extraction: Now uses correct model attribute names (vlu instead of v) and solution types for robust extraction.

[0.4.0] - 2025-06-16 – GPU-Accelerated VFI Solver

Added

  • GPU-Accelerated VFI Solver (VFI_HDGRID_GPU)

    • Implemented a new solver backend using Numba CUDA to offload the VFI dense grid search to NVIDIA GPUs.
    • The vfi_gpu_kernel performs the core computation in parallel across thousands of GPU threads.
    • The solve_vfi_gpu host function manages data transfers (CPU↔GPU) and kernel launches.
    • This provides a significant performance increase for high-density baseline calculations, enabling larger and more complex models to be solved within practical time limits.
  • Dynamic and Shared Memory on GPU

    • The GPU kernel now uses dynamic shared memory to dramatically reduce slow global memory access, a key optimization for performance.
    • The launcher calculates the required shared memory size at runtime, allowing the kernel to handle variable-sized grids without hardcoded limits.
  • Unified Solver and Pre-compilation Workflow

    • solve_runner.py is now the single entry point for all workflows (CPU, MPI, and GPU).
    • A new --precompile flag intelligently warms up the correct Numba cache (either CPU or GPU) based on the selected method.
    • The framework now automatically uses minimal grid settings during pre-compilation to prevent GPU out-of-memory errors.
  • Robust GPU-Compatible Helper Functions

    • Created a GPU-safe interp_gpu function to perform linear interpolation, as np.interp is not supported in CUDA kernels.
    • Implemented a "dispatcher pattern" for utility functions, using an integer ID to select between pre-compiled, static GPU device functions (u_func_gpu_crra, etc.). This is the robust solution for handling different functional forms on the GPU.
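The dispatcher pattern and the loop-based interpolation can be sketched in plain Python as below. In the actual solver these would be compiled device functions; here the integer IDs, function names, and the choice of CRRA and log utility are illustrative assumptions, and nothing is run on a GPU.

```python
import math

U_CRRA, U_LOG = 0, 1  # integer IDs; the numbering is illustrative


def u_crra(c, gamma):
    """CRRA utility for gamma != 1."""
    return c ** (1.0 - gamma) / (1.0 - gamma)


def u_log(c, gamma):
    """Log utility; gamma is unused but keeps the signature uniform."""
    return math.log(c)


def u_dispatch(func_id, c, gamma):
    """Select a utility by integer ID with plain branches, avoiding closures."""
    if func_id == U_CRRA:
        return u_crra(c, gamma)
    elif func_id == U_LOG:
        return u_log(c, gamma)
    return -math.inf  # unknown ID: treat as infeasible


def interp_scalar(x, xs, ys):
    """Loop-based linear interpolation that avoids np.interp, clamped at the ends."""
    n = len(xs)
    if x <= xs[0]:
        return ys[0]
    if x >= xs[n - 1]:
        return ys[n - 1]
    i = 0
    while xs[i + 1] < x:
        i += 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])
```

Passing an integer instead of a callable keeps the kernel signature made of simple scalar and array types, which is what the entry credits for the robustness of the approach.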

Fixed

  • GPU Compilation Errors: Resolved a series of TypingError and NameError issues by:
    • Replacing unsupported function calls (np.interp, cuda.lib.isinf) with GPU-compatible equivalents (interp_gpu, math.isinf).
    • Correctly handling function namespaces (math vs. np) inside device code.
    • Eliminating the use of unsupported closures as kernel arguments.
  • GPU Out-of-Memory Errors: Fixed CudaAPIError: [700] by ensuring the pre-compilation step uses a minimal memory footprint.

[0.3.0dev4] - Planned – Hierarchical MPI Parameter Sweeps

Planned Features

  • Hierarchical MPI parameter sweep architecture

    • Two-level MPI communicators: COMM_TOP for parameter distribution, COMM_SOLVER for intra-node VFI computation
    • Enable scaling to large parameter spaces (e.g., 50 parameter combinations × 45 cores each = 2250 total cores)
    • Each node runs complete baseline+fast workflow for one parameter combination
  • Memory-efficient parameter processing

    • Apply solve→plot→delete→gc pattern from solve_runner.py to parameter sweeps
    • Process each parameter combination sequentially to avoid memory accumulation
    • Immediate model cleanup after plotting and metric extraction
  • DynX Sampler integration for parameter sweeps

    • Replace manual parameter grid construction with built-in Cartesian sampler
    • Canonical column ordering and robust parameter space handling
    • Support for both list (PATH=v1,v2,v3) and range (PATH=min:max:N) parameter specifications
  • Enhanced bundle management for parameter caching

    • Hash-based bundle directories for each parameter combination
    • Automatic skip of completed parameter combinations
    • Robust restart capability for interrupted parameter sweeps
    • Method-aware bundle organization (VFI_HDGRID, FUES, CONSAV in separate subdirectories)
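The planned two-level layout can be previewed with rank arithmetic alone. The sketch below assigns each world rank a group (one per parameter combination) and a local rank within that group; the function names are hypothetical, and no MPI library is used.

```python
def split_colors(world_size, solver_size):
    """Map each world rank to (group, local_rank) for a two-level layout."""
    if world_size % solver_size != 0:
        raise ValueError("world_size must be a multiple of solver_size")
    return [(rank // solver_size, rank % solver_size) for rank in range(world_size)]


def group_leaders(layout):
    """World ranks with local rank 0, one per solver group."""
    return [rank for rank, (_, local) in enumerate(layout) if local == 0]


layout = split_colors(50 * 45, 45)  # 50 parameter combinations × 45 cores each
print(len(layout), len(group_leaders(layout)))
```

With mpi4py, the group index would typically be passed as the `color` argument of `Comm.Split` to build the per-group solver communicator, and the leaders would form the top-level communicator; that wiring is left out here.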

Implementation Strategy

  • Phase 1: Core architecture with hierarchical MPI and sampler integration
  • Phase 2: Integration with proven solve_runner patterns and bundle management
  • Phase 3: CLI enhancement and workflow optimization
  • Migration Path: Create param_sweep_v2.py alongside existing implementation

Expected Benefits

  • Performance: 10-100x reduction in peak memory usage for large parameter sweeps
  • Scalability: Linear scaling to hundreds of parameter combinations across multiple nodes
  • Robustness: Automatic restart capability and bundle corruption recovery
  • Maintainability: Code reuse from solve_runner and elimination of manual parameter bookkeeping

[0.3.0dev3] - 2025-06-13 – MPI Error Handling & Memory Optimization

Added

  • Comprehensive MPI warning suppression

    • Added environment variables to suppress non-fatal MPI collective communication warnings (LOG_CAT_ML, basesmuma, ml_discover_hierarchy)
    • New MPI configuration variables: OMPI_MCA_coll_ml_priority=0, OMPI_MCA_coll_hcoll_enable=0, and BTL layer warning suppressions
    • Implemented stderr filtering in MPI scripts to remove noise while preserving genuine errors
  • Numba cache management for MPI environments

    • Added automatic Numba cache clearing before MPI runs to prevent cache corruption
    • Implemented process-specific cache directories (NUMBA_CACHE_DIR=/tmp/numba_cache_$$)
    • Added NUMBA_DISABLE_CACHE=1 and NUMBA_NUM_THREADS=1 for MPI safety
  • Memory-efficient model processing workflow

    • Implemented immediate model processing pattern: solve → extract metrics → generate plots → delete model → garbage collect
    • Added per-model memory cleanup with explicit del model and gc.collect() calls
    • Replaced batch processing with sequential processing to minimize peak memory usage
  • Enhanced logging and error tracking

    • Added timestamped log files for both stdout and stderr with tee command
    • Implemented comprehensive error logging while maintaining screen output visibility
    • Added run completion status reporting with exit codes

Changed

  • Solve runner workflow optimization

    • Modified solve_runner.py to process each model individually instead of keeping all models in memory
    • Baseline and fast methods now follow identical solve-plot-delete pattern
    • Replaced mpi_map batch processing with individual runner.run() calls for better memory control
    • Updated metrics collection to use all_metrics list instead of DataFrame concatenation
  • MPI script robustness

    • Enhanced circuit_run_HR_mpi.sh with comprehensive error suppression and cache management
    • Added pre-run cache cleaning and post-run status reporting
    • Implemented filtered stderr to separate MPI noise from application errors

Fixed

  • Numba compilation race conditions

    • Resolved KeyError exceptions in Numba caching system during concurrent MPI compilation
    • Fixed ReferenceError: underlying object has vanished errors during object serialization
    • Eliminated cache corruption issues when multiple MPI processes compile identical functions
  • Memory management issues

    • Fixed memory accumulation when processing multiple models sequentially
    • Resolved potential memory leaks by ensuring proper model cleanup after plotting
    • Eliminated peak memory spikes by processing models one at a time
  • MPI communication noise

    • Suppressed non-fatal basesmuma component warnings that cluttered error logs
    • Filtered out ml_discover_hierarchy and collective communication layer warnings
    • Maintained visibility of genuine MPI errors while removing infrastructure noise

Technical Details

  • Error patterns addressed:

    • KeyError: ((Array(int32, 1, 'C', False, aligned=True), ...)) in Numba caching
    • ReferenceError: underlying object has vanished during serialization
    • [LOG_CAT_ML] component basesmuma is not available MPI warnings
    • Memory exhaustion from keeping multiple large models in memory simultaneously
  • Environment variables added:

    NUMBA_DISABLE_CACHE=1
    NUMBA_CACHE_DIR=/tmp/numba_cache_$$
    NUMBA_NUM_THREADS=1
    OMPI_MCA_coll_ml_priority=0
    OMPI_MCA_coll_hcoll_enable=0
    OMPI_MCA_btl_base_warn_component_unused=0
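When launching from Python rather than a shell script, the same variables could be assembled as below. The names and values are copied from the list above; the `mpi_safe_env` helper is hypothetical, and `os.getpid()` stands in for the shell's `$$`, so the directory suffix is the Python process ID rather than the shell's.

```python
import os


def mpi_safe_env(pid=None, base=None):
    """Return a copy of the environment with the variables listed above set."""
    env = dict(os.environ if base is None else base)
    pid = os.getpid() if pid is None else pid
    env.update({
        "NUMBA_DISABLE_CACHE": "1",
        "NUMBA_CACHE_DIR": f"/tmp/numba_cache_{pid}",  # process-specific cache directory
        "NUMBA_NUM_THREADS": "1",
        "OMPI_MCA_coll_ml_priority": "0",
        "OMPI_MCA_coll_hcoll_enable": "0",
        "OMPI_MCA_btl_base_warn_component_unused": "0",
    })
    return env


env = mpi_safe_env(pid=123, base={})
print(env["NUMBA_CACHE_DIR"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` so the settings apply only to the launched job.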

[0.3.0dev2] - 2025-06-12 – MPI-enabled VFI_HDGRID & root-only workflow

Changed

  • Runner metrics are now specific to each model: metrics are defined locally rather than imported from dynx.runner.metrics.deviations

[0.3.0dev1] - 2025-06-07 – MPI-enabled VFI_HDGRID & root-only workflow

Added

  • MPI parallelization for VFI_HDGRID

    • New memory-slim MPI implementation that scatters value-function slices to workers instead of broadcasting the full tensor.
    • Workers hold virtually zero memory after each stage, enabling large-scale runs on clusters (e.g. NCI Gadi).
    • Provides bit-for-bit identical results between serial and MPI modes.
  • Two-step baseline workflow

    • New CLI flags (--baseline-only, --use-baseline, --fresh-fast) for separating expensive HD-grid construction from fast method comparisons.
    • Allows building a baseline once on many cores and reusing it for subsequent fast solver runs on a single core.
    • Leverages CircuitRunner's built-in save_by_default and load_if_exists functionality.
  • MPI-aware operator factories & solvers

    • Updated horses_c.py and whisperer.py to be rank-aware.
    • Workers now receive lightweight stub Solution objects for non-MPI stages, preventing deadlocks and memory bloat.
    • No heavy .sol objects are ever broadcast back to workers.
  • Configurable plotting comparison system

    • New plot_comparison_factory() function in helpers/metrics.py creates configurable plotting metrics for comparing fast methods against baseline solutions.
    • Generates difference plots between policy/value functions of different solution methods (e.g., FUES vs VFI_HDGRID).
    • Configurable state-space slicing allows plotting specific indices of multi-dimensional arrays.
    • Supports both consumption policies (c) and value functions (vlu) with automatic detection of solution attributes.
    • Integrated with CircuitRunner's metric system for seamless workflow integration.
    • Uses existing _extract_policy() function for robust data extraction from complex model structures.
    • Memory-efficient design stores baseline model temporarily and cleans up automatically.

Changed

  • Streamlined terminal value initialization
    • The initialize_terminal_values function in whisperer.py now only processes consumption stages (OWNC and RNTC), eliminating wasteful placeholder grids for housing and tenure stages.
    • Saves 150-300 MB of RAM on large grids and speeds up terminal pass by ~10%.

Removed

  • Legacy broadcast MPI mode and --legacy-bcast flag.
  • Redundant synchronization calls (_sync_perch_solutions) from whisperer.py.
  • Unused utility functions and imports for a cleaner, more maintainable codebase.
  • Over-engineered baseline I/O in favor of CircuitRunner's native bundle management.

Fixed

  • Hash collision bug where __runner.mode was incorrectly included in param_paths, preventing fast methods from loading the correct baseline bundle.
  • Deadlocks caused by workers returning None instead of lightweight stubs for non-MPI stages.
  • Unnecessary recomputation of fast methods when a baseline was loaded.
  • Timing metrics now correctly captured and displayed in the summary tables.
  • Plot comparison function scope issues where parameter variables from factory function weren't properly captured in closure.
  • Array indexing errors in plotting configuration, fixed by using 0-indexed bounds instead of array size.
  • Value function extraction, fixed by using correct model attribute names (vlu instead of v) and solution types.
  • Euler error calculation thresholds made more flexible and based on model's borrowing constraint to prevent NaN results for FUES methods.

[0.2.0 dev4] - 2025-06-09 – MPI-safe baseline & lean workers

Changed

  • mpi_run now takes a solver communicator that splits each run across solvers. Only the master rank handles metrics, loading, and saving.
  • outputs across mpi and non-mpi runs are not consistent.
  • basic HF vf grid comparison for housing renting model using MPI. (compares to single parameter run in circuit_runner_solving.py)

Added

  • Root-only metrics path: CircuitRunner.run() now skips the expensive metric_fns block on non-root ranks; workers only return lightweight timing info. This prevents N× baseline reloads and cuts RAM usage on large jobs.

  • Global MPI helpers (_MPI_COMM / _MPI_RANK / _MPI_SIZE) initialised once at import time; used throughout the runner/solver stack to gate code that should execute only on rank 0.

Changed

  • mpi_map() rewritten for clarity

    • Always returns a pair (df, models) (second element [] when models are not gathered).
    • Serial code-path untouched; MPI path defers metrics to rank 0.
  • Stage compilation log-level: compile_all_stages() prints INFO messages only when the caller sets --verbose; otherwise it downgrades them to DEBUG to keep worker logs clean.

  • Config patching (patch_cfg)

    • Consumption stages now carry a cheap "compute": "SINGLE" flag for fast methods; `