All notable changes to this project will be documented in this file.
- Kikku package (bright-forest/kikku): new standalone package for stage-composition and period-graph tools.
  - `period_graphs`: `period_to_graph`, `backward_paths` (wavefront partition), `forward_order`. Handles branching stages with branch-keyed poststates. DAG acyclicity check.
  - `pipeline`: `load_syntax` (I/O boundary), `instantiate_period` (dolo-plus pipeline, lazy import).
  - `nest`: `load_inter_connector` (inter-period connector loading with custom YAML tag handling).
  - `asva` subpackage: `make_egm_1d` — generic 1D EGM operator factory. Takes four sub-equation callables (InvEuler, Bellman, cntn_to_dcsn, Concavity) plus a params array and returns an `@njit` EGM step function.
- `solve.py`: thin functional combinator. All I/O in `load_syntax`; pure transforms downstream. `solve_backward` is a standalone combinator. `solve_nest` returns `(nest, model, ops, waves)` for full reuse.
- `operators.py`: stage operators extracted from `model.py`. Grids passed as arguments (reusable across grid sizes without JIT recompilation). Both worker and retiree stages use `make_egm_1d` from kikku. EGM sub-equation callables have standardized `(pointwise, fixed_state, params)` signatures.
- `model.py`: thin rho output. `RetirementModel` delegates params to stage `.calibration`/`.settings` via `__getattr__`. Stores only grids + `@njit` callables. EGM recipe callables (`WORKER_EGM_FNS`, `RETIREE_EGM_FNS`) defined here.
- UE method bound at methodization time (patched in stage sources before the pipeline runs), not in the solve loop.
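The `make_egm_1d` factory pattern described above can be sketched as follows. This is a hypothetical reconstruction: the sub-equation names come from this changelog, but the internal call order and signatures are assumptions, and the real kikku factory returns an `@njit`-compiled function (the Numba decoration is omitted here to keep the sketch dependency-free).

```python
import numpy as np

def make_egm_1d(inv_euler, bellman, cntn_to_dcsn, concavity, params):
    """Hypothetical sketch of a make_egm_1d-style factory: close over the
    four sub-equation callables and a params array, return one EGM step."""
    def egm_step(pointwise, fixed_state):
        # 1. Invert the Euler equation on the exogenous grid.
        c = inv_euler(pointwise, fixed_state, params)
        # 2. Evaluate the value update at the implied consumption.
        v = bellman((pointwise, c), fixed_state, params)
        # 3. Map continuation-time objects to decision-time objects.
        q = cntn_to_dcsn((v, c), fixed_state, params)
        # 4. Apply the concavity / upper-envelope check.
        return concavity((q, c), fixed_state, params)
    return egm_step
```

Because the factory closes over the callables and params while grids arrive as arguments, a single compiled step can be reused across grid sizes, matching the recompilation-free design noted for `operators.py`.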
- Period graph auto-plotted in notebook via networkx.
- Scaling plot includes O(√N) reference line.
- `kikku` added to `examples` and `dev` optional dependencies (pyproject.toml), pointing to `git+https://github.com/bright-forest/kikku.git`.
- Setup script (`setup/setup_venv.sh`) verifies kikku import.
- Algorithm page (`docs/algorithm/fues-algorithm.md`): narrowed scope to 1D EGM, aligned notation with the paper (`\hat{x}'_i`), softened the left-turn rule to "tentatively retain", rewrote the jump-threshold section, added a forward/backward scan purpose statement, replaced the big-O complexity table with a requirements/mechanism comparison table, added a new `\bar{M}` accuracy tip, and fixed spelling and grammar.
- Comparison table now includes linked references, complexity descriptions, and a new LTM row (Druedahl & Jørgensen 2017).
- Removed all `---` horizontal rules from docs and notebook markdown cells.
- Added FUES scan diagram (`fues-scan.svg`) to the retirement notebook for MkDocs rendering.
- Fixed install instructions: removed `@release-prep` and `-b release-prep` from `docs/getting-started/installation.md`.
- Trimmed `docs/examples/housing.md` and `docs/examples/housing-renting.md` stubs.
- All references to the FUES paper across docs, docstrings, `pyproject.toml`, `CHANGELOG.md`, `README.md`, and `replication/README.md` now consistently cite Dobrescu & Shanker (2022) with the correct SSRN link.
- Fixed author order in the `pyproject.toml` description (was "Shanker & Dobrescu").
- `fues_v0_2dev.py`: PEP 8 formatted to a 78-character line width — all code lines, docstrings, and comments wrapped. Docstrings trimmed for conciseness. No logic changes; all tests pass.
- Retirement notebook: updated paper reference to (2022), fixed "dolo-plusz" typo, re-ran all cells with latest outputs.
- `fues.py` is now a thin re-export from `fues_v0_2dev` (the default).
- `fues_v0_1dev.py`: release-prep baseline (v0.1dev).
- `fues_v0_2dev.py`: optimised version (v0.2dev) — see below.
- UE engine registry: `FUES` (default = v0.2dev), `FUES_V0_1DEV`, `FUES_V0_2DEV`.
- `np.empty` for the intersection buffer (was `np.full(..., np.nan)`).
- Conditional `m_bar`: skip `max(abs, abs)` when `endog_mbar=False`.
- `assume_sorted` param + auto-detection to skip `argsort` + 5 fancy-index copies.
- Linear O(K+J) merge via `_merge_sorted_with_few` (was O(N log N) `argsort`).
- `cache=True` on all scan helpers (faster Numba cold starts).
- Pre-computed `abs(del_a)` array (avoids per-iteration `np.abs`).
- Dead code removal: commented-out variables, redundant `de_1`, simplified dispatch.
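The O(K+J) sorted-merge idea can be sketched as below — a hypothetical `_merge_sorted_with_few`-style helper; the real function's signature and tie-breaking may differ:

```python
import numpy as np

def merge_sorted_with_few(base, extra):
    """Merge sorted `base` (length K) with a few sorted `extra` points
    (length J) in O(K+J), avoiding the O((K+J) log(K+J)) cost of
    np.concatenate followed by np.argsort."""
    out = np.empty(base.size + extra.size, dtype=base.dtype)
    i = j = k = 0
    while i < base.size and j < extra.size:
        if base[i] <= extra[j]:
            out[k] = base[i]; i += 1
        else:
            out[k] = extra[j]; j += 1
        k += 1
    # Copy whichever tail remains (at most one of these is non-empty).
    out[k:k + base.size - i] = base[i:]
    k += base.size - i
    out[k:] = extra[j:]
    return out
```

The win is largest when J is small (a few intersection points merged into a long grid), which is exactly the FUES use case described above.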
- Removed unused `correct_jumps_vf_pol` from `math_funcs.py`.
- Moved `examples/durables/plot.py` and `helpers/metrics_fast.py` to `superseeded/`.
- Deleted stray `examples/durables/Untitled`.
- Trimmed `experiments/retirement/README.md`.
- `solve_nest` (was `solve_canonical`) is the single entry point — three-functor override mechanism (`calib_overrides`, `config_overrides`)
- `_accrete_nest` (was `_build_and_solve_nest`) — backward induction loop, named per DDSL accretion convention
- `instantiate_period` (was `build_period`) — dolo-plus pipeline: parse, methodize, configure, calibrate
- `RetirementModel`: all params required (no defaults); added `with_test_defaults()` classmethod
- `from_period` reads from both `.calibration` and `.settings` via a `_get()` helper
- Inter-period twister loaded from `syntax/nest.yaml` (was hardcoded)
- Removed stale `src/dc_smm/` duplicate package; all imports updated to `dcsmm`
- Flattened `syntax/syntax/` to `syntax/`; `calibration.yaml` + `settings.yaml` are the single source of truth for all parameters
- Deleted `backward_induction()` bypass pattern
- Removed `examples/retirement/params/` (replaced by syntax/ + experiments/)
- Removed `examples/retirement/code/` (empty stale directory)
- Renamed `run_experiment.py` to `run.py`
- `run.py`: `--calib-override key=value`, `--config-override key=value`, `--override-file path.yml`, `--method`
- `benchmark.py`: uses `solve_nest` throughout
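A minimal sketch of how the `key=value` override flags might be parsed — the flag names come from this changelog, but the helper name `parse_overrides` and the type-coercion rules are assumptions:

```python
import argparse

def parse_overrides(pairs):
    """Turn ['beta=0.96', 'n=2000'] into a dict, coercing int/float where
    possible and leaving everything else as a string."""
    out = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        try:
            val = int(raw)
        except ValueError:
            try:
                val = float(raw)
            except ValueError:
                val = raw
        out[key] = val
    return out

# Repeatable flags accumulate into a list via action="append".
parser = argparse.ArgumentParser()
parser.add_argument("--calib-override", action="append", default=[],
                    metavar="KEY=VALUE")
parser.add_argument("--config-override", action="append", default=[],
                    metavar="KEY=VALUE")
```

Sparse overrides like these compose naturally with the "only values differing from syntax/ defaults" convention mentioned below.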
- Added `docs/notebooks/retirement_fues.ipynb` — interactive walkthrough with plotly zoom, Nord theme, scaling sweep 1k-15k
- `mkdocs.yml`: added `mkdocs-jupyter` plugin + Notebooks nav section
- `experiments/retirement/params/*.yml` rewritten as sparse key-value overrides (only values differing from syntax/ defaults)
- [2025-12-16] Removed `policy_a` storage in `horses_c.py`:
  - Asset policy array (`policy_a`) was being stored but never used downstream by metrics or plotting
  - Removed allocation, loop assignments, and return values from both EGM and VFI paths
  - Savings: ~11MB per stage × 5 stages × 20 periods ≈ 1.1GB per model
- [2025-12-16] Conditional `vlu_dcsn` allocation in `horses_c.py`:
  - When `delta == 1` (standard exponential discounting), `vlu_dcsn = Q_dcsn` mathematically
  - Skip allocating a separate `vlu_dcsn` array; return `Q_dcsn` directly for both
  - Savings: ~11MB per stage for standard discounting models
- [2025-12-16] EGM grids return optimization in `horses_c.py`:
  - Changed `_solve_egm_loop` to return `None` instead of empty dictionaries when `store_egm_grids=False`
  - Added explicit `del unrefined_grids, refined_grids` cleanup
  - Downstream code checks `egm_grids is not None` before access
- [2025-12-16] Skip expensive computations when `delta == 1` in `horses_c.py`:
  - Skipped: `compute_gradient()` (expensive gradient calculation for c_prime) and the `u_func()` call (utility-function evaluation for the `vlu_dcsn` transformation)
  - Simplified: `lambda_dcsn = uc_today`, `vlu_dcsn = Q_dcsn` (direct assignment)
  - Savings: ~34,000 function calls skipped per model (343 iterations × 5 stages × 20 periods)
Note: Memory performance improvements are ongoing. Current optimizations reduce overhead but peak memory usage during sweep runs may still be high.
- [2025-12-15] FUES array copy optimization in `fues.py`:
  - Added `_ensure_f64()` helper to avoid unnecessary array copies when inputs are already float64 and C-contiguous
  - Before: `np.asarray()` always created copies; after: direct pass-through when dtype/layout matches
  - Savings: ~18MB per FUES call for typical grid sizes (5 arrays × 4000 × 8 bytes × 7 × 16)
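The pass-through idea behind `_ensure_f64()` can be sketched as follows (an assumed implementation; the actual helper may differ in details):

```python
import numpy as np

def ensure_f64(arr):
    """Return `arr` unchanged when it is already float64 and C-contiguous;
    convert (and copy) only when the dtype or memory layout requires it."""
    if arr.dtype == np.float64 and arr.flags["C_CONTIGUOUS"]:
        return arr  # fast path: no allocation, no copy
    return np.ascontiguousarray(arr, dtype=np.float64)
```

Returning the input object itself is only safe because, as noted below, the arrays handed to FUES/DCEGM are treated as read-only.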
- [2025-12-15] DCEGM memory fixes in `dcegm.py`:
  - Changed `np.zeros_like() + np.nan` to `np.full(n, np.nan)` (creates 1 array instead of 2)
  - Removed an unused `LinearInterp` object creation
  - Savings: ~130KB per DCEGM call + object overhead
- [2025-12-15] Housing solver meshgrid optimization in `horses_h.py`:
  - Moved `np.meshgrid` and `resources_liquid_3d` from inside the operator (per-solve) to the factory closure (once per model)
  - Added `del a_mesh, y_mesh` to immediately free intermediate arrays
  - Savings: avoids repeated allocation of large 3D arrays per income state per period
- [2025-12-15] Lazy compilation in whisperer.py (activated via `--low-memory`):
  - New `_compile_period_stages(period)` compiles stages just before solving each period
  - New `_clear_period_numerics(period)` clears numerical grids after solving (except kept periods)
  - Memory footprint: O(1) periods instead of O(N_periods) for numerical grids
  - Timing accuracy: period times now exclude compilation and cleanup overhead (documented in docstring)
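A schematic of the `--low-memory` solve loop under these assumptions — the helper names come from this changelog, but the loop shape and the injectable callables below are hypothetical stand-ins for the whisperer.py internals:

```python
def solve_backward(periods, compile_stages, solve_period, clear_numerics,
                   keep=(0, 1)):
    """Backward induction with O(1) live numerics: compile each period's
    stages just before solving it, then clear its grids right after,
    except for periods in `keep` (needed later, e.g. Euler-error checks)."""
    out = {}
    for t in reversed(range(len(periods))):
        compile_stages(periods[t])       # lazy compile, one period at a time
        out[t] = solve_period(periods[t])
        if t not in keep:
            clear_numerics(periods[t])   # free grids immediately
    return out
```

Keeping periods 0 and 1 mirrors the Euler-error NaN fix recorded below: clearing them would destroy the data the error calculation needs.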
- [2025-12-15] Complete model cleanup in `solve_runner.py`:
  - New `cleanup_model_complete(model)` aggressively frees all solutions, models, operators, and period lists
  - Called after each configuration in sweep mode to prevent memory accumulation across MPI ranks
  - Ensures each new configuration starts with clean memory
- [2025-12-15] `--skip-bundle-save` flag (cli.py, solve_runner.py):
  - Skips saving solution bundles to disk entirely
  - Useful for timing runs where only metrics are needed
  - Reduces I/O overhead and disk space usage
  - Added to `run_sweep_noPB_test.sh` for faster sweep runs
- [2025-12-15] Euler error NaN fix: periods 0 and 1 (in `periods_to_keep`) no longer have their numerical grids cleared, preserving data needed for the Euler error calculation
- Lazy compilation is mutually exclusive with dynx grid sharing (documented in devspec)
- All optimizations verified: arrays in FUES/DCEGM are read-only, so pass-through is safe
- PBS scripts updated to use `--skip-bundle-save` and `--low-memory` flags
- [2025-12-14 23:45 AEST] Fixed VFI marginal utility calculation in `horses_c.py`:
  - Change: CPU VFI solvers (`_solve_vfi_numerical` and `_solve_vfi_block`) now use `uc = alpha/c` instead of `1/c` for marginal utility, matching the GPU version and Cobb-Douglas utility
  - Note: this only affects `lambda_dcsn` output, NOT the VFI policy optimization itself (VFI doesn't use λ - it directly maximizes the Bellman equation)
  - Added `alpha` parameter to VFI kernel function signatures for consistency
- [2025-12-14 23:30 AEST] VFI/EGM policy discrepancy was due to grid lower bounds:
  - Issue: VFI and EGM policies appeared different because the minimum grid values (`a_grid[0]`, `w_grid[0]`) were set too high
  - This left the optimization bounds in VFI overly constrained at low wealth levels
  - Fix: adjusted grid lower bounds to appropriate values
  - Note: NOT caused by the `uc` formula, since VFI optimization doesn't use marginal utility - it directly maximizes `u(c,H) + βδV(a',H',y')`
- [2025-12-14 22:30 AEST] Fixed VFI JIT recompilation issue in `horses_c.py`:
  - Bug: `_solve_vfi_loop` was calling `build_njit_utility()` directly, creating a NEW Numba-compiled function on every call. This triggered ~1.8s of JIT compilation overhead for every VFI stage solve, making timing appear constant regardless of grid size
  - Fix: changed to `get_u_func()`, which uses `@lru_cache` to return the same cached function object for identical parameters
  - Result: after JIT warmup, VFI kernel execution now properly scales with grid size (~30ms for a 1000-point grid → ~60ms for 2000)
  - Note: the Numba cache is cleared in PBS scripts (`rm -rf $NUMBA_CACHE_DIR`). For production runs, consider keeping the cache to avoid first-call JIT overhead
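The caching fix can be illustrated with a minimal `functools.lru_cache` sketch. `get_u_func` is the name recorded above, but the parameterization (`gamma`) and the stand-in builder body are assumptions; in the real code the builder compiles the utility with `@njit`, which is what makes rebuilding expensive:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_u_func(gamma):
    """Return the same (expensively built) utility callable for identical
    parameters instead of rebuilding it on every call - in the real code,
    rebuilding meant re-triggering ~1.8s of Numba JIT compilation."""
    # Stand-in for a build_njit_utility(gamma)-style factory.
    def u(c):
        return c ** (1.0 - gamma) / (1.0 - gamma)
    return u
```

Because `lru_cache` keys on the arguments, identical parameters yield the identical function object, so Numba's dispatch cache is hit instead of recompiling.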
- [2025-12-14 21:00 AEST] Added `avg_ownc_time_per_period` metric to sweep results:
  - Computed in `whisperer.py` from stage_timings before they're excluded from the CSV
  - Added to timing tables in `generate_paper_tables.py` to show VFI computational scaling
- [2025-12-04 14:00 AEST] Added PCHIP-style monotone gradient method (`piecewise_gradient_pchip`) in gradients.py - uses a weighted harmonic mean to preserve MPC bounds (0, 1]
- [2025-12-04 14:15 AEST] Added `q_diff > 0` condition to jump detection in `_egm_preprocess_core` - only add constraint points where the value function is increasing
- [2025-12-04 14:30 AEST] Modified the solve_runner model factory to double a_points and a_nxt_points at runtime for improved accuracy while keeping w_points unchanged
- [2025-12-06 10:00 AEST] Changed constraint point spacing at jumps from a fixed count to mean-e_diff based - points are now spaced at input grid density via the `use_mean_spacing` parameter in `_egm_preprocess_core`
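The weighted harmonic-mean rule behind `piecewise_gradient_pchip` can be sketched with the standard PCHIP (Fritsch-Butland) weighting. This is a generic reconstruction, not the project's exact code; the key property is that each interior slope stays between the adjacent secant slopes, which is what keeps the MPC within its bounds:

```python
import numpy as np

def pchip_slopes(x, y):
    """Monotone interior slopes via the weighted harmonic mean of adjacent
    secant slopes. When neighbouring secants have opposite signs or one is
    zero, the slope is set to 0, keeping the derivative bounded by the data."""
    h = np.diff(x)
    d = np.diff(y) / h                      # secant slopes
    m = np.zeros_like(y)
    for k in range(1, len(x) - 1):
        if d[k - 1] * d[k] > 0.0:
            w1 = 2.0 * h[k] + h[k - 1]
            w2 = h[k] + 2.0 * h[k - 1]
            m[k] = (w1 + w2) / (w1 / d[k - 1] + w2 / d[k])
    m[0], m[-1] = d[0], d[-1]               # simple one-sided endpoints
    return m
```

If all secant slopes (MPCs) lie in (0, 1], every harmonic-mean slope does too, unlike a plain arithmetic mean of gradients, which can overshoot at kinks.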
- Retirement example restructured into modular components:
  - `plots.py`: plotting functions (EGM grids, consumption policy, DC-EGM comparison)
  - `tables.py`: Markdown and LaTeX table generation with parameter captions
  - `benchmarks.py`: timing sweep and performance comparisons
  - `run_experiment.py`: CLI runner with argparse for grid size, plot age, and sweep settings
- YAML-based model configuration for retirement experiments:
  - Added `experiments/retirement/params/` with baseline, high_beta, low_delta, and long_horizon configurations
  - Model parameters (`m_bar`, `beta`, `delta`, etc.) now loaded from YAML and passed through the solver chain
  - `m_bar` (FUES jump threshold) configurable from YAML → RetirementModel → Operator_Factory → EGM_UE → FUES
- Benchmark tables output both `.md` and `.tex` formats with parameter captions
- Experiments reorganized: moved `run_housing_single_core.sh` to `experiments/housing_renting/` with job configs
- PBS scripts updated with `PBS_O_WORKDIR` handling for correct path resolution; logs output to `logs/`
- FUES defaults aligned between `fues.py` and `upperenvelope.py`: `m_bar=1.0`, `lb=4`
- Upper envelope interface: added `include_intersections` parameter to `EGM_UE` and `_fues_engine`
- Removed deprecated `FUES_sep_intersect` import; replaced with `fues_alg(..., return_intersections_separately=True)`
- Fixed hardcoded `m_bar` values in `worker_solver` and `iter_bell` to use the model parameter
- PBS path resolution for scripts submitted via `qsub`
- `scripts/setup_public_venv.sh`: creates a virtual environment with dynx from GitHub
- `experiments/retirement/retirement_timings.sh`: configurable bash wrapper with sweep settings
- `logs/` directory for HPC job output; added `temp/` and `archive/` to `.gitignore`
- Consumption deviation benchmarking (PR #9):
  - Added `consumption_deviation()` function in `retirement.py` to compare solutions against a high-resolution "true" reference (default: 20k-grid DCEGM)
  - Metric uses the same format as the Euler error: log₁₀(|c - c_true| / c_true)
  - Configurable `true_grid_size` and `true_method` parameters in the YAML benchmark section
  - Restructured benchmark tables into two cleaner formats:
    - Timing table with UE/Tot sub-columns per method
    - Accuracy table with Euler/Dev sub-columns per method
    - Grid-grouped rows with delta as sub-rows for improved readability
  - Renamed `l2` variables to `cdev` for clarity (consumption deviation, not L2 norm)
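The deviation metric might look like this in code — the function name comes from this changelog, but returning the pointwise array (rather than a mean or max aggregate) is an assumption:

```python
import numpy as np

def consumption_deviation(c, c_true):
    """log10 relative deviation of a consumption policy from a
    high-resolution reference, in the same format as the Euler error."""
    return np.log10(np.abs(c - c_true) / c_true)
```

On this scale a value of -3 means the policy is within 0.1% of the reference, which makes the Dev column directly comparable with the Euler column.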
- [2025-08-16 10:00 AEST] Major refactoring: Removed MPI support from horses_c.py, removed unused F_ownc_cntn_to_dcsn factory, standardized terminology
- [2025-08-17 17:00 AEST] Added DGX A100 support with specialized PBS scripts, GPU kernel optimizations, and log management utilities
- [2025-08-23 11:21 AEST] Fixed numerical stability issues in FUES algorithm for delta != 1 case
- [2025-10-25 16:30 AEST] Fixed ZeroDivisionError in piecewise_gradient_3rd_filtered by adding zero-division protection throughout gradient computation
- [2025-10-25 17:00 AEST] Enhanced _egm_preprocess_core to only add jump constraints for segments with at least 4 points, improving numerical stability
- [2025-10-25 17:30 AEST] Added asset policy monotonicity filter in horses_c.py to remove points where refined asset policy is decreasing
- [2025-10-25 18:00 AEST] Fixed boolean operator error in _egm_preprocess_core (changed 'or' to '|' for array operations)
- [2025-10-25 18:15 AEST] Fixed array allocation in _egm_preprocess_core to correctly account for 2 segments per jump
- [2025-10-27 15:30 AEST] Added skip_egm_plots flag to conditionally skip EGM CSV exports (plot_csv_export.py, solve_runner.py)
- [2025-10-27 16:00 AEST] Implemented conditional EGM grid storage: skip saving EGM grids to Solution object when --skip-egm-plots enabled, reducing memory usage and pickle sizes (horses_c.py, solve_runner.py)
- [2025-10-27 16:35 AEST] Fixed AttributeError in make_housing_model: corrected mc.periods to mc.periods_list for flag injection (solve_runner.py)
- [2025-10-27 16:45 AEST] Added optional asset policy gradient filtering: filter_a_jumps setting removes refined grid points where da/dm exceeds max_a_gradient threshold (horses_c.py, master.yml)
- [2025-11-08 12:00 AEST] Implemented first-order condition (FOC) checks in _egm_preprocess_core to filter constraint points based on economic optimality: only add points that satisfy Kuhn-Tucker conditions with scaled lambda values (horses_common.py, horses_c.py)
- [2025-11-08 12:15 AEST] Modified image saving to always use timestamped directories (images_YYYYMMDD_HHMMSS) to preserve all previous runs instead of overwriting old image files (execution_settings.py, solve_runner.py)
- [2025-11-08 12:30 AEST] Added uc_test function in horses_common.py as a simple test marginal utility for FOC verification, imported into horses_c.py for testing purposes
- [2025-11-08 12:45 AEST] Fixed incorrect double method directory creation: removed redundant method subdirectory since we're already in bundles/hash/METHOD/images_TIMESTAMP/ structure (plots.py, plot_csv_export.py)
- [2025-11-08 16:00 AEST] Added disable_jump_checks parameter to FUES algorithm to control manual jump check overrides: when True, forces keep_i1=False (right turn) and keep_j=True (left turn); default is False to enable checks (fues.py)
- [2025-11-08 16:30 AEST] Optimized _egm_preprocess_core for speed: vectorized FOC checks, optimized jump detection with single mask computation, eliminated redundant array operations, replaced np.empty+fill with np.full (horses_common.py)
- Multi-GPU MPI parallelization for housing model
  - Single-node support for up to 4 GPUs with MPI
  - Multi-node support for scaling across Gadi nodes (8+ GPUs)
  - NUMA-aware CPU binding with 12 cores per MPI rank
  - MPI dispatcher in horses_h.py with GPU detection and fallback
  - MPI driver in horses_c_gpu.py with Allgatherv collectives
  - Shared cache for grids
- PBS scripts for GPU scaling
  - `run_housing_gpu_mpi.pbs`: single-node 4-GPU execution
  - `run_housing_gpu_multi_node.pbs`: multi-node 8-GPU execution
  - `benchmark_gpu_scaling.pbs`: automated 1-, 2-, 4-GPU performance comparison
  - Scripts use the same options as the single-GPU version for consistency
  - `run_housing_dgxa100_single.pbs`: DGX A100 single-GPU job (512GB RAM, 80GB GPU)
  - `run_housing_dgxa100_parallel.pbs`: DGX A100 4-GPU parallel execution
  - `submit_dgxa100_config.sh`: submit helper accepting multiple configurations
  - `move_logs_to_scratch.sh`: utility to move all logs to scratch storage
- FUES algorithm cleanup and configurability
- Removed 4 unused functions (uniqueEG, linear_interp, seg_intersect, line_intersect_unbounded)
- Made epsilon parameters configurable (eps_d, eps_sep, eps_fwd_back, parallel_guard)
- Consolidated duplicated intersection logic into _forced_intersection_twopoint() helper
- Moved helper functions to src/dc_smm/fues/helpers/math_funcs.py
- Merged FUES_sep_intersect into main FUES function with return_intersections_separately flag
- GPU implementation modifications
- Added initialize_vfh_from_config() function for MPI initialization
- Modified V_out calculation in kernel to use formula: V = (Q - (1-delta)*u(c,h)) / delta
- Implemented C-contiguous array handling for MPI operations
- Convergence check performed before array swap using allreduce(MAX)
- Policy gathering made conditional via policy_every parameter
- Two-pass grid search in vfi_gpu_kernel: coarse then fine search (~25% fewer evaluations)
- Pre-computed log_H_term for housing utility (avoids redundant calculations)
- Branchless operations using max() instead of if-statements
- Immediate memory cleanup after GPU transfers for large arrays
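The coarse-then-fine search pattern can be illustrated on the CPU with NumPy. This is a hypothetical sketch: the real version runs inside the CUDA kernel, and its stride and window sizes are unknown. It also assumes the objective is well-behaved enough that the coarse pass brackets the maximum:

```python
import numpy as np

def two_pass_argmax(f, grid, coarse_stride=8):
    """Coarse-then-fine grid search: evaluate every coarse_stride-th point,
    then refine only inside the window bracketing the coarse winner. On a
    unimodal objective this finds the same index as a full scan with far
    fewer evaluations (the changelog cites ~25% fewer in the GPU kernel)."""
    coarse_idx = np.arange(0, grid.size, coarse_stride)
    best = coarse_idx[np.argmax(f(grid[coarse_idx]))]
    lo = max(best - coarse_stride, 0)
    hi = min(best + coarse_stride + 1, grid.size)
    fine = np.arange(lo, hi)
    return fine[np.argmax(f(grid[fine]))]
```

The trade-off is the same one the resource-management notes below describe: fewer evaluations per thread at the cost of a second (small) pass.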
- MPI implementation structure
- MPI logic isolated in solver layer (horses_h.py, horses_c_gpu.py)
- Whisperer module unchanged - no modifications required
- 1 MPI rank mapped to 1 GPU
- Precomputed Allgatherv counts and displacements
- FUES intersection calculations
- Rewrote intersection logic to handle near-parallel segments robustly
- Added forced intersection points that guarantee envelope continuity
- Intersection coordinates now strictly bounded within valid intervals
- Averaging technique reduces numerical drift at segment boundaries
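The near-parallel guard can be sketched as a plain line-intersection helper that refuses to divide by a tiny denominator. This is a hypothetical reconstruction: the real code works on envelope segments and additionally clamps the result into the valid interval:

```python
def segment_intersection(p1, p2, p3, p4, parallel_guard=1e-12):
    """Intersection of the lines through (p1, p2) and (p3, p4). Returns
    None when the denominator is below `parallel_guard` (near-parallel or
    degenerate geometry) instead of dividing by a tiny number, which is
    what produces wildly off-interval intersection points."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < parallel_guard:
        return None  # caller falls back to a forced intersection point
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
```

A `None` return is exactly the case where the forced-intersection fallback described above guarantees envelope continuity anyway.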
- Branch detection and continuity
- New branch detection checks both gradient thresholds and point proximity
- Safe extrapolation finds suitable points when direct neighbors unavailable
- Circular buffer for backward scanning improves memory efficiency
- Forward scan validates jumps using combined value and gradient criteria
- Consecutive jump handling
- Prevents numerical instabilities from multiple policy jumps in sequence
- Drops previous jump point when consecutive jump detected
- Maintains index consistency by removing associated intersections
- Rule only enforced when current jump passes validation
- Numerical stability issues
- Adjusted epsilon constants for better numerical behavior (EPS_D from 1e-200 to 1e-20)
- Added parallel line guard (1e-12) for degenerate geometry detection
- Intersection capacity increased to 2*(N-1) preventing silent truncation
- Eliminated spurious Euler residuals at policy kinks
- [2025-08-23] Improved float64 numerical stability for delta != 1 case:
- Changed EPS_D from 1e-50 to 1e-14 (safe for float64 precision)
- Increased PARALLEL_GUARD to 1e-10 for better parallel line detection
- Added explicit float64 dtype enforcement in FUES and egm_preprocess
- Enhanced uniqueEG() to handle near-duplicate points with tolerance
- Fixed consumption lower bounds (1e-100 to 1e-10) in horses_common.py
- Memory optimizations
- Pre-allocated arrays reduce allocation overhead in hot loops
- Circular buffer implementation minimizes memory churn
- Uniform index bookkeeping simplifies maintenance and debugging
- [2025-08-02 18:38 AEST] Improved CLAUDE.md documentation with better organization, version management discipline, and incorporated feedback from o3pro.
- [2025-08-03 17:30 AEST] Fixed GPU scaling issue by implementing memory freeing during solve to prevent 193GB+ memory accumulation
- [2025-08-03 18:15 AEST] Enhanced memory management to completely free periods 2+ while preserving periods 0,1 for Euler error calculation
- [2025-08-03 18:45 AEST] Fixed Euler error GPU bottleneck by implementing sampling-based calculation for large grids to prevent 100GB+ memory transfers
- [2025-08-03 15:30 AEST] Created multi-job PBS submission system for running multiple GPU configurations in parallel
- [2025-08-03 16:00 AEST] Added income process generation script using Fella (2014) parameters for housing model
- [2025-08-03 16:30 AEST] Verified memory freeing implementation is complete but jobs crashing before benefits visible
- [2025-08-03 19:00 AEST] Identified issue with FUES algorithm dropping points after policy function jumps - needs intersection fallback when scans fail
- [2025-08-08 13:37 AEST] Fixed left/right branch assignment in FUES intersection calculation - new branch should be on right (higher e_grid values), old branch on left
- [2025-08-08 14:15 AEST] Implemented extrapolated segment intersections (extrap_segments_05_08082025_v1) - adds fallback extrapolation when forward/backward scans fail to find bracketing points, ensuring continuous piecewise-linear envelope
- [2025-08-08 15:34 AEST] Simplified solve_runner.py Phase 1 - extracted configuration management into ConfigurationManager class, reducing main() complexity while maintaining full PBS compatibility
- [2025-08-12 17:15 AEST] Cleaned up fues.py - removed 4 unused functions (uniqueEG, linear_interp, seg_intersect, line_intersect_unbounded) and made epsilon parameters (eps_d, eps_sep, eps_fwd_back, parallel_guard) configurable as optional function arguments while maintaining backward compatibility
- [2025-08-12 18:30 AEST] Consolidated duplicated intersection geometry logic in fues.py - created _forced_intersection_twopoint helper function to eliminate ~150 lines of duplicate code across Cases A, C.1, and C.2, while ensuring all epsilon parameters are properly passed through the function hierarchy
- [2025-08-12 19:00 AEST] Merged `FUES` and `FUES_sep_intersect` functions in fues.py - consolidated into a single `FUES` function with a `return_intersections_separately` flag for a simplified API and improved maintainability
- [2025-08-12 19:15 AEST] Cleaned up fues.py formatting - removed redundant comments, excessive blank lines, and obvious inline comments to improve code readability while maintaining functionality
- [2025-08-12 19:30 AEST] Simplified function signatures in fues.py - refactored _forced_intersection_twopoint and add_intersection_from_pairs_with_sep to accept L and R as tuples instead of 20 individual parameters, improving code clarity
- [2025-08-12 19:45 AEST] Fixed Numba compilation issue - removed @njit decorator from FUES wrapper function as it's unnecessary (only _scan needs JIT compilation) and was causing return type inconsistency errors
- [2025-08-12 20:00 AEST] Refactored FUES helpers - moved intersection and circular buffer utilities from fues.py to helpers/math_funcs.py for better code organization and reusability.
- [2025-08-12 20:15 AEST] Applied PEP8 formatting to fues.py - cleaned up whitespace, fixed spacing around operators, improved line breaks for better readability
- [2025-08-12 20:30 AEST] Fixed constants handling - moved EPS_D, EPS_SEP, and PARALLEL_GUARD constants from math_funcs.py back to fues.py where they belong, removed default parameter values that used these constants
- [2025-08-08 15:45 AEST] Renamed ConfigurationManager to ExecutionSettings to distinguish PBS execution settings from model configuration YAML
- [2025-08-08 16:15 AEST] Implemented clean left/no jump logic (clean_left_no_jump_logic_05_08082025.md) - allows consecutive no-jump left turns while preventing consecutive jumps via demotion, adds jump_now condition to intersection logic, ensures uniform index bookkeeping across all cases
- [2025-08-08 16:25 AEST] Fixed NameError in solve_runner.py - corrected missed variable rename from cfg_container to model_config in CircuitRunner initialization
- [2025-08-08 16:40 AEST] Fixed critical FUES implementation errors causing all points to be dropped:
- Fixed undefined variable 'left_turn' -> 'left_turn_any' in backward_scan_combined call
- Fixed uninitialized variable 'j' in first iteration (i=0)
- Added missing 'not_allow_2lefts' parameter to both _scan function calls in FUES and FUES_sep_intersect wrappers
- [2025-08-08 16:50 AEST] Applied additional FUES fixes from 05pro_fues_dev1_fixes.md:
- Fixed index update logic in Case C.1 when j is dropped - prev_j now correctly points to k (current tail) instead of dropped j
- Fixed value-fall state flags - last_turn_left now correctly set to False (value fall is not a geometric turn)
- Increased intersection capacity from N//2 to 2*(N-1) to prevent silent truncation in pathological cases
- [2025-08-08 17:05 AEST] Implemented _scan_v2 from right_as_left_no2jumps.md - cleaner, more compact FUES implementation:
- Single case_id encoding (turn<<1)|jump for simpler branching logic (4 cases: RTNJ, RTJ, LTNJ, LTJ)
- Different consecutive jump handling: keeps second jump but drops previously jumped-to point j and undoes its intersections
- Uniform index updates across all cases for better maintainability
- Intersections only added on jump iterations with robust extrapolation fallback
- Updated both FUES and FUES_sep_intersect wrappers to use _scan_v2
- [2025-08-08 17:20 AEST] Applied no_two_jumps.md refinement - only enforce consecutive jump rule when current jump is kept:
- Removed early unconditional consecutive jump enforcement block
- RTJ case: only drops previous j when keep_i1 is True (current jump is validated and kept)
- LTJ case: enforces rule at start since i+1 is always kept by construction
- Ensures "no two jumps" rule only applies when we're actually accepting the current jump
- [2025-08-08 18:00 AEST] Implemented strict bracket enforcement for FUES intersections - ensures intersections always lie within (e_j, e_{i+1}):
- Replaced loose e_min/e_max window check with strict _between_open(intr_x, e_grid[j], e_grid[i+1], EPS_SEP) validation
- Added _clip_open to clamp intersection x-coordinate into valid interval with safety margin
- Recompute intersection y-coordinate at clamped x using both line equations and average
- Applied to all three intersection cases: Case A (right-turn jump), Case C.1 (left turn, j dropped), Case C.2 (left turn, j kept)
- Prevents spurious off-interval intersections that corrupt envelope geometry on next iteration
- [2025-08-09 10:00 AEST] Implemented forced intersection points for all kept jumps - eliminates Euler equation residual gaps:
- Added _force_crossing_inside() function to guarantee valid intersections even for near-parallel lines
- Implemented adaptive separation min(EPS_SEP, 0.25*interval_length) to handle small intervals
- Modified all three cases (RTJ, LTJ j-dropped, LTJ j-kept) to use forced intersections
- Ensures piecewise-linear envelope with explicit kinks at all discrete choice switches
- [2025-08-09 11:00 AEST] Added comprehensive debug printing for intersection analysis:
- Added debug parameters to _scan and FUES functions with specific region filtering
- Prints intersection details including flag, point, indices, liquid savings values, and policies
- Default debug region set to e: [31.3, 32], v: [6.71, 6.75] for targeted analysis
- [2025-08-09 11:30 AEST] Replaced complex interpolation with cleaner implementation for non-ConSav methods:
- Added interp_clean() function with simpler, more robust extrapolation logic
- Modified horses_c.py to use interp_clean for Q_dcsn and policy interpolation when method != "CONSAV"
- Addresses suspected interpolation issues causing FUES instabilities
- [2025-08-09 12:00 AEST] Enhanced forward scan logic in Case A (right-turn jump):
- Added jump verification when g_1 > g_f_vf_at_idx condition is met
- Now checks if gradient from i+1 to idx_f exceeds jump threshold (m_bar)
- Only sets keep_i1=True when both value condition AND jump are confirmed
- [2025-08-10 10:15 AEST] Refactored `e_grid` to `x_dcsn_hat` in `fues.py` for improved clarity and consistency with paper notation
- CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES fix
- Reduced thread block sizes from 1024 to 256-512 threads to avoid GPU resource exhaustion
- VFI kernel: (8,8,8)=512, Housing Owner: (8,8,4)=256, Housing Renter: (16,16)=256
- All GPU kernels now launch successfully with HIGH_RES_SETTINGS
- GPU-accelerated shock integration
  - New `shock_integration_kernel` replaces CPU-based np.einsum
  - Automatic GPU dispatch when compute="GPU" and problem size > 1000
  - Expected 5-10x speedup for shock integration operations
- Development specifications
  - `multi_gpu_parallel_architecture_03082025.md`: full 4-GPU + 48-CPU architecture
  - `vfi_hdgrid_gpu_parallel_03082025.md`: focused VFI-only 4-GPU parallelization
- GPU kernel optimizations
- VFI kernel restored to proper 3D parallelization (was 2D with loop)
- Fixed serialization bottleneck in wealth dimension
- All kernels now use balanced thread configurations for stability
- Expected 3-5x speedup from proper parallelization
- Resource management:
- Complex kernels require fewer threads due to register pressure
- Trade-off: more kernel launches but successful execution
- GPU memory usage: ~400MB for test grids, scales linearly
- GPU underutilization warning for small grids
- Fixed "Grid size 1 will likely result in GPU under-utilization" warning
- Added adaptive thread block sizing when grid dimensions are very small
- Ensures minimum GPU occupancy by adjusting thread configuration dynamically
- Affects: horses_h.py (owner/renter choice) and horses_c_gpu.py (VFI solver)
- Small test grids now launch with better GPU utilization
- Adaptive kernel configuration:
- Detects when total blocks would be ≤ 2 and reduces thread block size
- Maintains correctness while improving GPU occupancy for test configurations
- Example: 1×1 grid now uses 1×1 threads instead of 16×16, avoiding warnings
- GPU VFI kernel launch failure
  - Fixed missing `@cuda.jit` decorator on the `calculate_continuation_values_gpu_kernel` function
  - Resolved `CUDA_ERROR_INVALID_VALUE` by converting the 3D grid to a 2D grid with an internal loop
  - Changed from `cuda.grid(3)` to `cuda.grid(2)` to avoid CUDA's Z-dimension limit (65535 blocks)
  - Reduced thread block configuration from (16,16,4) to (16,16) for better compatibility
  - GPU kernel now handles 4+ million grid points without exceeding CUDA limits
- FUES algorithm version reorganization
  - Renamed `fues_2dev5.py` → `fues.py` as the current production version
  - Renamed the original `fues.py` → `fues_v0dev.py` (October 2024 paper version)
  - Moved all experimental versions (fues_2dev1-8) to `src/dc_smm/fues/experimental/`
  - Updated all method references: `FUES2DEV5` → `FUES`, `FUES2DEV*` → `FUES`
  - Upper envelope registry updated: `@register("FUES")` for production, `@register("FUES_V0DEV")` for the paper version
- Repository cleanup for public release
  - Enhanced .gitignore to exclude HPC output files, backup directories, and working notes
  - Added `examples/README_OUTPUTS.md` explaining the output directory structure
  - Excluded all generated images/results from version control (best practice)
  - Python build artifacts (*.egg-info) now properly ignored
- GPU fix details:
  - Problem: a 3D grid with dimensions (250, 250, 64) = 4M points exceeded the CUDA Z limit
  - Solution: a 2D grid over (n_H, n_Y) with an internal loop over the n_W dimension
  - Maintains the same computation pattern while respecting CUDA architecture limits
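The fix's indexing pattern can be illustrated in plain Python: the launch grid covers only the first two dimensions, and each thread walks the wealth dimension serially, so no Z-dimension blocks are ever requested. A sketch with a placeholder computation (the real kernel's body differs):

```python
import numpy as np

def continuation_values_2d(V, discount):
    # Plain-Python analogue of the kernel's indexing: the (h, y) loops map to
    # cuda.grid(2) thread indices; the w loop runs serially inside each thread.
    n_H, n_Y, n_W = V.shape
    out = np.empty_like(V)
    for h in range(n_H):          # cuda.grid(2) axis 0
        for y in range(n_Y):      # cuda.grid(2) axis 1
            for w in range(n_W):  # internal serial loop (was the Z dimension)
                out[h, y, w] = discount * V[h, y, w]  # placeholder computation
    return out
```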
- Files reorganized:
  - `src/dc_smm/fues/__init__.py` - updated imports
  - `src/dc_smm/uenvelope/upperenvelope.py` - updated engine registrations
  - 11 example/test files updated with new method references
  - Fixed all legacy import paths (`dc_smm.fues.legacy.*` no longer exists)
- Repository structure:

  ```
  src/dc_smm/fues/
  ├── fues.py          # Current production (was fues_2dev5)
  ├── fues_v0dev.py    # Original paper version
  └── experimental/    # All experimental versions
  ```
- GPU kernel now successfully launches for high-resolution grids
- Expected 3-5x speedup for VFI GPU solver vs CPU
- Eliminates memory transfer bottleneck by keeping computation on device
- Comparison metrics filtering for baseline-only runs
  - Added `--comparison-metrics` parameter to specify which metrics require baseline loading
  - Automatically skips comparison metrics when running only the baseline method, preventing self-comparisons
  - Saves ~45 minutes of unnecessary computation on baseline-only GPU runs
- Selective model loading for memory efficiency
  - Added `--load-periods` parameter to load only specific period indices
  - Added `--load-stages` parameter for fine-grained stage filtering per period
  - Reduces loading from 75 to 18 pickle files (76% reduction) for Euler error calculations
  - Integrated with DynX's enhanced `load_circuit()` function
- Smart metric execution based on method selection
  - The baseline method now temporarily excludes comparison metrics during its own execution
  - Comparison metrics (`dev_c_L2`, `plot_c_comparison`, `plot_v_comparison`) only run for fast methods
  - Prevents meaningless baseline-vs-baseline comparisons that always return 0
  - Improves walltime efficiency for GPU baseline computations
- Updated single-core loading script
  - Modified `run_housing_single_core.sh` to use selective loading for existing models
  - Added explanatory comments about loading requirements for the Euler error
  - Maintains backward compatibility when parameters are not specified
- GPU walltime exceeded errors
- Identified that metrics calculation phase was pushing baseline runs over 10-hour limit
- Baseline solving completed at 9h 13m, but metrics added >47m causing walltime kill
- Solution: skip unnecessary comparison metrics on baseline-only runs
- Files modified:
  - `examples/housing_renting/solve_runner.py` - added comparison metrics filtering and loading options
  - `scripts/pbs/run_housing_single_core.sh` - added selective loading parameters
  - `examples/housing_renting/helpers/euler_error.py` - added precompilation function
- Euler error requirements: Only needs Period 0 (OWNC stage) and Period 1 (all stages)
- Performance impact: Prevents walltime exceeded errors, reduces I/O by 76% when loading models
- Integration: Works with DynX v1.7.0 selective loading features
- Euler error precompilation
  - Added `precompile_euler_error_cpu()` function to warm up the Numba JIT cache
  - Eliminates ~30-60 seconds of compilation overhead on the first Euler error calculation
  - Automatically runs during initialization when the euler_error metric is requested
  - Uses minimal dummy data for fast compilation
  - Fixed utility function expressions to match the standard CRRA housing model
- Metric-specific selective loading for comparison metrics
  - Comparison metrics now load only Period 0, OWNC stage from the baseline (instead of all 5 periods)
  - Reduces baseline loading from 75 to 3 pickle files per comparison (96% reduction)
  - Each fast method saves ~42 seconds of baseline loading per comparison
  - Total time saved for 4 fast methods: ~168 seconds
- Refactored FUES scan logic for better code organization
  - Extracted forward scan logic into a dedicated `forward_scan_case_a()` function
  - Combined the backward scan and `find_backward_same_branch` into a unified `backward_scan_combined()` function
  - Eliminated nested loops in favor of cleaner function calls while maintaining exact algorithm behavior
  - Replaced redundant pre-allocated arrays (`g_f_vf`, `g_f_a`, `g_m_vf`, `g_m_a`) with on-the-fly computation
  - Memory savings of 4*N floats per scan operation
- Fixed circular buffer iteration order
  - Discovered that fues_2dev1 had an incorrect backward scan order (oldest to newest instead of newest to oldest)
  - fues_2dev4 correctly implements the intended behavior: selecting the closest (most recent) point
  - Both versions kept for comparison, with documented behavioral differences
- Improved numerical stability for intersection points
  - Increased the intersection point separation from 1e-50 to 1e-8
  - Prevents divide-by-zero errors in numpy gradient calculations
  - Maintains accuracy while avoiding numerical precision issues
- Index consistency bug
  - Fixed `idx_f` being used as both loop counter and grid index
  - Now correctly stores the actual grid index: `idx_f = i + 2 + f`
  - Ensures correct segment selection for intersection calculations
- Missing circular buffer updates
  - Added missing `m_head = circ_put(m_buf, m_head, j)` when j is dropped
  - Fixed consecutive left-turn handling to properly maintain buffer state
  - Added `prev_j` tracking for correct j restoration
- Spurious intersection handling
  - Added an `added_intersection_last_iter` flag to track intersection creation
  - Removes the last intersection on consecutive left turns to avoid spurious points
  - Improved intersection point management for discrete choice switches
- Files modified:
  - `src/dc_smm/fues/fues_2dev4.py` - refactored version with correct backward scan
  - `src/dc_smm/fues/fues_2dev1.py` - original version with backward scan bug (kept for comparison)
  - `src/dc_smm/fues/fues_2dev1_working_backup_dev1.py` - backup of working version
- Performance impact: Reduced memory allocation and improved cache locality
- Backward compatibility: Both versions produce valid upper envelopes, just with different point selection in edge cases
- Intersection point tracking in FUES algorithm
  - Implemented intersection point detection as described in Dobrescu & Shanker (2022) Section 2.1.3
  - Added an `add_intersections` parameter to the `FUES()` function (default: True) for enhanced accuracy around crossing points
  - Forward-scan intersection detection during right-turn jumps identifies where choice-specific value functions cross
  - Backward-scan intersection storage during left-turn elimination captures suboptimal point intersections
  - Intersection points include interpolated policy values at crossing locations for a complete solution representation
- Memory-efficient intersection storage
  - Pre-allocated intersection arrays (10% of grid size) to maintain O(n) complexity
  - Automatic merging of original EGM points with intersection points, sorted by endogenous grid values
  - Configurable intersection tracking with backward compatibility when disabled
- Enhanced `_scan` function with intersection tracking
  - Added `track_intersections` and `policy_2` parameters for comprehensive intersection detection
  - Consistent return format for all code paths to maintain Numba compatibility
  - Improved boundary checking in the forward scan to prevent array index errors
- Intersection detection algorithm:

  ```python
  # Forward scan: detect crossings when jumping to a new value function branch
  inter_point = seg_intersect(p1, p2, p3, p4)  # Line-line intersection
  # Interpolate policies at the intersection point
  t = (inter_point[0] - e_grid[i+1]) / (e_grid[b_idx] - e_grid[i+1])
  inter_p1[n_inter] = (1-t) * a_prime[i+1] + t * a_prime[b_idx]
  ```
- Files modified: `src/dc_smm/fues/fues_2dev1.py`
- Performance impact: minimal overhead when intersections are disabled; ~10% memory increase when enabled
- Accuracy improvement: Better representation of value function upper envelope around choice-specific crossings
- EGM plots generation for all EGM-based methods
  - Fixed a key prefix mismatch in the `plot_egm_grids()` function: the plotting code looked for unprefixed keys (e.g., "0-7") while EGM data was stored with prefixed keys (e.g., "e_0-7", "Q_0-7")
  - EGM plots now generate correctly for FUES2DEV, CONSAV, DCEGM, and other EGM-based methods
  - Updated both unrefined and refined grid access to use the proper prefixed key formats
  - Added proper error handling for missing EGM data components
- Plot metrics configuration logic
  - Plot metrics are now only included in computation when explicitly requested in the `--metrics` list
  - Removed the incorrect behavior where the `--plots` flag would automatically include plot metrics in computation
  - Improved separation between traditional plot generation (`--plots`) and plot-based metric computation (`--metrics plot_c_comparison`)
- Comprehensive debugging for EGM data flow
  - Added targeted debugging in `plot_egm_grids()` to verify EGM data availability and key formats
  - Enhanced error messages for missing or malformed EGM grid data
  - Created a systematic approach for debugging data flow from solution storage to plotting
- Key format changes in plots.py:

  ```python
  # Before (incorrect):
  e_grid_unrefined = egm_data["unrefined"][grid_key]    # Looking for "0-7"
  # After (correct):
  prefixed_e_key = f"e_{grid_key}"                      # Looking for "e_0-7"
  e_grid_unrefined = unrefined_dict.get(prefixed_e_key)
  ```
- Files modified: `examples/housing_renting/helpers/plots.py`, `examples/housing_renting/solve_runner.py`
- Impact: visual validation of the endogenous grid method's upper envelope refinement is now available for all EGM-based methods
- Dynamic baseline method selection via the `--baseline-method` flag, with auto-detection based on the `--gpu` flag
- Configurable fast methods via the `--fast-methods` flag (default: FUES2DEV,CONSAV)
- Automatic baseline inclusion via the `--include-baseline` flag for cleaner single-core workflows
- Enhanced module docstring with comprehensive examples for GPU, MPI, and baseline loading workflows
- Method configuration no longer requires editing source code - all methods configurable via command-line flags
- Backward compatibility maintained - existing scripts work without modification
- CONSAV engine argument handling: fixed an `AttributeError` when `u_func["args"]` expects dictionary format
- Method selection logic streamlined to eliminate hardcoded baseline/fast method lists
- Professional, publication-quality comparison plots for policy and value functions via `plot_comparison_factory` in `helpers/metrics.py`.
- Plots now use proper interpolation: both fast and baseline methods are compared on a common grid, matching the logic used in the L2 error metrics.
- Plots are saved in the bundle directory for each parameter/method, keeping results organized and reproducible.
- X-axis uses real economic grid values (e.g., wealth, housing) instead of indices, for interpretability.
- Error plot features: Zero reference line, error bars, statistics box (max/mean error), and improved styling for publication-quality output.
- Docstrings for all metrics and plotting functions updated to explain interpolation, grid handling, and scientific accuracy.
- Example usage and configuration included in docstrings for both plotting and L2 metrics.
- L2 error and plotting metrics now always compare on a common grid, ensuring scientifically accurate, like-for-like comparisons regardless of discretization.
- Improved error handling and warnings for grid mismatches, shape incompatibilities, and extraction failures.
- All changes are fully integrated with CircuitRunner and its bundle management system, so plots and metrics are always associated with the correct parameter set.
- Bugfixes to Euler error metric: Improved threshold handling and interpolation logic to avoid NaN results and ensure robust error calculation for all methods.
- Plotting function scope issues: Fixed closure variable capture and array indexing errors in plotting configuration.
- Value function extraction: now uses the correct model attribute names (`vlu` instead of `v`) and solution types for robust extraction.
- GPU-Accelerated VFI Solver (`VFI_HDGRID_GPU`)
  - Implemented a new solver backend using Numba CUDA to offload the VFI dense grid search to NVIDIA GPUs.
  - The `vfi_gpu_kernel` performs the core computation in parallel across thousands of GPU threads.
  - The `solve_vfi_gpu` host function manages data transfers (CPU↔GPU) and kernel launches.
  - This provides a significant performance increase for high-density baseline calculations, enabling larger and more complex models to be solved within practical time limits.
- Dynamic Shared Memory on GPU
  - The GPU kernel now uses dynamic shared memory to dramatically reduce slow global memory accesses, a key performance optimization.
  - The launcher calculates the required shared memory size at runtime, allowing the kernel to handle variable-sized grids without hardcoded limits.
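In Numba CUDA, dynamic shared memory is requested as the fourth launch-configuration parameter and consumed inside the kernel via a zero-shaped `cuda.shared.array`. A sketch of the runtime size calculation (the helper name is illustrative):

```python
import numpy as np

def shared_mem_bytes(n_grid, dtype=np.float64):
    # Bytes of dynamic shared memory for one block's slice of the grid,
    # computed at launch time instead of being hardcoded in the kernel.
    return n_grid * np.dtype(dtype).itemsize

# Sketch of the launch (not runnable without a GPU):
# kernel[blocks, threads, 0, shared_mem_bytes(n_w)](d_V, d_out, ...)
# and inside the kernel: buf = cuda.shared.array(0, dtype=float64)
```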
- Unified Solver and Pre-compilation Workflow
  - `solve_runner.py` is now the single entry point for all workflows (CPU, MPI, and GPU).
  - A new `--precompile` flag warms up the correct Numba cache (CPU or GPU) based on the selected method.
  - The framework now automatically uses minimal grid settings during pre-compilation to prevent GPU out-of-memory errors.
- Robust GPU-Compatible Helper Functions
  - Created a GPU-safe `interp_gpu` function to perform linear interpolation, as `np.interp` is not supported in CUDA kernels.
  - Implemented a "dispatcher pattern" for utility functions, using an integer ID to select between pre-compiled, static GPU device functions (`u_func_gpu_crra`, etc.). This is the robust way to handle different functional forms on the GPU.
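A plain-Python rendering of the two techniques, assuming this general shape (the real device functions would be compiled with `@cuda.jit(device=True)`; names beyond those in the changelog are illustrative):

```python
import math

def interp_gpu(xp, fp, x):
    # GPU-safe linear interpolation: np.interp is unavailable in CUDA
    # kernels, so the device function does its own bracketing search.
    n = len(xp)
    if x <= xp[0]:
        return fp[0]
    if x >= xp[n - 1]:
        return fp[n - 1]
    lo, hi = 0, n - 1
    while hi - lo > 1:               # binary search for the bracketing interval
        mid = (lo + hi) // 2
        if xp[mid] <= x:
            lo = mid
        else:
            hi = mid
    t = (x - xp[lo]) / (xp[hi] - xp[lo])
    return (1.0 - t) * fp[lo] + t * fp[hi]

def u_func_dispatch(func_id, c, gamma):
    # Dispatcher pattern: closures cannot be passed into CUDA kernels, so an
    # integer ID selects among static, pre-compiled device functions.
    if func_id == 0:                 # CRRA (cf. u_func_gpu_crra)
        return c ** (1.0 - gamma) / (1.0 - gamma)
    return math.log(c)               # log-utility branch, illustrative
```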
- GPU Compilation Errors: resolved a series of `TypingError` and `NameError` issues by:
  - Replacing unsupported function calls (`np.interp`, `cuda.lib.isinf`) with GPU-compatible equivalents (`interp_gpu`, `math.isinf`).
  - Correctly handling function namespaces (`math` vs. `np`) inside device code.
  - Eliminating the use of unsupported closures as kernel arguments.
- GPU Out-of-Memory Errors: fixed `CudaAPIError: [700]` by ensuring the pre-compilation step uses a minimal memory footprint.
- Hierarchical MPI parameter sweep architecture
  - Two-level MPI communicators: `COMM_TOP` for parameter distribution, `COMM_SOLVER` for intra-node VFI computation
  - Enables scaling to large parameter spaces (e.g., 50 parameter combinations × 45 cores each = 2250 total cores)
  - Each node runs the complete baseline+fast workflow for one parameter combination
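The two-level split can be sketched as color arithmetic: ranks sharing `rank // cores_per_param` form one solver communicator. A pure-Python sketch of that assignment (the real code would call `MPI.COMM_WORLD.Split(color, key)` via mpi4py):

```python
def solver_colors(world_size, cores_per_param):
    # Color assignment behind the two-level split: world rank divided by the
    # solver-core count gives the Split() color, so ranks sharing a color
    # form one COMM_SOLVER group; one leader per group joins COMM_TOP.
    return [rank // cores_per_param for rank in range(world_size)]

# e.g. 90 ranks at 45 solver cores per parameter combination -> 2 groups.
# Actual split (sketch, assuming mpi4py):
# COMM_SOLVER = MPI.COMM_WORLD.Split(rank // cores_per_param, rank)
# COMM_TOP    = MPI.COMM_WORLD.Split(COMM_SOLVER.Get_rank() == 0)
```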
- Memory-efficient parameter processing
  - Applies the solve→plot→delete→gc pattern from `solve_runner.py` to parameter sweeps
  - Processes each parameter combination sequentially to avoid memory accumulation
  - Immediate model cleanup after plotting and metric extraction
- DynX Sampler integration for parameter sweeps
  - Replaces manual parameter grid construction with the built-in `Cartesian` sampler
  - Canonical column ordering and robust parameter space handling
  - Supports both list (`PATH=v1,v2,v3`) and range (`PATH=min:max:N`) parameter specifications
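A hypothetical parser for the two specification forms (not the Sampler's actual API, just the CLI convention named above):

```python
import numpy as np

def parse_param_spec(spec):
    # Parse "PATH=v1,v2,v3" (explicit list) or "PATH=min:max:N" (linspace
    # range) into (path, values). Illustrative helper, not library code.
    path, _, rhs = spec.partition("=")
    if ":" in rhs:
        lo, hi, n = rhs.split(":")
        values = np.linspace(float(lo), float(hi), int(n)).tolist()
    else:
        values = [float(v) for v in rhs.split(",")]
    return path, values
```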
- Enhanced bundle management for parameter caching
  - Hash-based bundle directories for each parameter combination
  - Automatic skipping of completed parameter combinations
  - Robust restart capability for interrupted parameter sweeps
  - Method-aware bundle organization (VFI_HDGRID, FUES, CONSAV in separate subdirectories)
- Phase 1: Core architecture with hierarchical MPI and sampler integration
- Phase 2: Integration with proven solve_runner patterns and bundle management
- Phase 3: CLI enhancement and workflow optimization
- Migration Path: create `param_sweep_v2.py` alongside the existing implementation
- Performance: 10-100x reduction in peak memory usage for large parameter sweeps
- Scalability: Linear scaling to hundreds of parameter combinations across multiple nodes
- Robustness: Automatic restart capability and bundle corruption recovery
- Maintainability: Code reuse from solve_runner and elimination of manual parameter bookkeeping
- Comprehensive MPI warning suppression
  - Added environment variables to suppress non-fatal MPI collective communication warnings (`LOG_CAT_ML`, `basesmuma`, `ml_discover_hierarchy`)
  - New MPI configuration variables: `OMPI_MCA_coll_ml_priority=0`, `OMPI_MCA_coll_hcoll_enable=0`, and BTL-layer warning suppressions
  - Implemented stderr filtering in MPI scripts to remove noise while preserving genuine errors
- Numba cache management for MPI environments
  - Added automatic Numba cache clearing before MPI runs to prevent cache corruption
  - Implemented process-specific cache directories (`NUMBA_CACHE_DIR=/tmp/numba_cache_$$`)
  - Added `NUMBA_DISABLE_CACHE=1` and `NUMBA_NUM_THREADS=1` for MPI safety
- Memory-efficient model processing workflow
  - Implemented an immediate model processing pattern: solve → extract metrics → generate plots → delete model → garbage collect
  - Added per-model memory cleanup with explicit `del model` and `gc.collect()` calls
  - Replaced batch processing with sequential processing to minimize peak memory usage
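The solve → extract → plot → delete → gc pattern can be sketched as a small driver loop (the callables are stand-ins for the runner's actual hooks):

```python
import gc

def run_sweep(param_grid, solve, extract_metrics, make_plots):
    # Sequential processing: only one solved model is alive at a time, so
    # peak memory stays at a single model instead of the whole sweep.
    all_metrics = []
    for params in param_grid:
        model = solve(params)
        all_metrics.append(extract_metrics(model))
        make_plots(model)
        del model      # drop the only reference...
        gc.collect()   # ...and reclaim immediately before the next solve
    return all_metrics
```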
- Enhanced logging and error tracking
  - Added timestamped log files for both stdout and stderr using the `tee` command
  - Implemented comprehensive error logging while maintaining screen output visibility
  - Added run completion status reporting with exit codes
- Solve runner workflow optimization
  - Modified `solve_runner.py` to process each model individually instead of keeping all models in memory
  - Baseline and fast methods now follow an identical solve-plot-delete pattern
  - Replaced `mpi_map` batch processing with individual `runner.run()` calls for better memory control
  - Updated metrics collection to use an `all_metrics` list instead of DataFrame concatenation
- MPI script robustness
  - Enhanced `circuit_run_HR_mpi.sh` with comprehensive error suppression and cache management
  - Added pre-run cache cleaning and post-run status reporting
  - Implemented filtered stderr to separate MPI noise from application errors
- Numba compilation race conditions
  - Resolved `KeyError` exceptions in the Numba caching system during concurrent MPI compilation
  - Fixed `ReferenceError: underlying object has vanished` errors during object serialization
  - Eliminated cache corruption when multiple MPI processes compile identical functions
- Memory management issues
  - Fixed memory accumulation when processing multiple models sequentially
  - Resolved potential memory leaks by ensuring proper model cleanup after plotting
  - Eliminated peak memory spikes by processing models one at a time
- MPI communication noise
  - Suppressed non-fatal `basesmuma` component warnings that cluttered error logs
  - Filtered out `ml_discover_hierarchy` and collective communication layer warnings
  - Maintained visibility of genuine MPI errors while removing infrastructure noise
- Error patterns addressed:
  - `KeyError: ((Array(int32, 1, 'C', False, aligned=True), ...))` in Numba caching
  - `ReferenceError: underlying object has vanished` during serialization
  - `[LOG_CAT_ML] component basesmuma is not available` MPI warnings
  - Memory exhaustion from keeping multiple large models in memory simultaneously
- Environment variables added:

  ```shell
  NUMBA_DISABLE_CACHE=1
  NUMBA_CACHE_DIR=/tmp/numba_cache_$$
  NUMBA_NUM_THREADS=1
  OMPI_MCA_coll_ml_priority=0
  OMPI_MCA_coll_hcoll_enable=0
  OMPI_MCA_btl_base_warn_component_unused=0
  ```
- Runner metrics are now specific to each model: metrics are defined locally rather than imported from `dynx.runner.metrics.deviations`
- MPI parallelization for `VFI_HDGRID`
  - New memory-slim MPI implementation that scatters value-function slices to workers instead of broadcasting the full tensor.
  - Workers hold virtually zero memory after each stage, enabling large-scale runs on clusters (e.g. NCI Gadi).
  - Provides bit-for-bit identical results between serial and MPI modes.
- Two-step baseline workflow
  - New CLI flags (`--baseline-only`, `--use-baseline`, `--fresh-fast`) for separating expensive HD-grid construction from fast method comparisons.
  - Allows building a baseline once on many cores and reusing it for subsequent fast solver runs on a single core.
  - Leverages CircuitRunner's built-in `save_by_default` and `load_if_exists` functionality.
- MPI-aware operator factories & solvers
  - Updated `horses_c.py` and `whisperer.py` to be rank-aware.
  - Workers now receive lightweight stub `Solution` objects for non-MPI stages, preventing deadlocks and memory bloat.
  - No heavy `.sol` objects are ever broadcast back to workers.
- Configurable plotting comparison system
  - New `plot_comparison_factory()` function in `helpers/metrics.py` creates configurable plotting metrics for comparing fast methods against baseline solutions.
  - Generates difference plots between policy/value functions of different solution methods (e.g., FUES vs VFI_HDGRID).
  - Configurable state-space slicing allows plotting specific indices of multi-dimensional arrays.
  - Supports both consumption policies (`c`) and value functions (`vlu`) with automatic detection of solution attributes.
  - Integrated with CircuitRunner's metric system for seamless workflow integration.
  - Uses the existing `_extract_policy()` function for robust data extraction from complex model structures.
  - Memory-efficient design stores the baseline model temporarily and cleans up automatically.
- Streamlined terminal value initialization
  - The `initialize_terminal_values` function in `whisperer.py` now only processes consumption stages (`OWNC` and `RNTC`), eliminating wasteful placeholder grids for housing and tenure stages.
  - Saves 150-300 MB of RAM on large grids and speeds up the terminal pass by ~10%.
- Legacy broadcast MPI mode and the `--legacy-bcast` flag.
- Redundant synchronization calls (`_sync_perch_solutions`) from `whisperer.py`.
- Unused utility functions and imports, for a cleaner, more maintainable codebase.
- Over-engineered baseline I/O, in favor of CircuitRunner's native bundle management.
- Hash collision bug where `__runner.mode` was incorrectly included in `param_paths`, preventing fast methods from loading the correct baseline bundle.
- Deadlocks caused by workers returning `None` instead of lightweight stubs for non-MPI stages.
- Unnecessary recomputation of fast methods when a baseline was loaded.
- Timing metrics now correctly captured and displayed in the summary tables.
- Plot comparison function scope issues where parameter variables from factory function weren't properly captured in closure.
- Array indexing errors in plotting configuration by using 0-indexed bounds instead of array size.
- Value function extraction now uses the correct model attribute names (`vlu` instead of `v`) and solution types.
- Euler error calculation thresholds made more flexible and based on the model's borrowing constraint, preventing NaN results for FUES methods.
- `mpi_run` now takes a solver communicator that splits each run across solvers. Only the master rank processes metrics and loading/saving.
- Outputs across MPI and non-MPI runs are not yet consistent.
- Basic HD value-function grid comparison for the housing-renting model using MPI (compares to a single-parameter run in `circuit_runner_solving.py`).
- Root-only metrics path
  - `CircuitRunner.run()` now skips the expensive `metric_fns` block on non-root ranks; workers only return lightweight timing info. This prevents N× baseline reloads and cuts RAM usage on large jobs.
- Global MPI helpers (`_MPI_COMM` / `_MPI_RANK` / `_MPI_SIZE`) initialised once at import time; used throughout the runner/solver stack to gate code that should execute only on rank 0.
- `mpi_map()` rewritten for clarity
  - Always returns a pair `(df, models)` (the second element is `[]` when models are not gathered).
  - Serial code path untouched; the MPI path defers metrics to rank 0.
- Stage compilation log-level
  - `compile_all_stages()` prints INFO messages only when the caller set `--verbose`; otherwise it downgrades to DEBUG to keep worker logs clean.
- Config patching (`patch_cfg`)
  - Consumption stages now carry a cheap `"compute": "SINGLE"` flag for fast methods; `