
The YAML File

Manjaree Binjolkar edited this page Sep 19, 2025 · 10 revisions

YAML File Format

The Tiger-HLM-Runoff YAML configuration file specifies all required parameters to run the Runoff5 model using AORC forcing data. It defines time periods, input data locations, solver settings, output controls, and runtime flags. The configuration file is passed as an argument when executing the module:

./bin/runoff data/config.yaml

The program parses config.yaml at runtime and uses the specified values to initialize and run the simulation. Unused parameters can be left blank or commented out.

Below is an example configuration for running the Runoff5 model for 2017–2018:

description: |
  This config sets up Runoff5 to run with AORC - 1km data for 2017 and 2018,
  time chunks are automatically calculated, using NetCDF inputs and
  outputs written to the path you specify.

model:
  uid: 5
  name: "Runoff5"

time_period:
  start: "2017-01-01"
  end: "2018-12-31"

initial:
  mode: "constant"
  file: ""
  values: [0.01, 0.1, 0.0, 0.1, 0.01, 1.0, 1.0, 0.0, 0.0]  
  # Order of values: [snow, static, surface, grav, aquifer, T_air, T_soil, surf_runoff, total_runoff]

parameters:
  path: "./data/Stony_Brook/"
  spatially_varying_file: "./Stony_Brook_spatial_params.csv"

forcings:
  type: "netcdf"
  path: "../data/Stony_Brook/"
  time_chunking: true

  variables:
    - name: "pr"
      file: "pr_hourly_AORC_{year}.nc"
      var_name: "pr"
      time_resolution: "1h"
      required: true
      dims: "time, latitude, longitude"      

    - name: "t2m"
      file: "t2m_daily_avg_AORC_{year}.nc"
      var_name: "t2m"
      time_resolution: "24h"
      required: true
      dims: "time, latitude, longitude"     

forcing_mappings:
  path: "."
  variables:
    - name: "pr"
      file: "Stony_Brook_pr_AORC_lookup.csv"
    - name: "t2m"
      file: "Stony_Brook_t2m_AORC_lookup.csv"

output:
  print_interval: 1                   
  query_dt: 60.0                       
  states: [0, 1, 2, 3, 4, 5, 6, 7, 8]  
  output_path: "./data/Stony_Brook/outputs/" 
  output_file: "dense.nc"            
  final_output_file: "final.nc"       
  runoff_output_file: "runoff.nc"     

solver:
  rtol: 1e-6                     
  atol: 1e-9                     
  safety: 0.9                    
  min_scale: 0.2                 
  max_scale: 10.0                
  override_tolerances: true     
  initial_step: 0.001            
  override_initial_step: true    

flags:
  use_mpi: false              
  max_gpu_mem_gb: 80          
  gpu_mem_buffer_pct: 15      

The configuration file shown above is available at /data/Stony_Brook/config_Stony_Brook.yaml.


Parameter Definitions

description

  • description: A free-text description of the simulation setup. This is for user reference only.

model

  • uid: Unique numeric identifier for the model instance.
  • name: Name of the model class (e.g., "Runoff5").

time_period

  • start: Simulation start date in YYYY-MM-DD format (inclusive).
  • end: Simulation end date in YYYY-MM-DD format (inclusive).

initial

  • mode: Determines how model states are initialized:

    • "constant": Uses the constant values listed in the values array.
    • "from_file": Loads state variables from the file given in file.
  • file: Path to the file containing initial states (only used if mode = "from_file").
  • values: Initial value for each state (used if mode = "constant"), in the order [snow, static, surface, grav, aquifer, T_air, T_soil, surf_runoff, total_runoff].
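
As a quick sanity check before a run, the values list can be paired with the state names from the comment in the example configuration. This is only a sketch: label_initial_values is a hypothetical helper, not part of Tiger-HLM-Runoff, and the assumption that exactly nine values are expected comes from that comment rather than from the program itself.

```python
# State order copied from the comment in the example configuration.
STATE_NAMES = [
    "snow", "static", "surface", "grav", "aquifer",
    "T_air", "T_soil", "surf_runoff", "total_runoff",
]


def label_initial_values(values):
    """Pair each initial value with its state name (hypothetical helper)."""
    if len(values) != len(STATE_NAMES):
        raise ValueError(
            f"expected {len(STATE_NAMES)} initial values, got {len(values)}"
        )
    return dict(zip(STATE_NAMES, values))


# Values from the example configuration.
initial_states = label_initial_values(
    [0.01, 0.1, 0.0, 0.1, 0.01, 1.0, 1.0, 0.0, 0.0]
)
for name, value in initial_states.items():
    print(f"{name:>12}: {value}")
```

A list that is too short or too long raises ValueError, which catches a misplaced value before the model silently starts from the wrong state.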


parameters

  • path: Directory containing parameter files for the model.
  • spatially_varying_file: Path to a CSV file with spatially varying (per-link) parameters.
  • constant_parameters_index: List of indices identifying which parameters are treated as constants.
  • constant_parameters_values: Values of the constant parameters (correspond to indices above).

forcings

  • type: Format of input forcing data. Currently supports "netcdf".
  • path: Directory containing forcing files.
  • time_chunking: If true, forcing data will be loaded in time chunks to manage memory usage.

forcings.variables (list)

Each forcing variable is specified with:

  • name: Logical name used internally by the model.
  • file: Filename template for the NetCDF input; supports the {year} placeholder.
  • var_name: Name of the variable inside the NetCDF file.
  • time_resolution: Temporal resolution of the forcing (e.g., "1h", "24h").
  • time_chunk_size (optional): Number of days per chunk to load (overrides automatic chunking if set).
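  • required: If true, the variable must be present; if false, missing data will be zero-filled.
  • dims: Comma-separated dimension names of the variable, as in the example configuration ("time, latitude, longitude").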
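
To see which forcing files a given time_period would need, the {year} placeholder can be expanded by hand. The sketch below assumes one file per calendar year between start and end; forcing_files is a hypothetical helper and does not reproduce how the program itself locates files.

```python
from datetime import date


def forcing_files(template, start, end):
    """Return one filename per calendar year touched by [start, end].

    Assumes the template holds a single {year} placeholder and that
    forcing data are stored as one file per year (an assumption here).
    """
    first = date.fromisoformat(start).year
    last = date.fromisoformat(end).year
    return [template.format(year=year) for year in range(first, last + 1)]


# Template and dates from the example configuration.
pr_files = forcing_files("pr_hourly_AORC_{year}.nc", "2017-01-01", "2018-12-31")
print(pr_files)
```

Checking that each listed file exists under forcings.path before launching can save a failed run.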

forcing_mappings

  • path: Directory containing mapping CSV files.

  • variables: List of mappings for each forcing variable:

    • name: Must match the forcing variable name in forcings.variables.
    • file: CSV file mapping stream IDs to lat/lon grid points.
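
Because each mapping name must match a name in forcings.variables, a small pre-flight check can catch mismatches. The sketch below works on plain Python dictionaries shaped like the two YAML lists; unmapped_forcings is a hypothetical helper, and how the program itself reacts to a missing mapping is not covered here.

```python
def unmapped_forcings(forcing_vars, mapping_vars):
    """Return forcing names that have no entry in forcing_mappings.variables."""
    mapped = {entry["name"] for entry in mapping_vars}
    return [entry["name"] for entry in forcing_vars if entry["name"] not in mapped]


# Entries mirroring the example configuration.
forcing_vars = [{"name": "pr"}, {"name": "t2m"}]
mapping_vars = [
    {"name": "pr", "file": "Stony_Brook_pr_AORC_lookup.csv"},
    {"name": "t2m", "file": "Stony_Brook_t2m_AORC_lookup.csv"},
]

missing = unmapped_forcings(forcing_vars, mapping_vars)
if missing:
    print("forcings without a mapping file:", missing)
else:
    print("every forcing variable has a mapping file")
```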

output

  • print_interval: Frequency (in chunks or timesteps) to print progress to console.
  • query_dt: Time interval between outputs in minutes.
  • states: List of state variable indices to include in the output.
  • output_path: Directory where output NetCDF files will be written.
  • output_file: Filename for dense output files (per chunk).
  • final_output_file: Name of the final NetCDF file containing the final model state.
  • runoff_output_file: Filename for runoff-specific output.
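
The size of the dense output grows with the simulation length and shrinks with query_dt. As a rough estimate, the sketch below counts how many query_dt intervals fit in the time period, assuming the inclusive end date means the simulation runs through the end of that day. output_intervals is a hypothetical helper; the exact number of records the program writes (for example, whether the initial time is included) is not specified here.

```python
from datetime import date


def output_intervals(start, end, query_dt_minutes):
    """Count query_dt intervals between start and end (both dates inclusive)."""
    days = (date.fromisoformat(end) - date.fromisoformat(start)).days + 1
    return int(days * 24 * 60 / query_dt_minutes)


# Dates and query_dt from the example configuration.
n_intervals = output_intervals("2017-01-01", "2018-12-31", 60.0)
print(n_intervals)  # 730 days * 24 hourly intervals = 17520
```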

solver

  • rtol: Relative tolerance for the ODE solver.
  • atol: Absolute tolerance for the ODE solver.
  • safety: Safety factor for adaptive time step control.
  • min_scale: Minimum scaling factor when reducing time step size.
  • max_scale: Maximum scaling factor when increasing time step size.
  • override_tolerances: If true, forces the solver to use the specified tolerances.
  • initial_step: Suggested initial step size (in minutes).
  • override_initial_step: If true, overrides the solver’s automatic initial step size estimation.
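
To illustrate how these settings typically interact, the sketch below uses the textbook step-size controller for adaptive Runge-Kutta methods. It is an assumption for illustration, not the solver's actual code: the method order (5), the max-norm error measure, and the function names error_norm and next_step are all hypothetical.

```python
def error_norm(err, y, rtol=1e-6, atol=1e-9):
    """Largest error component, each scaled by atol + rtol * |y|.

    A step is normally accepted when this norm is <= 1.
    """
    return max(abs(e) / (atol + rtol * abs(v)) for e, v in zip(err, y))


def next_step(h, err_norm, order=5, safety=0.9, min_scale=0.2, max_scale=10.0):
    """Propose the next step size from the scaled error norm.

    The raw factor safety * err_norm**(-1/order) is clamped to
    [min_scale, max_scale] so the step never changes too abruptly.
    """
    if err_norm == 0.0:
        return h * max_scale  # no measurable error: grow as fast as allowed
    scale = safety * err_norm ** (-1.0 / order)
    return h * min(max_scale, max(min_scale, scale))


h = 0.001  # initial_step from the example configuration (minutes)
norm = error_norm(err=[2e-9, 1e-8], y=[0.01, 1.0])  # made-up error estimate
h_next = next_step(h, norm)
print(norm, h_next)
```

In this picture, tightening rtol or atol raises the error norm and shrinks the proposed step, while safety, min_scale, and max_scale only bound how quickly the step size may change.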

flags

  • use_mpi:

    • true — Enable MPI parallel execution. Each rank will bind to a GPU, process its own subset of hillslopes, and write rank-suffixed outputs.
    • false — Run in serial (single rank, single GPU). If the job is launched with multiple MPI processes while this is false, the program will exit to prevent duplicate work.
  • max_gpu_mem_gb: Maximum GPU memory (in GiB) that the solver is allowed to use. This is used when computing safe chunk sizes for forcing and state data.

    • For NVIDIA A100 GPUs use 80.
    • For NVIDIA H200 GPUs use 96.
  • gpu_mem_buffer_pct: Percentage of GPU memory to reserve as a buffer. Prevents out-of-memory errors by keeping some headroom.
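
The two memory settings combine into a working budget. The sketch below assumes the buffer is taken as a percentage of max_gpu_mem_gb, which is one plausible reading of the definitions above; usable_gpu_bytes is a hypothetical helper, and the program's own chunk-size calculation may differ.

```python
GIB = 1024 ** 3  # bytes per GiB


def usable_gpu_bytes(max_gpu_mem_gb, gpu_mem_buffer_pct):
    """Bytes left for forcing and state data after reserving the buffer.

    Integer arithmetic avoids floating-point rounding; both arguments
    are expected to be whole numbers, as in the example configuration.
    """
    total = max_gpu_mem_gb * GIB
    return total * (100 - gpu_mem_buffer_pct) // 100


# Settings from the example configuration: 80 GiB with a 15% buffer.
budget = usable_gpu_bytes(80, 15)
print(budget / GIB)  # 68.0 GiB remain for chunks
```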
