From 84164ebf546b87db21e4fe70e6eef7b02481e242 Mon Sep 17 00:00:00 2001
From: Matt Archer
Date: Thu, 19 Mar 2026 13:28:33 +0000
Subject: [PATCH 1/2] docs: Add more detailed pipeline instruction

---
 README.md | 126 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 93 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index a9d1255..b4442d4 100644
--- a/README.md
+++ b/README.md
@@ -9,11 +9,11 @@ contains the full end-to-end workflow as a citable, versioned snapshot.
 
 ## Repositories
 
-| Submodule | Description |
-|-----------|-------------|
-| [`nemo-spinup-forecast`](nemo-spinup-forecast/) | Dimensionality reduction and forecasting |
-| [`nemo-spinup-restart`](nemo-spinup-restart/) | Restart file generation |
-| [`nemo-spinup-evaluation`](nemo-spinup-evaluation/) | Evaluation and validation |
+| Submodule | Description |
+| ----------------------------------------------------------------------------- | ---------------------------------------- |
+| [`nemo-spinup-forecast`](https://github.com/m2lines/nemo-spinup-forecast) | Dimensionality reduction and forecasting |
+| [`nemo-spinup-restart`](https://github.com/m2lines/nemo-spinup-restart) | Restart file generation |
+| [`nemo-spinup-evaluation`](https://github.com/m2lines/nemo-spinup-evaluation) | Evaluation and validation |
 
 ---
 
@@ -25,7 +25,7 @@ Reference data (DINO output, restart files, `mesh_mask.nc`) is archived on Zenod
 
 ---
 
-## End-to-End Steps for Running Spin-Up NEMO
+## Installation of Nemo-Spinup-Bench dependencies
 
 1. **Clone this repository with submodules**
 
@@ -34,28 +34,69 @@ Reference data (DINO output, restart files, `mesh_mask.nc`) is archived on Zenod
    cd nemo-spinup-bench
    ```
 
-2. **Download reference data from Zenodo**
+2. **Create a virtual environment and install dependencies**
+
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate
+   pip install ./nemo-spinup-{forecast,restart,evaluation}
+   ```
+
+
+## Benchmark end-to-end steps
+
+This describes the complete end-to-end pipeline to run the benchmark. We omit details like building and compiling NEMO/DINO.
+
+> The entire pipeline assumes NEMO 4.2.0 and a completed cold-start NEMO run, i.e. output files, restart files, and a `mesh_mask.nc` are available before starting.
+>
+> The commands below use paths from the Zenodo reference dataset. Substitute `/path/to/reference/data` with your own data directory if not using the reference data.
+
+### Data preparation
+
+1. **Get simulation data**
+
+   The entire benchmark will run using sample data hosted on Zenodo. Alternatively you may run NEMO/DINO yourself; we recommend running for at least 50–100 years. A Slurm script is provided in the NEMO [notes](https://github.com/m2lines/Spinup-NEMO-notes/blob/main/nemo/buildandrun_NEMODINO.md). The Zenodo reference data contains 50 years of DINO output files to train on.
+   **Download reference data from Zenodo**
 
    ```bash
    # TODO: add download instructions once Zenodo record is created
    ```
 
-3. **Create a virtual environment and install dependencies**
+2. **(Optional) Combine restart files and mesh mask** using [REBUILD_NEMO](https://forge.nemo-ocean.eu/nemo/nemo/-/tree/4.2.0/tools/REBUILD_NEMO):
+
+   This step is only required if you are using your own NEMO run. The Zenodo reference data already includes combined files. You can use the same module environment used to run NEMO/DINO to compile `rebuild_nemo`.
 
    ```bash
-   python3 -m venv venv
-   source venv/bin/activate
-   pip install ./nemo-spinup-forecast
-   pip install ./nemo-spinup-restart
-   pip install ./nemo-spinup-evaluation
+   ./rebuild_nemo -n ./nam_rebuild /path/to/reference/data/restart/DINO_00576000_restart 36
+   ./rebuild_nemo -n ./nam_rebuild /path/to/reference/data/mesh_mask 36
+   ```
+
+If more training data is needed, concatenate monthly outputs `*grid_T.nc` with `ncrcat`, part of the [NCO (netCDF Operators)](https://nco.sourceforge.net/).
+
+
+### Spin-up acceleration
+
+3. **Establish a baseline evaluation** of the cold-start reference simulation:
+
+   ```bash
+   nemo-spinup-evaluation \
+     --sim-path /path/to/reference/data \
+     --config nemo-spinup-evaluation/configs/DINO-setup.yaml \
+     --mode both
    ```
 
-4. **Resample SSH** from monthly to annual using the notebook in `nemo-spinup-forecast/Notebooks/Resample_ssh.ipynb`.
+5. **Resample data**
+
+   > TODO: This step will soon use `cdo` to resample data. This is currently being done with [nemo-spinup-forecast/Notebooks/Resample_ssh.ipynb](https://github.com/m2lines/nemo-spinup-forecast/blob/main/Notebooks/Resample_ssh.ipynb)
+
+   All data must be temporally aligned before forecasting. Use the [Resample_ssh.ipynb](https://github.com/m2lines/nemo-spinup-forecast/blob/main/Notebooks/Resample_ssh.ipynb) notebook to convert monthly SSH (`DINO_1m_grid_T.nc`) to annual (`DINO_1m_To_1y_grid_T.nc`). Temperature and salinity (3-D) are already annual (`DINO_1y_grid_T.nc`).
+
+6. **Create the projected state**
 
-5. **Create the projected state**
+   Set `--path` to the NEMO/DINO data directory:
 
    ```bash
-   python -m nemo_spinup_forecast \
+   nemo-spinup-forecast \
      --ye True \
      --start 20 \
      --end 50 \
@@ -66,12 +107,14 @@ Reference data (DINO output, restart files, `mesh_mask.nc`) is archived on Zenod
      --techniques-config nemo-spinup-forecast/src/nemo_spinup_forecast/techniques_config.yaml
    ```
 
-6. **Prepare restart files** using REBUILD\_NEMO:
-
-   ```bash
-   ./rebuild_nemo -n ./nam_rebuild /path/to/reference/data/restart/DINO_00576000_restart 36
-   ./rebuild_nemo -n ./nam_rebuild /path/to/reference/data/mesh_mask 36
-   ```
+   - **`ye`** — simulation expressed in years (`True`) or months (`False`)
+   - **`start`** — starting year for training data
+   - **`end`** — ending year (usually the last simulated year)
+   - **`comp`** — number or ratio of components to use
+   - **`steps`** — jump size (years if `ye=True`, months otherwise)
+   - **`path`** — directory containing the simulation files
+   - **`ocean-terms`** — path to `ocean_terms.yaml` mapping logical terms (SSH, Salinity, Temperature) to dataset variable names; uses packaged default if omitted
+   - **`techniques-config`** — path to `techniques_config.yaml` selecting DR and forecast techniques; uses packaged default if omitted
 
 7. **Create the updated restart file**
 
@@ -84,20 +127,37 @@ Reference data (DINO output, restart files, `mesh_mask.nc`) is archived on Zenod
      --ocean_terms nemo-spinup-forecast/ocean_terms.yaml
    ```
 
-8. **Copy the updated restart files** back to the NEMO experiment directory and update
-   `namelist_cfg` (`nn_it000`, `nn_itend`, `cn_ocerst_in`, `ln_rstart = .true.`).
+   - **`--radical`** is the prefix of the restart file (e.g. `DINO_00576000_restart`)
+   - Output files are named as the originals but with `NEW` prepended
+
+8. **Evaluate** the projected state and compare against the baseline:
+
+   ```bash
+   nemo-spinup-evaluation \
+     --sim-path /path/to/new/simulation \
+     --ref-sim-path /path/to/simulation/data \
+     --config nemo-spinup-evaluation/configs/DINO-setup.yaml \
+     --mode both
+   ```
+
+   - **`--ref-sim-path`** — (optional) the offline reference simulation used as ground truth for comparison. TODO: upload full reference simulation data to Zenodo.
+
+---
+
+## Running NEMO with the new state
+
+1. **Copy the experiment directory** inside the NEMO repository as a backup; the original will be overwritten in the next step.
+
+2. **Copy the updated restart files** (`mesh_mask_.nc` and `DINO_