156 changes: 120 additions & 36 deletions README.md
@@ -9,11 +9,11 @@ contains the full end-to-end workflow as a citable, versioned snapshot.

## Repositories

| Submodule | Description |
|-----------|-------------|
| [`nemo-spinup-forecast`](nemo-spinup-forecast/) | Dimensionality reduction and forecasting |
| [`nemo-spinup-restart`](nemo-spinup-restart/) | Restart file generation |
| [`nemo-spinup-evaluation`](nemo-spinup-evaluation/) | Evaluation and validation |
| Submodule | Description |
| ----------------------------------------------------------------------------- | ---------------------------------------- |
| [`nemo-spinup-forecast`](https://github.com/m2lines/nemo-spinup-forecast) | Dimensionality reduction and forecasting |
| [`nemo-spinup-restart`](https://github.com/m2lines/nemo-spinup-restart) | Restart file generation |
| [`nemo-spinup-evaluation`](https://github.com/m2lines/nemo-spinup-evaluation) | Evaluation and validation |

---

@@ -25,7 +25,9 @@ Reference data (DINO output, restart files, `mesh_mask.nc`) is archived on Zenod

---

## End-to-End Steps for Running Spin-Up NEMO
## Installation of Nemo-Spinup-Bench dependencies

This installs all three packages (`nemo-spinup-forecast`, `nemo-spinup-restart`, and `nemo-spinup-evaluation`) in a single virtual environment.

1. **Clone this repository with submodules**

@@ -34,70 +36,152 @@ Reference data (DINO output, restart files, `mesh_mask.nc`) is archived on Zenod
cd nemo-spinup-bench
```

2. **Download reference data from Zenodo**
2. **Create a virtual environment and install dependencies**

```bash
python3 -m venv venv
source venv/bin/activate
pip install ./nemo-spinup-{forecast,restart,evaluation}
```


## Benchmark end-to-end steps

This section describes the complete pipeline for running the benchmark. Building and compiling NEMO/DINO is out of scope.

> The entire pipeline assumes NEMO 4.2.0 and a completed cold-start NEMO run, i.e. output files, restart files, and a `mesh_mask.nc` are available before starting.
>
> The commands below assume reference data is downloaded to `data/DINO/`. Substitute this with your own data directory if not using the reference data.
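
As a quick pre-flight check, the sketch below reports which of the expected inputs are missing before any pipeline step is run. It is only a sketch: `missing_inputs` is a hypothetical helper, not part of the packages, and it assumes that the output files sit directly under the data directory and that restart files live in `restart/`, as the commands in this README suggest.

```python
from pathlib import Path


def missing_inputs(data_dir, restart_radical="DINO_00576000_restart"):
    """Return the expected inputs that cannot be found under data_dir."""
    root = Path(data_dir)
    missing = []
    # Single files referenced by the commands in this README (locations assumed).
    for name in ("mesh_mask.nc", "DINO_1m_grid_T.nc", "DINO_1y_grid_T.nc"):
        if not (root / name).is_file():
            missing.append(name)
    # Restart files: at least one file starting with the radical must exist.
    restart_dir = root / "restart"
    pattern = f"{restart_radical}*.nc"
    if not (restart_dir.is_dir() and any(restart_dir.glob(pattern))):
        missing.append(f"restart/{pattern}")
    return missing
```

Calling `missing_inputs("data/DINO")` returns an empty list when every expected file is present, and otherwise lists what still needs to be downloaded or generated.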

### Data preparation

1. **Get simulation data**

   The entire benchmark can be run with the sample data hosted on Zenodo, which contains 50 years of DINO output files to train on. Alternatively, you may run NEMO/DINO yourself; we recommend a run of at least 50–100 years.

**Download data from Zenodo**

```bash
# TODO: add download instructions once Zenodo record is created
```

3. **Create a virtual environment and install dependencies**
2. **(Optional) Combine restart files and mesh mask** using [REBUILD_NEMO](https://forge.nemo-ocean.eu/nemo/nemo/-/tree/4.2.0/tools/REBUILD_NEMO):

This step is only required if you are using your own NEMO run. The Zenodo reference data already includes combined files. You can use the same module environment used to run NEMO/DINO to compile `rebuild_nemo`.

```bash
python3 -m venv venv
source venv/bin/activate
pip install ./nemo-spinup-forecast
pip install ./nemo-spinup-restart
pip install ./nemo-spinup-evaluation
./rebuild_nemo -n ./nam_rebuild data/DINO/restart/DINO_00576000_restart 36
./rebuild_nemo -n ./nam_rebuild data/DINO/mesh_mask 36
```

3. **Resample data**

   > TODO: This step will soon use `cdo` to resample data. Resampling is currently done with [nemo-spinup-forecast/Notebooks/Resample_ssh.ipynb](https://github.com/m2lines/nemo-spinup-forecast/blob/main/Notebooks/Resample_ssh.ipynb).

All data must be temporally aligned before forecasting. Use the [Resample_ssh.ipynb](https://github.com/m2lines/nemo-spinup-forecast/blob/main/Notebooks/Resample_ssh.ipynb) notebook to convert monthly SSH (`DINO_1m_grid_T.nc`) to annual (`DINO_1m_To_1y_grid_T.nc`). Temperature and salinity (3-D) are already annual (`DINO_1y_grid_T.nc`).

If more training data is needed, concatenate the monthly `*grid_T.nc` outputs with `ncrcat`, part of [NCO (netCDF Operators)](https://nco.sourceforge.net/).
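
To make the monthly-to-annual conversion concrete, here is a minimal sketch of the averaging for a single grid point. It assumes the annual value is the unweighted mean of 12 consecutive monthly values; the notebook may do this differently (for example by weighting months by their length), and `annual_means` is a hypothetical name used only for illustration.

```python
def annual_means(monthly_values, months_per_year=12):
    """Average consecutive blocks of monthly values into annual values."""
    if len(monthly_values) % months_per_year != 0:
        raise ValueError("the series must cover a whole number of years")
    return [
        # Unweighted mean over one year of monthly values.
        sum(monthly_values[i:i + months_per_year]) / months_per_year
        for i in range(0, len(monthly_values), months_per_year)
    ]
```

A 600-month series therefore becomes 50 annual values, which matches the annual sampling of the temperature and salinity files.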

### Spin-up acceleration

The spin-up acceleration pipeline forecasts the ocean state forward in time using dimensionality reduction and Gaussian process regression, generates updated restart files, and evaluates the result against a reference numerical run. We begin with a baseline evaluation of the reference simulation so that the final evaluation can be compared against it.

4. **Establish a baseline evaluation** of the cold-start reference simulation:

Collaborator: I'm not clear about what this actually does and what the output files represent.

Member Author: Do you mean specifically the baseline cold-start evaluation?

Collaborator: Yes.

Member Author: I think once I get an example dataset in this PR, it'll become clearer.


```bash
nemo-spinup-evaluation \
--sim-path data/DINO \
--config nemo-spinup-evaluation/configs/DINO-setup.yaml \
--results-dir output \
--result-file-prefix baseline \
--mode both
```

4. **Resample SSH** from monthly to annual using the notebook in `nemo-spinup-forecast/Notebooks/Resample_ssh.ipynb`.
Results are written to `output/baseline_restart.csv` and `output/baseline_grid.csv`.

5. **Create the projected state**

   The default technique is PCA for dimensionality reduction with Gaussian process regression for forecasting. The key parameters to adjust are `--start` and `--steps`. `--start` sets the first year of training data: here training uses years 20 to 50, so 30 years are used and the first 20 are discarded. `--steps` sets how many years the forecast jumps forward (here 30). Increasing `--steps` gives a larger acceleration but may reduce accuracy.

```bash
python -m nemo_spinup_forecast \
nemo-spinup-forecast \
--ye True \
--start 20 \
--end 50 \
--comp 1 \
--steps 30 \
--path /path/to/reference/data \
--path data/DINO \
--ocean-terms nemo-spinup-forecast/ocean_terms.yaml \
--techniques-config nemo-spinup-forecast/src/nemo_spinup_forecast/techniques_config.yaml
```

6. **Prepare restart files** using REBUILD\_NEMO:
| Argument | Description |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `--ye` | Simulation expressed in years (`True`) or months (`False`) |
| `--start` | Starting year for training data |
| `--end` | Ending year (usually the last simulated year) |
| `--comp` | Number or ratio of components to use |
| `--steps` | Jump size (years if `--ye True`, months otherwise) |
| `--path` | Directory containing the simulation files |
| `--ocean-terms` | Path to `ocean_terms.yaml` mapping logical terms (SSH, Salinity, Temperature) to dataset variable names; uses packaged default if omitted |
| `--techniques-config` | Path to `techniques_config.yaml` selecting DR and forecast techniques; uses packaged default if omitted |

```bash
./rebuild_nemo -n ./nam_rebuild /path/to/reference/data/restart/DINO_00576000_restart 36
./rebuild_nemo -n ./nam_rebuild /path/to/reference/data/mesh_mask 36
```
The forecast outputs predicted ocean state variables to `forecasts/simu_predicted/`.
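
The relationship between `--start`, `--end`, and `--steps` can be sketched as simple arithmetic. `forecast_window` is a hypothetical helper for illustration only, and it assumes that `--end` is the last simulated year and that the forecast lands `--steps` years after it.

```python
def forecast_window(start, end, steps):
    """Summarise the training window and target year implied by the CLI values."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return {
        "training_years": end - start,  # years used to fit the forecast
        "discarded_years": start,       # early spin-up years left out of training
        "target_year": end + steps,     # year the projected state represents
    }
```

With the values used in this step (`--start 20 --end 50 --steps 30`), this gives 30 training years, 20 discarded years, and a target year of 80.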

6. **Create the updated restart file**

7. **Create the updated restart file**
Using the forecasted ocean state from the previous step, `nemo-spinup-restart` injects the predicted variables (SSH, temperature, salinity, and derived velocities) into the original NEMO restart file. A new restart file is created with `NEW_` prepended to the filename, leaving the original intact and ready to initialise NEMO at the projected year.

```bash
nemo-spinup-restart \
--restart_path /path/to/reference/data/restart/ \
--restart_path data/DINO/restart/ \
--radical DINO_00576000_restart \
--mask_file /path/to/reference/data/mesh_mask.nc \
--prediction_path /path/to/forecasts/latest/simu_predicted/ \
--mask_file data/DINO/mesh_mask.nc \
--prediction_path forecasts/simu_predicted/ \
--ocean_terms nemo-spinup-forecast/ocean_terms.yaml
```

8. **Copy the updated restart files** back to the NEMO experiment directory and update
`namelist_cfg` (`nn_it000`, `nn_itend`, `cn_ocerst_in`, `ln_rstart = .true.`).
- **`--radical`** is the prefix of the restart file (e.g. `DINO_00576000_restart`)
- Output files are named as the originals but with `NEW_` prepended
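
The naming rule can be sketched as follows. `updated_restart_names` is a hypothetical helper, not part of `nemo-spinup-restart`; it only mirrors the `NEW_` prefix convention described in this step, and it assumes restart files are NetCDF files whose names start with the radical.

```python
from pathlib import PurePosixPath


def updated_restart_names(filenames, radical, prefix="NEW_"):
    """Map each restart file matching `radical` to its prefixed output name."""
    mapping = {}
    for filename in filenames:
        path = PurePosixPath(filename)
        # Only NetCDF files whose name starts with the radical are considered.
        if path.name.startswith(radical) and path.suffix == ".nc":
            mapping[filename] = str(path.with_name(prefix + path.name))
    return mapping
```

For example, `data/DINO/restart/DINO_00576000_restart.nc` maps to `data/DINO/restart/NEW_DINO_00576000_restart.nc`, and files that do not start with the radical are ignored.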

7. **Evaluate** the projected state and compare against the baseline:

   This evaluates the updated restart state against the reference simulation at year 80 (year 50 plus the 30-year jump). We recommend writing restart files fairly frequently so that different `--steps` settings can be tested.

Collaborator: Is this comparing against the baseline files that were created in step 4? If so, where are they referenced in the command? Also, how do they relate to the reference sim, if that's given?

Member Author: It's the same reference dataset (i.e. a full simulation run) but at a different time snapshot.


```bash
# --ref-sim-path is optional: offline ground truth reference simulation
nemo-spinup-evaluation \
--sim-path ./data/DINO \
--ref-sim-path ./data/DINO/reference/ \
--config nemo-spinup-evaluation/configs/DINO-setup.yaml \
--results-dir output \
--result-file-prefix spinup_evaluation \
--mode restart
```

- **`--ref-sim-path`**: the offline reference simulation used as ground truth for comparison. TODO: upload full reference simulation data to Zenodo.

   Results are written to `output/spinup_evaluation_restart.csv`. Compare these against the output for the 80-year reference restart file to understand how closely the ML forecast from the spin-up acceleration matches the simulation.

   > TODO: Does the `DINO-setup.yaml` need to be modified here to point to the 80-year restart file?
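
The evaluation commands do not compare result files with each other, so that comparison is a separate step. The sketch below is one way to do it, under a loudly stated assumption: the CSV schema is not documented here, so the code simply treats every column that parses as a number as a metric and compares column means. `metric_means` and `compare_metrics` are hypothetical helpers, not part of `nemo-spinup-evaluation`.

```python
import csv
import io


def metric_means(csv_text):
    """Return the mean of every numeric column in a CSV document."""
    totals, counts = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, cell in row.items():
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # skip text labels and empty cells
            totals[name] = totals.get(name, 0.0) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


def compare_metrics(baseline_text, candidate_text):
    """Return candidate-minus-baseline differences for shared numeric columns."""
    base = metric_means(baseline_text)
    cand = metric_means(candidate_text)
    return {name: cand[name] - base[name] for name in sorted(base.keys() & cand.keys())}
```

To use it, read `output/baseline_restart.csv` and `output/spinup_evaluation_restart.csv` as text (for example with `pathlib.Path.read_text`) and pass both strings to `compare_metrics`.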

---

## Running NEMO with the new state

1. **Copy the experiment directory** inside the NEMO repository as a backup; the original will be overwritten in the next step.

2. **Copy the updated restart files** (`NEW_DINO_<time>_restart_<proc_id>.nc`) back to the original experiment directory.

9. **Restart DINO** using the updated restart file.
3. **Update `namelist_cfg`** under `namrun`:

10. **Evaluate** the results:
| Parameter | Description |
| -------------- | ---------------------------------------------- |
| `nn_it000`     | First timestep (last restart timestep + 1)     |
| `nn_itend` | Final timestep |
| `cn_ocerst_in` | Restart filename (matches latest restart file) |
| `ln_rstart` | `.true.` to start from a restart file |

```bash
spinup-eval \
--sim-path /path/to/new/simulation \
--ref-sim-path /path/to/reference/data \
--config nemo-spinup-evaluation/configs/DINO-setup.yaml \
--mode both
```
4. **Restart DINO** using the updated restart file.
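
The `namrun` values from step 3 can be derived from the restart radical, as sketched below. This is an illustration under assumptions: `namrun_settings` is a hypothetical helper, the timestep is assumed to be the digits in a name such as `DINO_00576000_restart`, and `cn_ocerst_in` is assumed to be the `NEW_`-prefixed radical without processor suffix or extension. Check the values against your own NEMO configuration before running.

```python
import re


def namrun_settings(radical, run_steps, prefix="NEW_"):
    """Derive namrun entries for restarting from an updated restart file."""
    match = re.search(r"_(\d+)_restart$", radical)
    if match is None:
        raise ValueError(f"cannot find a timestep in {radical!r}")
    if run_steps <= 0:
        raise ValueError("run_steps must be positive")
    last_step = int(match.group(1))
    first_step = last_step + 1  # nn_it000: last timestep + 1
    return {
        "nn_it000": first_step,
        "nn_itend": first_step + run_steps - 1,  # final timestep of the new run
        "cn_ocerst_in": f"{prefix}{radical}",
        "ln_rstart": ".true.",
    }
```

For the radical `DINO_00576000_restart`, the new run starts at timestep 576001; `run_steps` is the number of timesteps you want the restarted run to cover.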

---
