[GSoC Project Proposal]: IOOS Cloud Sandbox - model validation and verification tools #84

Open
patrick-tripp opened this issue Feb 10, 2025 · 5 comments
Labels
GSoC25 project idea Designates a proposed project idea

@patrick-tripp
Member

patrick-tripp commented Feb 10, 2025

Project Description

Add model validation and verification tools to https://github.com/ioos/Cloud-Sandbox

Model validation and verification can include comparing model results to observations, previously validated data, other model data, etc. It is required for many use cases including the following:

  • Bringing a model into the Sandbox from a different computing environment.
  • Changes to the model code or the way it is run.
  • Changes to the data that feeds the model.
  • Model hindcast runs.
  • and others.

We would like to have reusable tools for this. The tools should support multiple data sources and formats.

Expected Outcomes

A collection of scripts that can accomplish the following:

  • Compare model forecast/hindcast data to validation data.
  • Create plots of the gridded analysis data (map plot).
  • Create plots of timeseries analysis for specific points.
  • Create other plots as later determined.
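As a rough illustration of the first item, here is a minimal sketch of how paired model and validation values might be compared. The function name `skill_metrics` and the choice of bias, RMSE, and Pearson correlation are assumptions for illustration only, not project requirements, and the sample values are made up. A real tool would likely use NumPy or xarray rather than plain lists.

```python
import math


def skill_metrics(model, obs):
    """Return bias, RMSE, and Pearson correlation for paired samples.

    `model` and `obs` are equal-length sequences of floats that have
    already been matched in time and space (hypothetical helper).
    """
    if len(model) != len(obs) or len(model) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    n = len(model)
    diffs = [m - o for m, o in zip(model, obs)]
    bias = sum(diffs) / n                             # mean error (model - obs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)   # root-mean-square error

    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    denom = math.sqrt(var_m * var_o)
    # Correlation is undefined when either series is constant.
    corr = cov / denom if denom > 0 else float("nan")

    return {"bias": bias, "rmse": rmse, "corr": corr}


# Made-up water level values in metres, for illustration only.
model_wl = [0.52, 0.61, 0.70, 0.66, 0.48]
obs_wl = [0.50, 0.63, 0.74, 0.60, 0.45]
print(skill_metrics(model_wl, obs_wl))
```

The same three numbers could be computed per station or per grid cell and then fed into the map and timeseries plots listed above.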

Skills Required

Python, Linux BASH shell scripting, Jupyter Notebooks, statistical analysis methods used in ocean and atmospheric sciences, a basic understanding of numerical ocean modeling or numerical weather prediction.

Additional Background/Issues

There is some existing plotting code that can be used as examples to build on:

https://github.com/ioos/Cloud-Sandbox/blob/main/cloudflow/notebooks/sandbot_current_fcst_JS.ipynb
https://github.com/ioos/Cloud-Sandbox/blob/main/cloudflow/notebooks/ufs_test.ipynb

Mentor(s)

patrick-tripp

Mentor Contact Email(s)

[email protected]

Expected Project Size

175 hours

Project Difficulty

Intermediate

@patrick-tripp patrick-tripp added GSoC25 project idea Designates a proposed project idea labels Feb 10, 2025
@RATED-R-SUNDRAM

Hi @patrick-tripp, I am Shivam Sundram. I have experience building statistical models and validating them. With my production-level experience in Python and data science, I am interested in contributing to the project. Could you provide guidance on where to get started?

@aasimkhan02

Hello @patrick-tripp, I am Khan Mohd Aasim. I’m interested in contributing to the model validation and verification tools for the IOOS Cloud-Sandbox. I have experience with Python, Bash scripting, and data analysis, and would love to help improve the validation workflows. I have reviewed the existing code and read the documentation, and I’m excited to contribute. Could you guide me on the next steps?

@patrick-tripp
Member Author

I appreciate your interest. Please contact me via email for additional guidance.

@navinaamuthan

Hi @patrick-tripp. I'm interested in working on this project. I'm a Software Engineer with a Bachelor's in Information Science, and I have practical experience with large-scale ML model building and evaluation and with software development. I've sent you an email with a few queries. Kindly assist with the same.

@patrick-tripp
Member Author

Follow up information:

This is an area of active development to meet the needs of the scientific community with the increased use of commercial cloud services.

Informational links:
https://ioos.noaa.gov/
https://oceanservice.noaa.gov/welcome.html

The Cloud Sandbox is helping them develop and run/test improvements to their models and new models.
The Great Lakes models use the FVCOM model, and although the output is netCDF, the grid is not the easiest to work with. Most of the other regional operational models use the ROMS model.
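To give a feel for why the FVCOM grid needs extra care: it is unstructured, so a point cannot be looked up by simple row/column indexing. A minimal sketch of a nearest-node search follows. The function names are hypothetical, the node coordinates are made up, and the brute-force search is only an assumption for illustration (a KD-tree would be the usual choice for large meshes). In FVCOM output the node coordinates are commonly stored in variables named `lon` and `lat`, but check the file you are working with.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_node(node_lons, node_lats, lon, lat):
    """Return (index, distance_km) of the mesh node closest to (lon, lat).

    Brute-force search over all nodes; fine for a sketch, slow for
    meshes with hundreds of thousands of nodes.
    """
    distances = [
        haversine_km(nlon, nlat, lon, lat)
        for nlon, nlat in zip(node_lons, node_lats)
    ]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Made-up node coordinates, for illustration only.
lons = [-87.0, -86.5, -86.0]
lats = [42.0, 42.5, 43.0]
idx, dist_km = nearest_node(lons, lats, lon=-86.4, lat=42.6)
print(idx, round(dist_km, 1))
```

The returned index can then be used to pull a timeseries for that node out of a variable dimensioned by time and node.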

There is a web-based viewer that has observation data also. The “I” button on the layer list will show you where data is located for download.

https://oceansmap.com/link/Do6HqYDAQOeE1SuQw31AfA

Take a look at the Python code and notebooks linked in the GSoC project. For starters, we would like to create timeseries plots for a single latitude/longitude (and depth) that compare model forecast output to actual observations.
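Before a comparison plot can be drawn, the model and observation series usually have to be put on a common set of times, since they are often sampled at different intervals. Here is a minimal sketch of that step. The function name, the 30-minute tolerance, and the sample values are all assumptions for illustration, and a real tool would more likely use pandas or xarray for the alignment.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_to_observations(model_times, model_vals, obs_times, obs_vals,
                          tolerance=timedelta(minutes=30)):
    """Pair each model value with the nearest-in-time observation.

    `obs_times` must be sorted ascending. Model times with no observation
    within `tolerance` are dropped. Returns (time, model, obs) tuples.
    """
    pairs = []
    for t, model_value in zip(model_times, model_vals):
        i = bisect_left(obs_times, t)
        # The nearest observation is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(obs_times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(obs_times[k] - t))
        if abs(obs_times[j] - t) <= tolerance:
            pairs.append((t, model_value, obs_vals[j]))
    return pairs


# Made-up hourly model output and irregular observations.
start = datetime(2025, 3, 1)
model_times = [start + timedelta(hours=h) for h in range(4)]
model_vals = [0.50, 0.62, 0.71, 0.65]
obs_times = [start + timedelta(minutes=6),
             start + timedelta(minutes=54),
             start + timedelta(hours=2)]
obs_vals = [0.48, 0.60, 0.74]

for t, m, o in match_to_observations(model_times, model_vals,
                                     obs_times, obs_vals):
    print(t.isoformat(), m, o)
```

The resulting pairs can be passed straight to a plotting library to draw the model and observed curves on the same axes, or used to compute error statistics.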

There is a lot of data hosted here: https://opendap.co-ops.nos.noaa.gov/thredds/catalog/catalog.html
We also have a 30-day revolving archive for some of that data.

There are also datasets available for free use/download here:
https://www.noaa.gov/nodd/datasets#NOS

There is data on the NODD (NOAA Open Data Dissemination) that is optimized for cloud use. It uses Kerchunk.
https://registry.opendata.aws/noaa-nodd-kerchunk/

Ideally, the data analysis would be done in the cloud, close to where the data is located so large amounts of data don’t need to be downloaded. We have used JupyterHub and python notebooks to do this type of work.

Take a look through the above. This is my first time really mentoring GSoC, so I am learning also. We might be able to provide access to a JupyterHub environment for you to use if selected.

We use Amazon Web Services (AWS). Some familiarity with it and its Boto3 Python 3 API would be good, especially for S3 (Simple Storage Service).
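For anyone new to Boto3, here is a minimal sketch of listing objects in a public S3 bucket without AWS credentials. The helper names, the `.nc` suffix filter, and the bucket and prefix in the usage comment are hypothetical placeholders, not actual Sandbox code or real bucket names. The client is passed in as an argument so the listing logic can be exercised without network access.

```python
def make_anonymous_s3_client():
    """Create an S3 client that sends unsigned requests.

    Only works for buckets that allow public (anonymous) reads.
    Imports are local so the rest of the module loads without boto3.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_keys(client, bucket, prefix, suffix=".nc", max_keys=1000):
    """Return all object keys under `prefix` that end with `suffix`.

    Follows continuation tokens so results are not cut off at one page.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": max_keys}
    while True:
        response = client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            if obj["Key"].endswith(suffix):
                keys.append(obj["Key"])
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return keys


# Usage (placeholder bucket and prefix; substitute a real public bucket):
#   s3 = make_anonymous_s3_client()
#   for key in list_keys(s3, "example-public-bucket", "some/prefix/"):
#       print(key)
```

Listing keys this way only reads metadata, so it is a cheap way to find the files of interest before opening any of them.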

The following project provides more clarity as to what we are looking for in this project.

https://github.com/NOAA-CO-OPS/Next-Gen-NOS-OFS-Skill-Assessment

Out of respect for your time, I don't expect any applicants to spend a lot of time on coding, and there is not a lot of time left anyway.

But to increase your chances of being selected, it would be good to see some code in your personal GitHub account that demonstrates some basic things, such as:

  • The ability to open and read some of the relevant or similar data.
  • Use of relevant pre-existing Python packages/APIs.
  • Output or plot some of the data.
  • Well-formatted and well-documented code.
  • Use of stubs and/or pseudo-code where appropriate to demonstrate good software engineering architecture.

You can create a fork of the IOOS-Cloud-Sandbox repository and place your code there in a new branch.

To encourage innovation, I do not have any other special requirements for the application. Feel free to use generative AI to assist you, but make sure the code is correct and that you completely understand it.

I will require a live video-chat with each applicant before making the final selection.

Thank you and I will try to be quicker to respond to any questions, especially between now and the application deadline.

Patrick
