[GSoC Project Proposal]: Extend CrocoLake's available datasets #75

Open
@enrico-mi

Project Description

CrocoLake is a data lake that gathers a range of physical and biogeochemical ocean observations, with the goal of providing an efficient format and a unified interface for data assimilation and ocean modeling activities. CrocoLakeTools contains the Python modules that convert existing datasets to CrocoLake's structure (i.e. Parquet format with a common schema) and provides a unified interface to load them into the same dataframe.
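
To make the idea of a common schema concrete, here is a minimal sketch of what a conversion step could look like with pandas. It is not CrocoLakeTools' actual API: the column names in `COMMON_SCHEMA`, the `to_common_schema` helper and the rename map are all hypothetical and only illustrate renaming source columns, filling in missing ones, and enforcing dtypes.

```python
import pandas as pd

# Hypothetical target schema (column name -> dtype).
# CrocoLake's real schema may use different names and types.
COMMON_SCHEMA = {
    "latitude": "float64",
    "longitude": "float64",
    "pressure": "float32",
    "temperature": "float32",
    "salinity": "float32",
}


def to_common_schema(df, rename_map, schema=COMMON_SCHEMA):
    """Return a copy of df with exactly the schema's columns and dtypes.

    rename_map maps the source dataset's column names to schema names.
    Columns missing from the source are filled with NaN; columns that
    are not part of the schema are dropped.
    """
    out = df.rename(columns=rename_map)
    for name, dtype in schema.items():
        if name not in out.columns:
            out[name] = float("nan")  # variable not measured in this dataset
        out[name] = out[name].astype(dtype)
    return out[list(schema)]
```

Once every source is converted this way, frames from different datasets can be stacked with `pd.concat` and written out with `DataFrame.to_parquet`, which needs a Parquet engine such as pyarrow to be installed.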

This project consists of taking an existing dataset that is not yet included in CrocoLake and developing new modules, or adapting existing ones, to convert it to CrocoLake's format. The mentor has already put together a list of candidate datasets. The dataset to convert will be chosen together with the mentee, based on their experience and familiarity with the format and size of the original database. The project can then be tailored to the mentee's interests and skills, ranging from the conversion of a single-file CSV dataset to the multiprocessing conversion of a dataset made up of multiple netCDF files.
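
For the multi-file end of that range, the sketch below shows one way the work could be spread across processes using only the standard library. It is an assumption about how such a converter might be organized, not how CrocoLakeTools does it: `convert_many` and the per-file `convert_one` callable are hypothetical names, and the per-file logic (for example opening a netCDF file with `xarray.open_dataset` and writing Parquet) is left to the caller.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def convert_many(paths, convert_one, out_dir,
                 executor_cls=ProcessPoolExecutor, max_workers=None,
                 out_suffix=".parquet"):
    """Convert each source file in parallel and return the output paths.

    convert_one(src, dst) is a user-supplied (hypothetical) function that
    reads src, writes the converted file to dst and returns dst. With the
    default ProcessPoolExecutor it must be defined at module top level so
    that it can be pickled and sent to the worker processes.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One job per input file; the output keeps the input's base name.
    jobs = [(Path(p), out_dir / (Path(p).stem + out_suffix)) for p in paths]

    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(convert_one, src, dst) for src, dst in jobs]
        # Collect in submission order; result() re-raises worker errors.
        return [f.result() for f in futures]
```

Passing `executor_cls=ThreadPoolExecutor` (from `concurrent.futures`) is a convenient way to try the same function in an interactive session, where process pools can be awkward. When run as a script, the call to `convert_many` should sit under an `if __name__ == "__main__":` guard.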

Expected Outcomes

  • CrocoLake's success depends on the number of observations it can serve in a uniform manner, which is why this project matters.
  • Each new conversion module is another example of how to build one, reducing the effort required of future users who want to add their own datasets.
  • CrocoLakeTools is in its infancy, and this project would also provide feedback on its accessibility to new users.

Skills Required

Python (pandas, xarray), git

Additional Background/Issues

CrocoLakeTools' current version is here.

Mentor(s)

Enrico Milanese ([email protected], @enrico-mi)

Expected Project Size

175 hours

Project Difficulty

Intermediate
