[GSoC Project Proposal]: Extend CrocoLake's available datasets

### Project Description

CrocoLake is a data lake that gathers physical and biogeochemical ocean observations from several sources, with the goal of providing an efficient format and a unified interface for data assimilation and ocean modeling activities. CrocoLakeTools contains the Python modules that convert existing datasets to CrocoLake's structure (i.e. Parquet format with a common schema), and a unified interface to load them into a single dataframe.

This project consists of taking an existing dataset that is not yet included in CrocoLake and developing new modules, or adapting existing ones, to convert it to CrocoLake's format. The mentor has already put together a list of candidate datasets. The dataset to convert will be chosen together with the mentee, based on their experience and their familiarity with the format and size of the original database. The project can then be tailored to the mentee's interests and skills: from converting a single-file CSV dataset to the multi-process conversion of a dataset spread across multiple netCDF files.
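To give a rough idea of what the simplest case (a single-file CSV dataset) could involve, here is a minimal sketch using plain pandas. It does not use the CrocoLakeTools API: the function names, the source column names, and the target schema below are all hypothetical and only illustrate the general pattern of renaming columns, enforcing common types, and writing Parquet.

```python
import pandas as pd

# Hypothetical mapping from the source dataset's column names to a common
# schema. These names are illustrative assumptions, not CrocoLake's schema.
COLUMN_MAP = {
    "lat": "LATITUDE",
    "lon": "LONGITUDE",
    "time": "JULD",
    "pressure_dbar": "PRES",
    "temp_degC": "TEMP",
}

# Hypothetical target dtypes for the numeric columns of the common schema.
TARGET_DTYPES = {
    "LATITUDE": "float64",
    "LONGITUDE": "float64",
    "PRES": "float32",
    "TEMP": "float32",
}


def harmonize(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, drop extras, and cast to the target schema."""
    out = df.rename(columns=COLUMN_MAP)
    # Keep only the schema columns, in a fixed order.
    # A KeyError is raised here if the source lacks an expected column.
    out = out[list(COLUMN_MAP.values())].copy()
    out["JULD"] = pd.to_datetime(out["JULD"])
    return out.astype(TARGET_DTYPES)


def convert_csv_to_parquet(csv_source, parquet_path) -> None:
    """Read a CSV file, harmonize it, and write it as Parquet.

    Writing Parquet with pandas requires pyarrow or fastparquet.
    """
    df = pd.read_csv(csv_source)
    harmonize(df).to_parquet(parquet_path, index=False)
```

A call such as `convert_csv_to_parquet("source.csv", "converted.parquet")` (both paths are placeholders) would produce a single Parquet file. A real converter would likely also need to handle units, quality-control flags, and datasets too large to fit in memory.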

### Expected Outcomes

- CrocoLake's success depends on the number of observations that it can serve in a uniform manner, hence the importance of this project.
- Each new conversion module also serves as an example of how to build one, reducing the effort required from future users who want to add their own datasets.
- CrocoLakeTools is in its infancy, and this project would also provide feedback on its accessibility to new users.

### Skills Required

Python (pandas, xarray), git

### Additional Background/Issues

The current version of CrocoLakeTools is available [here](https://github.com/boom-lab/crocolaketools-public).

### Mentor(s)

Enrico Milanese (enrico.milanese@whoi.edu, @enrico-mi)

### Expected Project Size

175 hours

### Project Difficulty

Intermediate
