Feature/data tree cleanup #325

Draft
wants to merge 28 commits into
base: develop

Conversation

sliu008
Contributor

@sliu008 sliu008 commented Mar 19, 2025

GitHub Issue: #295

Description

  • Rewrite l2ss-py to use xarray DataTree instead of flattening the arrays

Overview of work done

Summarize the work you did

Overview of verification done

Summarize the testing and verification you've done. This includes unit tests or testing with specific data

Overview of integration done

Explain how this change was integration tested. Provide screenshots or logs if appropriate. An example of this would be a local Harmony deployment.

PR checklist:

  • Linted
  • Updated unit tests
  • Updated changelog
  • Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request

@sliu008
Contributor Author

sliu008 commented Mar 27, 2025

Pending Implementations

Recent Changes & Issues

  • File Handling: Files are no longer opened with netCDF4 for preprocessing.
  • DataTree Compatibility: After preprocessing, DataTree cannot use the netCDF4 store to open netCDF4 objects.

DataTree Coordinate Handling

  • Issue: When subsetting a DataTree, child nodes inherit a copy of the coordinates from the parent. When a new DataTree is created from the subsetted nodes, those coordinates also appear in the new tree, even though they might not have been present in the original dataset.
  • Solution: Implemented a function to remove coordinates from child nodes.
  • Impact: It is unclear whether retaining or removing these coordinates matters; for now they are removed.

Memory Optimization in Harmony

  • Previously, variables were loaded and then written to a NetCDF file in order to reduce memory usage in Harmony.
  • Open Question: Can we achieve the same memory efficiency with DataTree, or will it require handling each group and variable separately?
