Harmony Metadata Annotator

This service updates the metadata attributes of an input file to values that are known to be correct, either amending, adding or deleting attributes as appropriate. The underlying methodology is to use a configuration file with earthdata-varinfo to supply known corrections to the metadata.

Directory structure

📁
├── 📁 .github
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── 📁 bin
├── dev_requirements.txt
├── 📁 docker
├── 📁 harmony_service
├── 📁 metadata_annotator
├── requirements.txt
└── 📁 tests
  • .github - Contains CI/CD workflows and pull request template.
  • CHANGELOG.md - Contains a record of changes applied to each new release of the Harmony Metadata Annotator Service.
  • CONTRIBUTING.md - Instructions on how to contribute to the repository.
  • LICENSE - Required for distribution under NASA open-source approval. Details conditions for use, reproduction and distribution.
  • README.md - This file, containing guidance on developing the library and service.
  • bin - A directory containing utility scripts to build the service and test images. It also contains a script to extract the release notes for the most recent version from CHANGELOG.md.
  • dev_requirements.txt - Contains a list of Python packages required for local development, but not for the service itself.
  • docker - A directory containing the Dockerfiles for the service and test images. It also contains service_version.txt, which contains the semantic version number of the library and service image. Update this file with a new version to trigger a release.
  • harmony_service - A directory containing the Harmony Service specific Python code. adapter.py contains the MetadataAnnotatorAdapter class that is invoked by calls to the Harmony service.
  • metadata_annotator - Directory containing business logic for the service, including Harmony scaffolding, such as the adapter class for the service.
  • requirements.txt - Contains a list of Python packages needed to run the service.
  • tests - Contains the pytest test suite.

Developer Notes

Local development

Service functionality can be tested locally against a local instance of Harmony, also known as Harmony-In-A-Box. See the Harmony-In-A-Box documentation for instructions on creating a local Harmony instance.

For local development and testing of library modifications or small functions independent of the main Harmony application:

  1. Create a Python virtual environment.
  2. Install the dependencies in requirements.txt and tests/test_requirements.txt.
  3. Install the pre-commit hooks (described below).

Creating spatial dimension variables

SMAP L3 collections are missing spatial dimension variables. This service can generate them by using a combination of required CF-compliant attributes and temporary helper attributes.

Temporary attributes are identified by a _* prefix. They are defined in the earthdata-varinfo configuration and made available in the VarInfoFromNetCDF4 object for use in annotations. These attributes are not written to the DataTree object or the NetCDF output file.
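
To illustrate the temporary-attribute convention, the sketch below splits a plain dictionary of configured attributes into those that would be written to the output and the `_*`-prefixed helpers that would be dropped. This is only a minimal sketch: the function name `split_temporary_attributes` and the dictionary-based representation are hypothetical and are not taken from the service's code.

```python
TEMPORARY_PREFIX = "_*"


def split_temporary_attributes(attributes: dict) -> tuple[dict, dict]:
    """Separate attributes to persist from temporary helper attributes.

    Hypothetical helper: any attribute whose name starts with "_*" is
    treated as temporary and kept out of the persisted set.
    """
    persisted = {}
    temporary = {}
    for name, value in attributes.items():
        if name.startswith(TEMPORARY_PREFIX):
            temporary[name] = value
        else:
            persisted[name] = value
    return persisted, temporary


# Illustrative attributes, borrowed from the dimension override in this README.
configured = {
    "standard_name": "projection_y_coordinate",
    "units": "m",
    "_*corner_point_offsets": "history_subset_index_ranges",
}
persisted, temporary = split_temporary_attributes(configured)
```

Here `persisted` holds the attributes that belong in the output file, while `temporary` holds the helpers that are only used while annotating.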

Required spatial dimension attributes

  • standard_name - Must be either projection_x_coordinate or projection_y_coordinate (per CF conventions).
  • grid_mapping - References a properly configured CRS variable (described below).

To accommodate possible subsetting, one of the following is also required in the dimension coordinate variable configuration:

  • _*corner_point_offsets: history_subset_index_ranges β€” Indicates that the subset index range should be parsed from the history metadata attribute.
  • _*subset_index_reference: <variable-reference> - Indicates that the subset index range should be obtained from the referenced row or column grid variable. The referenced variable must either be explicitly requested or, preferably, be configured as an ancillary variable in the Harmony OPeNDAP SubSetter varinfo configuration, so that it is always available to the metadata-annotator.

CRS variables

  • When used for creating spatial dimensions, the following attribute is required:
    • _*master_geotransform - Defines the grid details used to generate dimension coordinates.
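
The sketch below shows one way dimension coordinates could be derived from a geotransform and a subset index range. It assumes the six values follow GDAL ordering (x origin, pixel width, rotation, y origin, rotation, pixel height), that the rotation terms are zero, that coordinates refer to cell centres, and that the index range is inclusive. None of these assumptions is confirmed by this README, and `dimension_coordinates` is a hypothetical name rather than a function from the service.

```python
def dimension_coordinates(geotransform, axis, index_range):
    """Return cell-centre coordinates for an inclusive index range.

    Assumes GDAL ordering of the geotransform:
    [x_origin, pixel_width, rotation, y_origin, rotation, pixel_height]
    with both rotation terms equal to zero.
    """
    if axis == "x":
        origin, spacing = geotransform[0], geotransform[1]
    elif axis == "y":
        origin, spacing = geotransform[3], geotransform[5]
    else:
        raise ValueError(f"Unknown axis: {axis!r}")

    start, stop = index_range
    # The origin is the outer edge of cell 0, so add half a cell to
    # reach the cell centre.
    return [origin + (index + 0.5) * spacing for index in range(start, stop + 1)]


# The _*master_geotransform value used in the SPL3SMAP example in this README.
master_geotransform = [
    -17367530.4451615, 9008.055210146, 0,
    7314540.8306386, 0, -9008.055210146,
]

# y coordinates for the first three rows of the full (unsubsetted) grid.
y_coordinates = dimension_coordinates(master_geotransform, "y", (0, 2))
```

For a subsetted granule, the index range passed in would be the one recovered from the history attribute or from the referenced row or column variable, so the generated coordinates line up with the full grid.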

Example Metadata Override configuration for creating a spatial dimension:

The configuration example below creates the /Soil_Moisture_Retrieval_Data/y variable for SPL3SMAP.

For the service to create a spatial dimension variable, all three overrides are required.

  • The first override creates /Soil_Moisture_Retrieval_Data/y as a new variable in the VarInfoFromNetCDF4 object.
  • The second override adds the grid_mapping attribute to all variables in the /Soil_Moisture_Retrieval_Data/ group (including /Soil_Moisture_Retrieval_Data/y).
  • The third override creates the CRS variable and includes the _*master_geotransform attribute (required for creating the spatial dimension coordinates).
    {
      "Applicability": {
        "Mission": "SMAP",
        "ShortNamePath": "SPL3SMAP",
        "VariablePattern": "/Soil_Moisture_Retrieval_Data/y"
      },
      "Attributes": [
        {
          "Name": "standard_name",
          "Value": "projection_y_coordinate"
        },
        {
          "Name": "long_name",
          "Value": "y coordinate of projection"
        },
        {
          "Name": "dimensions",
          "Value": "y"
        },
        {
          "Name": "axis",
          "Value": "Y"
        },
        {
          "Name": "units",
          "Value": "m"
        },
        {
          "Name": "type",
          "Value": "float64"
        },
        {
          "Name": "_*corner_point_offsets",
          "Value": "history_subset_index_ranges"
        }
      ],
      "_Description": "The pseudo-dimension variable is supplemented with variable attributes (as if it was a dimension variables) to fully specify the Y dimension."
    },
    {
      "Applicability": {
        "Mission": "SMAP",
        "ShortNamePath": "SPL3SMAP",
        "VariablePattern": "/Soil_Moisture_Retrieval_Data/.*"
      },
      "Attributes": [
        {
          "Name": "grid_mapping",
          "Value": "/EASE2_global_projection_9km"
        }
      ],
      "_Description": "SMAP L3 collections omit global grid mapping information"
    },
    {
      "Applicability": {
        "Mission": "SMAP",
        "ShortNamePath": "SPL3SMAP",
        "VariablePattern": "/EASE2_global_projection_9km"
      },
      "Attributes": [
        {
          "Name": "grid_mapping_name",
          "Value": "lambert_cylindrical_equal_area"
        },
        {
          "Name": "standard_parallel",
          "Value": 30.0
        },
        {
          "Name": "longitude_of_central_meridian",
          "Value": 0.0
        },
        {
          "Name": "false_easting",
          "Value": 0.0
        },
        {
          "Name": "false_northing",
          "Value": 0.0
        },
        {
          "Name": "horizontal_datum_name",
          "Value": "WGS84"
        },
        {
          "Name": "inverse_flattening",
          "Value": 298.257223563
        },
        {
          "Name": "semi_major_axis",
          "Value": 6378137.0
        },
        {
          "Name": "semi_minor_axis",
          "Value": 6356752.314245
        },
        {
          "Name": "_*master_geotransform",
          "Value": [-17367530.4451615, 9008.055210146, 0, 7314540.8306386, 0, -9008.055210146]
        }
      ],
      "_Description": "Provide missing global grid mapping attributes for SMAP L3 collections."
    },
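
To see why /Soil_Moisture_Retrieval_Data/y picks up attributes from both the first and second overrides, the sketch below treats each VariablePattern as a regular expression and collects the attribute names from every override whose pattern matches a variable path. It is a simplified illustration, not the earthdata-varinfo implementation: it uses `re.fullmatch`, ignores the Mission and ShortNamePath fields, and `applicable_attribute_names` is a hypothetical helper.

```python
import re

# Attribute names only, abbreviated from the three overrides above.
overrides = [
    {
        "VariablePattern": "/Soil_Moisture_Retrieval_Data/y",
        "AttributeNames": ["standard_name", "_*corner_point_offsets"],
    },
    {
        "VariablePattern": "/Soil_Moisture_Retrieval_Data/.*",
        "AttributeNames": ["grid_mapping"],
    },
    {
        "VariablePattern": "/EASE2_global_projection_9km",
        "AttributeNames": ["grid_mapping_name", "_*master_geotransform"],
    },
]


def applicable_attribute_names(variable_path, overrides):
    """Collect attribute names from every override matching the path.

    Assumption: the whole path must match the pattern (re.fullmatch);
    the matching rules in earthdata-varinfo itself may differ.
    """
    names = []
    for override in overrides:
        if re.fullmatch(override["VariablePattern"], variable_path):
            names.extend(override["AttributeNames"])
    return names


y_names = applicable_attribute_names("/Soil_Moisture_Retrieval_Data/y", overrides)
crs_names = applicable_attribute_names("/EASE2_global_projection_9km", overrides)
```

Under these assumptions, `y_names` combines the dimension attributes with grid_mapping, while `crs_names` contains only the CRS attributes.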

Tests

This service utilises the Python pytest package to perform unit tests on classes and functions in the service. After local development is complete, and tests have been updated, they can be run in Docker via:

$ ./bin/build-image && ./bin/build-test && ./bin/run-test

It is also possible to run the tests directly (without Docker) by running the tests/run_tests.sh script in a suitable Python environment. Note that the reports directory will be created in the directory from which you call the script.

The tests/run_tests.sh script will also generate a coverage report, rendered in HTML, and scan the code with pylint.

Currently, the pytest suite is run automatically within a GitHub workflow as part of a CI/CD pipeline. These tests are run for all changes made in a PR against the main branch. The tests must pass in order to merge the PR.

pre-commit hooks

This repository uses pre-commit to enable pre-commit checks that enforce coding standard best practices. These include:

  • Removing trailing whitespaces.
  • Removing blank lines at the end of a file.
  • Ensuring JSON files are validly formatted.
  • ruff Python linting checks.
  • black Python code formatting checks.
  • Ensuring no committed files are above 500 kB.

To enable these checks:

# Install pre-commit Python package via the listed development requirements:
pip install -r dev_requirements.txt

# Install the git hook scripts:
pre-commit install

Versioning

Docker service images for the harmony-metadata-annotator adhere to semantic version numbers: major.minor.patch.

  • Major increments: These are non-backwards compatible API changes.
  • Minor increments: These are backwards compatible API changes.
  • Patch increments: These updates do not affect the API to the service.
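
As a small illustration of the major.minor.patch scheme, the sketch below parses two version strings and reports which part changed first. The helper names and version numbers are hypothetical, and this is not the repository's release tooling.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.strip().split("."))
    return major, minor, patch


def increment_type(old_version: str, new_version: str) -> str:
    """Name the most significant part that differs between two versions."""
    labels = ("major", "minor", "patch")
    old_parts = parse_version(old_version)
    new_parts = parse_version(new_version)
    for label, old_part, new_part in zip(labels, old_parts, new_parts):
        if old_part != new_part:
            return label
    return "none"


# Hypothetical version numbers, for illustration only.
change = increment_type("1.2.3", "1.3.0")
```

Parsing into integer tuples also allows versions to be ordered correctly, which a plain string comparison would not (for example, "1.10.0" sorts before "1.9.0" as a string).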

Gotchas:

The service currently uses xarray.DataTree.to_netcdf to write the whole DataTree object out to a file. This is very memory intensive, so the Harmony-In-A-Box configuration uses 8 GiB as the memory limit for the service. A future improvement would be to write the output incrementally. The Harmony SMAP L2 Gridder performs such an operation and may be a good model for updating this code.
