enable composing input data #105

Merged
merged 1 commit into from
Sep 9, 2024
27 changes: 27 additions & 0 deletions NEWS.md
@@ -1,6 +1,33 @@
ClimaUtilities.jl Release Notes
===============================

main
-------
- [Feature] Add support for composing input variables.
PR [#105](https://github.com/CliMA/ClimaUtilities.jl/pull/105/)

This allows a list of `varnames` (and possibly `file_paths`) to be
passed to `TimeVaryingInput` or `SpaceVaryingInput`, along with a
`compose_function` to compose them, like so:

```julia
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# Define pre-processing function to convert units of input
unit_conversion_func = (data) -> 1000 * data

# The `;` makes the trailing `compose_function` a keyword argument
data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
                                                  ["u", "v"],
                                                  target_space;
                                                  reference_date = Dates.DateTime(2000, 1, 1),
                                                  regridder_type = :InterpolationsRegridder,
                                                  file_reader_kwargs = (; preprocess_func = unit_conversion_func),
                                                  compose_function)
```

See the "NetCDF file inputs" section of the `TimeVaryingInput` docs, or the
`DataHandler` docs, for more details.

v0.1.14
-------

44 changes: 41 additions & 3 deletions docs/src/datahandling.md
@@ -17,7 +17,7 @@ interface). For instance, if need arises, the `DataHandler` can be used (almost)
directly to process files with a different format from NetCDF.

The key struct in `DataHandling` is the `DataHandler`. The `DataHandler`
-contains a `FileReader`, a `Regridder`, and other metadata necessary to perform
+contains one or more `FileReader`(s), a `Regridder`, and other metadata necessary to perform
its operations (e.g., target `ClimaCore.Space`). The `DataHandler` can be used
for static or temporal data, and exposes the following key functions:
- `regridded_snapshot(time)`: to obtain the regridded field at the given `time`.
@@ -52,7 +52,21 @@ It is possible to pass down keyword arguments to underlying constructors in
`DataHandler` with the `regridder_kwargs` and `file_reader_kwargs`. These have
to be a named tuple or a dictionary that maps `Symbol`s to values.

-## Example
A `DataHandler` can contain information about a variable that we read directly from
an input file, or about a variable that is produced by composing data from multiple
input variables. In the latter case, the input variables may either all come from
the same input file, or may each come from a separate input file. The user must
provide the composing function, which operates pointwise on each of the inputs,
as well as an ordered list of the variable names to be passed to the function.
Additionally, input variables that are composed together must have the same
spatial and temporal dimensions.
Note that, if a non-identity pre-processing function is provided as part of
`file_reader_kwargs`, it will be applied to each input variable before they
are composed.
Composing multiple input variables is currently only supported with the
`InterpolationsRegridder`, not with `TempestRegridder`.
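
The order of operations described above can be sketched with plain arrays. This is only an illustration of the semantics (pre-process each input, then compose pointwise), not the `DataHandler` implementation; the arrays `u` and `v` are made-up stand-ins for data read from a file, and regridding is left out.

```julia
# Pre-processing function, applied to each input variable separately
unit_conversion_func = (data) -> 1000 * data
# Pointwise composing function, a simple sum in this case
compose_function = (x, y) -> x + y

# Made-up inputs with identical dimensions (a requirement for composing)
u = [1.0 2.0; 3.0 4.0]
v = fill(0.5, 2, 2)

# Step 1: pre-process every input variable
preprocessed = map(unit_conversion_func, (u, v))
# Step 2: compose pointwise by broadcasting over the pre-processed inputs
composed = compose_function.(preprocessed...)
```

Broadcasting with the dot syntax is what makes the composition pointwise: `compose_function` only ever sees one value from each input at a time.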

+## Example: Linear interpolation of a single data variable

As an example, let us implement a simple linear interpolation for a variable `u`
defined in the `era5_example.nc` NetCDF file. The file contains monthly averages
@@ -68,7 +82,8 @@ import Interpolations

import Dates

-unit_conversion_func = (data) -> 1000. * data
+# Define pre-processing function to convert units of input
+unit_conversion_func = (data) -> 1000 * data

data_handler = DataHandling.DataHandler("era5_example.nc",
"u",
@@ -93,6 +108,29 @@ function linear_interpolation(data_handler, time)
end
```

### Example appendix: Using multiple input data variables

Suppose that the input NetCDF file `era5_example.nc` contains two variables `u`
and `v`, and we care about their sum `u + v` but not their individual values.
We can provide a pointwise composing function to perform the sum, along with
the `InterpolationsRegridder` to produce the data we want, `u + v`.
The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u`
and to `v` individually, before the composing function is applied. The regridding
is applied after the composing function. `u` and `v` could also come from separate
NetCDF files, but they must still have the same spatial and temporal dimensions.

```julia
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# The `;` makes the trailing `compose_function` a keyword argument
data_handler = DataHandling.DataHandler("era5_example.nc",
                                        ["u", "v"],
                                        target_space;
                                        reference_date = Dates.DateTime(2000, 1, 1),
                                        regridder_type = :InterpolationsRegridder,
                                        file_reader_kwargs = (; preprocess_func = unit_conversion_func),
                                        compose_function)
```

## API

```@docs
66 changes: 61 additions & 5 deletions docs/src/inputs.md
@@ -28,14 +28,66 @@ regridding onto the computational domains (using [`Regridders`](@ref) and
`TimeVaryingInputs` support:
- analytic functions of time;
- pairs of 1D arrays (for `PointSpaces`);
-- 2/3D NetCDF files;
+- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable);
- linear interpolation in time (default), nearest neighbors, and "period filling";
- boundary conditions and repeating periodic data.
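
As a rough illustration of the default linear interpolation in time, the sketch below blends two snapshots according to where the requested time falls between them. The function name `linear_interpolation_sketch` and the sample data are hypothetical; this is not how `TimeVaryingInputs` is implemented internally.

```julia
import Dates

# Blend two snapshots taken at times t0 and t1 (hypothetical helper)
function linear_interpolation_sketch(t, t0, t1, field0, field1)
    # Weight is 0 at t0 and 1 at t1
    coeff = Dates.value(t - t0) / Dates.value(t1 - t0)
    return (1 - coeff) .* field0 .+ coeff .* field1
end

t0 = Dates.DateTime(2000, 1, 1)
t1 = Dates.DateTime(2000, 1, 3)
t = Dates.DateTime(2000, 1, 2)  # halfway between t0 and t1

interpolated = linear_interpolation_sketch(t, t0, t1, [0.0, 10.0], [2.0, 20.0])
```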

It is possible to pass down keyword arguments to underlying constructors in the
`Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to
be a named tuple or a dictionary that maps `Symbol`s to values.
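
Both accepted forms behave the same way when splatted into a Julia keyword-argument list, which is presumably why either can be used. The sketch below shows this with a hypothetical constructor, `make_reader_sketch`, that is not part of ClimaUtilities.

```julia
# Hypothetical constructor standing in for a file reader
make_reader_sketch(; preprocess_func = identity) = (; preprocess_func)

# The same keyword arguments, as a named tuple and as a Dict of Symbols
kwargs_as_named_tuple = (; preprocess_func = x -> 100x)
kwargs_as_dict = Dict(:preprocess_func => (x -> 100x))

# Splatting either container after `;` forwards its entries as keywords
reader_from_named_tuple = make_reader_sketch(; kwargs_as_named_tuple...)
reader_from_dict = make_reader_sketch(; kwargs_as_dict...)
```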

### NetCDF file inputs
2D or 3D NetCDF files can be provided as inputs using `TimeVaryingInputs`. This
could be a single variable provided in a single file, multiple variables provided
in a single file, or multiple variables each coming from a unique file.
When using multiple variables, a composing function must be provided as well,
which will be used to combine the input variables into one data variable that
is ultimately stored in the `TimeVaryingInput`. In this case, the order of
variables provided in `varnames` determines the order of the arguments
passed to the composing function.

Note that if a non-identity pre-processing function is provided as part of
`file_reader_kwargs`, it will be applied to each input variable before they
are composed.
All input variables to be composed together must have the same spatial and
temporal dimensions.

Composing multiple input variables is currently only supported with the
`InterpolationsRegridder`, not with `TempestRegridder`. The regridding
is applied after the pre-processing and composing.

Composing multiple input variables in one `Input` is also possible with
a `SpaceVaryingInput`, and everything mentioned here applies in that case.
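
To see why the order of `varnames` matters, consider a composing function that is not symmetric in its arguments. The sketch below uses a `Dict` as a hypothetical stand-in for the contents of a NetCDF file and a helper, `compose_sketch`, that is not part of ClimaUtilities.

```julia
# Hypothetical stand-in for variables stored in a NetCDF file
file_contents = Dict("u" => [3.0, 4.0], "v" => [1.0, 1.0])

# A non-symmetric composing function: argument order changes the result
compose_function = (x, y) -> x - y

# Look up each variable in the order given, then compose pointwise
compose_sketch(varnames) =
    compose_function.((file_contents[name] for name in varnames)...)

u_minus_v = compose_sketch(["u", "v"])
v_minus_u = compose_sketch(["v", "u"])
```

With `["u", "v"]` the first argument of the composing function is `u`, while swapping the list swaps the arguments and flips the sign of the result.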

#### Example: NetCDF file input with multiple input variables

Suppose that the input NetCDF file `era5_example.nc` contains two variables `u`
and `v`, and we care about their sum `u + v` but not their individual values.
We can provide a pointwise composing function to perform the sum, along with
the `InterpolationsRegridder` to produce the data we want, `u + v`.
The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u`
and to `v` individually, before the composing function is applied. The regridding
is applied after the composing function. `u` and `v` could also come from separate
NetCDF files, but they must still have the same spatial and temporal dimensions.

```julia
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# Define pre-processing function to convert units of input
unit_conversion_func = (data) -> 1000 * data

# The `;` makes the trailing `compose_function` a keyword argument
data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
                                                  ["u", "v"],
                                                  target_space;
                                                  reference_date = Dates.DateTime(2000, 1, 1),
                                                  regridder_type = :InterpolationsRegridder,
                                                  file_reader_kwargs = (; preprocess_func = unit_conversion_func),
                                                  compose_function)
```

The same arguments (excluding `reference_date`) could be passed to a
`SpaceVaryingInput` to compose multiple input variables with that type.

### Extrapolation boundary conditions

`TimeVaryingInput`s can have multiple boundary conditions for extrapolation. By
@@ -131,7 +183,7 @@ by a factor of 100, we would change `albedo_tv` with
```julia
albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_space;
reference_date, regridder_kwargs = (; regrid_dir = "/tmp"),
-file_reader_kwargs = (; preprocess_func = (x) -> 100x)
+file_reader_kwargs = (; preprocess_func = (x) -> 100x))
```

!!! note In this example we used the [`TempestRegridder`](@ref). This is not the
@@ -153,10 +205,10 @@ albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_
(chiefly the [`DataHandling`](@ref) module) to construct a `Field` from
different sources.

-`TimeVaryingInputs` support:
+`SpaceVaryingInputs` support:
- analytic functions of coordinates;
- pairs of 1D arrays (for columns);
-- 2/3D NetCDF files.
+- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable).

In some ways, a `SpaceVaryingInput` can be thought of as an alternative constructor
for a `ClimaCore` `Field`.
@@ -165,6 +217,11 @@ It is possible to pass down keyword arguments to underlying constructors in the
`Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to
be a named tuple or a dictionary that maps `Symbol`s to values.

`SpaceVaryingInputs` support reading individual input variables from NetCDF files,
as well as composing multiple input variables into one `SpaceVaryingInput`.
See the [`TimeVaryingInput`](@ref) "NetCDF file inputs" section for more
information about this feature.

### Example

Let `target_space` be a `ClimaCore` `Space` where we want the `Field` to be
@@ -202,4 +259,3 @@ ClimaUtilities.TimeVaryingInputs.extrapolation_bc
Base.in
Base.close
```
