Commit d39a234

support composing input variables
This PR adds functionality to `DataHandler`, `TimeVaryingInputs`, and `SpaceVaryingInputs` to enable composing multiple input variables into one data variable. To do this, the user must specify a composing function, multiple variable names, and the file paths where they can be read from.

Most of the changes have been made at the `DataHandler` level. Each input variable has its own unique `FileReader` object, and each composed data variable has one `Time/SpaceVaryingInput` and one `DataHandler`. The composing function itself is applied in the `regridded_snapshot` function, just before regridding. The user will interact with this feature at the `Time/SpaceVaryingInput` level. This feature is only available when using `InterpolationsRegridder`, not `TempestRegridder`.

Design decisions made include:
- If a pre-processing function is provided, it is applied to each input variable before they are composed.
- Variables are composed before regridding, to preserve higher resolution information.
- We assume that all input variables have the same temporal and spatial dimensions. This is explicitly checked in the `DataHandler` constructor, and will raise an informative error message if it is not true.
- Multiple input variables can come from one file, or each from their own unique file. We don't currently support arbitrary numbers of input variables and files, since this would require more work to implement and is not an expected use case in the near term.
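
The order of operations described above (check dimensions, pre-process each input, then compose pointwise, all before any regridding) can be sketched in plain Julia. This is only an illustration on in-memory arrays: `compose_inputs` is a hypothetical helper written for this note, not a function in ClimaUtilities, and applying the composing function via broadcasting is an assumption about what "pointwise" means here rather than a description of the actual `regridded_snapshot` code.

```julia
# Hypothetical helper (NOT part of ClimaUtilities): mimics the documented
# order of operations on plain arrays.
function compose_inputs(compose_function, arrays...; preprocess_func = identity)
    # All inputs must share the same dimensions; fail with a readable message otherwise
    sizes = map(size, arrays)
    all(==(first(sizes)), sizes) ||
        error("Input variables must have the same dimensions, got $(sizes)")
    # Pre-processing is applied to each input variable individually...
    preprocessed = map(preprocess_func, arrays)
    # ...and only then is the composing function applied pointwise (assumed: broadcast)
    return compose_function.(preprocessed...)
end

# Stand-ins for two variables read from file(s)
u = [1.0 2.0; 3.0 4.0]
v = [10.0 20.0; 30.0 40.0]

unit_conversion_func = (data) -> 1000 * data
compose_function = (x, y) -> x + y

# The order of `u` and `v` determines the argument order of `compose_function`
composed = compose_inputs(compose_function, u, v; preprocess_func = unit_conversion_func)
# composed == [11000.0 22000.0; 33000.0 44000.0]
```

In the real feature, the composed data would then be handed to the `InterpolationsRegridder`; that step is omitted from this sketch.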
1 parent 2f14f11 commit d39a234

8 files changed: +429 -61 lines changed

NEWS.md

Lines changed: 27 additions & 0 deletions
@@ -1,6 +1,33 @@
 ClimaUtilities.jl Release Notes
 ===============================
 
+main
+-------
+- [Feature] Add support for composing input variables.
+PR [#105](https://github.com/CliMA/ClimaUtilities.jl/pull/105/)
+
+This allows a list of `varnames` (and possibly `file_paths`) to be
+passed to `TimeVaryingInput` or `SpaceVaryingInput`, along with a
+`compose_function` to compose them, as so:
+
+```julia
+# Define the pointwise composing function we want, a simple sum in this case
+compose_function = (x, y) -> x + y
+# Define pre-processing function to convert units of input
+unit_conversion_func = (data) -> 1000 * data
+
+data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
+    ["u", "v"],
+    target_space,
+    reference_date = Dates.DateTime(2000, 1, 1),
+    regridder_type = :InterpolationsRegridder,
+    file_reader_kwargs = (; preprocess_func = unit_conversion_func),
+    compose_function)
+```
+
+See the `TimeVaryingInput` or `DataHandler` docs "NetCDF file input"
+sections for more details.
+
 v0.1.14
 -------

docs/src/datahandling.md

Lines changed: 41 additions & 3 deletions
@@ -17,7 +17,7 @@ interface). For instance, if need arises, the `DataHandler` can be used (almost)
 directly to process files with a different format from NetCDF.
 
 The key struct in `DataHandling` is the `DataHandler`. The `DataHandler`
-contains a `FileReader`, a `Regridder`, and other metadata necessary to perform
+contains one or more `FileReader`(s), a `Regridder`, and other metadata necessary to perform
 its operations (e.g., target `ClimaCore.Space`). The `DataHandler` can be used
 for static or temporal data, and exposes the following key functions:
 - `regridded_snapshot(time)`: to obtain the regridded field at the given `time`.
@@ -52,7 +52,21 @@ It is possible to pass down keyword arguments to underlying constructors in
 `DataHandler` with the `regridder_kwargs` and `file_reader_kwargs`. These have
 to be a named tuple or a dictionary that maps `Symbol`s to values.
 
-## Example
+A `DataHandler` can contain information about a variable that we read directly from
+an input file, or about a variable that is produced by composing data from multiple
+input variables. In the latter case, the input variables may either all come from
+the same input file, or may each come from a separate input file. The user must
+provide the composing function, which operates pointwise on each of the inputs,
+as well as an ordered list of the variable names to be passed to the function.
+Additionally, input variables that are composed together must have the same
+spatial and temporal dimensions.
+Note that, if a non-identity pre-processing function is provided as part of
+`file_reader_kwargs`, it will be applied to each input variable before they
+are composed.
+Composing multiple input variables is currently only supported with the
+`InterpolationsRegridder`, not with `TempestRegridder`.
+
+## Example: Linear interpolation of a single data variable
 
 As an example, let us implement a simple linear interpolation for a variable `u`
 defined in the `era5_example.nc` NetCDF file. The file contains monthly averages
@@ -68,7 +82,8 @@ import Interpolations
 
 import Dates
 
-unit_conversion_func = (data) -> 1000. * data
+# Define pre-processing function to convert units of input
+unit_conversion_func = (data) -> 1000 * data
 
 data_handler = DataHandling.DataHandler("era5_example.nc",
     "u",
@@ -93,6 +108,29 @@ function linear_interpolation(data_handler, time)
 end
 ```
 
+### Example appendix: Using multiple input data variables
+
+Suppose that the input NetCDF file `era5_example.nc` contains two variables `u`
+and `v`, and we care about their sum `u + v` but not their individual values.
+We can provide a pointwise composing function to perform the sum, along with
+the `InterpolationsRegridder` to produce the data we want, `u + v`.
+The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u`
+and to `v` individually, before the composing function is applied. The regridding
+is applied after the composing function. `u` and `v` could also come from separate
+NetCDF files, but they must still have the same spatial and temporal dimensions.
+
+```julia
+# Define the pointwise composing function we want, a simple sum in this case
+compose_function = (x, y) -> x + y
+data_handler = DataHandling.DataHandler("era5_example.nc",
+    ["u", "v"],
+    target_space,
+    reference_date = Dates.DateTime(2000, 1, 1),
+    regridder_type = :InterpolationsRegridder,
+    file_reader_kwargs = (; preprocess_func = unit_conversion_func),
+    compose_function)
+```
+
 ## API
 
 ```@docs

docs/src/inputs.md

Lines changed: 61 additions & 5 deletions
@@ -28,14 +28,66 @@ regridding onto the computational domains (using [`Regridders`](@ref) and
 `TimeVaryingInputs` support:
 - analytic functions of time;
 - pairs of 1D arrays (for `PointSpaces`);
-- 2/3D NetCDF files;
+- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable);
 - linear interpolation in time (default), nearest neighbors, and "period filling";
 - boundary conditions and repeating periodic data.
 
 It is possible to pass down keyword arguments to underlying constructors in the
 `Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to
 be a named tuple or a dictionary that maps `Symbol`s to values.
 
+### NetCDF file inputs
+2D or 3D NetCDF files can be provided as inputs using `TimeVaryingInputs`. This
+could be a single variable provided in a single file, multiple variables provided
+in a single file, or multiple variables each coming from a unique file.
+When using multiple variables, a composing function must be provided as well,
+which will be used to combine the input variables into one data variable that
+is ultimately stored in the `TimeVaryingInput`. In this case, the order of
+variables provided in `varnames` determines the order of the arguments
+passed to the composing function.
+
+Note that if a non-identity pre-processing function is provided as part of
+`file_reader_kwargs`, it will be applied to each input variable before they
+are composed.
+All input variables to be composed together must have the same spatial and
+temporal dimensions.
+
+Composing multiple input variables is currently only supported with the
+`InterpolationsRegridder`, not with `TempestRegridder`. The regridding
+is applied after the pre-processing and composing.
+
+Composing multiple input variables in one `Input` is also possible with
+a `SpaceVaryingInput`, and everything mentioned here applies in that case.
+
+#### Example: NetCDF file input with multiple input variables
+
+Suppose that the input NetCDF file `era5_example.nc` contains two variables `u`
+and `v`, and we care about their sum `u + v` but not their individual values.
+We can provide a pointwise composing function to perform the sum, along with
+the `InterpolationsRegridder` to produce the data we want, `u + v`.
+The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u`
+and to `v` individually, before the composing function is applied. The regridding
+is applied after the composing function. `u` and `v` could also come from separate
+NetCDF files, but they must still have the same spatial and temporal dimensions.
+
+```julia
+# Define the pointwise composing function we want, a simple sum in this case
+compose_function = (x, y) -> x + y
+# Define pre-processing function to convert units of input
+unit_conversion_func = (data) -> 1000 * data
+
+data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
+    ["u", "v"],
+    target_space,
+    reference_date = Dates.DateTime(2000, 1, 1),
+    regridder_type = :InterpolationsRegridder,
+    file_reader_kwargs = (; preprocess_func = unit_conversion_func),
+    compose_function)
+```
+
+The same arguments (excluding `reference_date`) could be passed to a
+`SpaceVaryingInput` to compose multiple input variables with that type.
+
 ### Extrapolation boundary conditions
 
 `TimeVaryingInput`s can have multiple boundary conditions for extrapolation. By
@@ -131,7 +183,7 @@ by a factor of 100, we would change `albedo_tv` with
 ```julia
 albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_space;
     reference_date, regridder_kwargs = (; regrid_dir = "/tmp"),
-    file_reader_kwargs = (; preprocess_func = (x) -> 100x)
+    file_reader_kwargs = (; preprocess_func = (x) -> 100x))
 ```
 
 !!! note In this example we used the [`TempestRegridder`](@ref). This is not the
@@ -153,10 +205,10 @@ albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_
 (chiefly the [`DataHandling`](@ref) module) to construct a `Field` from
 different sources.
 
-`TimeVaryingInputs` support:
+`SpaceVaryingInputs` support:
 - analytic functions of coordinates;
 - pairs of 1D arrays (for columns);
-- 2/3D NetCDF files.
+- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable).
 
 In some ways, a `SpaceVaryingInput` can be thought as an alternative constructor
 for a `ClimaCore` `Field`.
@@ -165,6 +217,11 @@ It is possible to pass down keyword arguments to underlying constructors in the
 `Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to
 be a named tuple or a dictionary that maps `Symbol`s to values.
 
+`SpaceVaryingInputs` support reading individual input variables from NetCDF files,
+as well as composing multiple input variables into one `SpaceVaryingInput`.
+See the [`TimeVaryingInput`](@ref) "NetCDF file inputs" section for more
+information about this feature.
+
 ### Example
 
 Let `target_space` be a `ClimaCore` `Space` where we want the `Field` to be
@@ -202,4 +259,3 @@ ClimaUtilities.TimeVaryingInputs.extrapolation_bc
 Base.in
 Base.close
 ```
-