- Greenland liquid water discharge from 1958 through 2019
- WARNING
- Citation
- Accessing this data
This is the source for “Greenland liquid water discharge from 1958 through 2019” and subsequent versions.
- The paper is located at https://doi.org/10.5194/essd-2020-47.
- The data sets are located at [[https://doi.org/10.22008/promice/freshwater][doi:10.22008/promice/freshwater]
- Companion paper: “Greenland Ice Sheet solid ice discharge from 1986 through 2019”
- Publication: doi:10.5194/essd-12-1367-2020
- Source: https://github.com/mankoff/ice_discharge/
- Data: doi:10.22008/promice/data/ice_discharge
The source for this work is hosted on GitHub at https://github.com/mankoff/freshwater. GitHub issues are used to collect suggested improvements to the paper, problems that made it through review, and mention of similar papers that have been published since this was accepted. The work may be under be under active development, including updating data (and therefore tables) within the source document.
- This diff shows changes between the published version of the paper and the current (active) development version.
- The source for the active development version can be viewed at https://github.com/mankoff/freshwater/tree/main
- The source for the published paper can be viewed at https://github.com/mankoff/freshwater/tree/TAG
Bugs may exist in this data or the ./discharge.py access script. All known bugs will be documented at https://github.com/mankoff/freshwater/issue. Before using this software or finalizing results, you should check if any open issues impact your results, or if any issue have been closed since you downloaded the data or script.
@article{Mankoff_2020_water,
doi = {10.5194/essd-2020-47},
url = {https://doi.org/10.5194/essd-2020-47},
year = {2020},
month = 4,
publisher = {Copernicus {GmbH}},
author = {Kenneth D. Mankoff and Brice Noël and Xavier Fettweis and Andreas P. Ahlstrøm
and William Colgan and Ken Kondo and Kirsty Langley and Shin Sugiyama
and Dirk van As and Robert S. Fausto},
title = {Greenland liquid water discharge from 1958 through 2019},
journal = {Earth System Science Data Discussions},
note = {In Review}
}
After the data have been downloaded, the discharge.py script allows access to outlets, basins, and their discharge within a region of interest (ROI). The ROI can be a point, a list describing a polygon, or a file. Optionally, upstream outlets, basins, and discharge from any land outlet(s) can be included. The script can be called from the command line (CLI) or within Python.
The ROI coordinate units can be either EPSG:4326 (lon,lat) or EPSG:3413. The units for the coordinates are guessed using the range of values. If the ROI is a point, basins that contain that point are selected. Either 1 (if the point is on land) or two (ice and the downstream land, if the point is on the ice) basins are selected, and optionally, all ice basins upstream from the one land basin. If the ROI is a polygon, all outlets within the polygon are selected. The polygon does not have to be closed - a convex hull is wrapped around it. If the argument is a file (e.g. KML file) then the first polygon is selected and used.
When the script is run from the command line, CSV data is written to stdout and can be redirected to a file. When the API is accessed from within Python, if the script is used to access outlets, a GeoPandas GeoDataFrame is returned and can be used for further analysis within Python, or written to any file format supported by GeoPandas or Pandas, for example CSV, or GeoPackage for QGIS. If the script is used to access discharge, an xarray Dataset is returned, and can be used for further analysis within Python, or written to any file format supported by xarray, for example CSV or NetCDF.
This script queries two database:
- land
- The land coast outlets and land basins.
- ice
- ice margin outlets and ice basins.
The folder structure required is a root folder (named freshwater in the examples below, but can be anything) and then a land and ice sub-folder. The geospatial files for land and ice must be in these folders (i.e. the k=1.0 Streams, Outlets, and Basins dataset from https://dataverse01.geus.dk/dataverse/freshwater). Then, each of the land and ice folders must contain a runoff folder with the appropriate (MAR & RACMO) yearly discharge NetCDF files
Example:
./freshwater/land/ ./freshwater/land/outlets.csv ./freshwater/land/basins.csv ./freshwater/land/streams.csv ./freshwater/land/streams.gpkg ./freshwater/land/outlets.gpkg ./freshwater/land/basins.gpkg ./freshwater/land/basins_filled.gpkg ./freshwater/land/runoff ./freshwater/land/runoff/MAR_<yyyy>.nc ./freshwater/land/runoff/RACMO_<yyyy>.nc ./freshwater/ice/ ./freshwater/ice/outlets.csv ./freshwater/ice/basins.csv ./freshwater/ice/streams.csv ./freshwater/ice/streams.gpkg ./freshwater/ice/outlets.gpkg ./freshwater/ice/basins.gpkg ./freshwater/ice/basins_filled.gpkg ./freshwater/ice/runoff ./freshwater/ice/runoff/MAR_<yyyy>.nc ./freshwater/ice/runoff/RACMO_<yyyy>.nc
- The script takes a few seconds to query the outlets and basins. The script takes ~10s of seconds to query each of the discharge time series datasets. Because there may be up to 6 discharge queries (2 RCMs for each of 1 land domain + ice domain + upstream ice), it can several minutes on a fast laptop to extract the data. To track progress, do not set the
quietflag toTrue. - If a polygon includes ice outlets, and the
upstreamflag is set, some ice outlets, basins, and discharge may be included twice, once as a “direct” selection within the polygon and once as an upstream outlet and basin from the land polygon. Further processing by the user can remove duplicates (see examples below). - The
idcolumn may not be unique for multiple reasons:- As above, the same outlet may be included twice.
id’s are unique within a dataset (i.e.land, andice), but not between datasets.
- Due to bash command-line parsing behavior, the syntax
--roi -60,60does not work. Use--roi=-60,06. - Longitude is expected in degrees East, and should therefore probably be negative.
See environment.yml file in Git repository, or
conda create -n freshwater_user python=3.7 xarray=0.15.1 fiona=1.8.13 shapely=1.7.0 geopandas=0.7.0 netcdf4=1.5.3 dask=2.15.0
conda activate freshwater_userpython ./discharge.py -husage: discharge.py [-h] --base BASE --roi ROI [-u] (-o | -d) [-q] Discharge data access optional arguments: -h, --help show this help message and exit --base BASE Folder containing freshwater data --roi ROI x,y OR lon,lat OR x0,y0 x1,y1 ... xn,yn OR lon0,lat0 lon1,lat1 ... lon_n,lat_n. [lon: degrees E] -u, --upstream Include upstream ice outlets draining into land basins -o, --outlets Return outlet IDs (same as basin IDs) -d, --discharge Return RCM discharge for each domain (outlets merged) -q, --quiet Be quiet
The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, only one row should be returned.
python ./discharge.py --base ./freshwater --roi=-50.5,67.2 -o -q| index | id | lon | lat | x | y | elev | domain | upstream | coast_id | coast_lon | coast_lat | coast_x | coast_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 112448 | -51.233 | 67.156 | -272150 | -2491850 | 42 | land | False | -1 | -1 | -1 |
If we move 10° east to somewhere over the ice, there should be four rows: one for the land outlet and basin, and three more for the three ice scenario:
python ./discharge.py --base ./freshwater --roi=-40.5,67.2 -o -q| index | id | lon | lat | x | y | elev | domain | upstream | coast_id | coast_lon | coast_lat | coast_x | coast_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 118180 | -38.071 | 66.33 | 313650 | -2580750 | -78 | land | False | -1 | -1 | -1 | ||
| 1 | 67133 | -38.11 | 66.333 | 311850 | -2580650 | -58 | ice | False | 118180 | -38.071 | 66.33 | 313650 | -2580750 |
Here a polygon covers several land outlets near the end of a fjord, and several ice outlets of the nearby ice margin. In addition, we request all ice outlets upstream of all selected land basins.
We use the following simple KML file for our ROI (this can be copied-and-pasted into the Google Earth side-bar to see it). Rather than use this file with --roi=/path/to/file.kml, we use the coordinates directly, and demonstrate dropping the last coordinate because the code will wrap any polygon in a convex hull.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>Ice and Land Sample</name>
<Placemark>
<name>ice and land</name>
<LineString>
<tessellate>1</tessellate>
<coordinates>-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18 -51.50,66.93</coordinates>
</LineString>
</Placemark>
</Document>
</kml>In this example, we query for upstream outlets, and for brevity show just the first three and last three lines.
python ./discharge.py --base ./freshwater --roi="-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18" -q -u -o | (head -n3 ;tail -n4)| index | id | lon | lat | x | y | elev | domain | upstream | coast_id | coast_lon | coast_lat | coast_x | coast_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 113526 | -50.713 | 67.002 | -251250 | -2511450 | 20 | land | False | -1 | -1 | -1 | ||
| 1 | 113705 | -50.735 | 66.988 | -252350 | -2512850 | 7 | land | False | -1 | -1 | -1 | ||
| 205 | 67140 | -49.538 | 66.425 | -204850 | -2580850 | 794 | ice | True | 114920 | -50.652 | 66.868 | -250050 | -2526750 |
| 206 | 67163 | -49.544 | 66.419 | -205150 | -2581550 | 825 | ice | True | 114920 | -50.652 | 66.868 | -250050 | -2526750 |
| 207 | 67211 | -49.534 | 66.406 | -204850 | -2583050 | 866 | ice | True | 114920 | -50.652 | 66.868 | -250050 | -2526750 |
The discharge examples here use the same code as the “outlets and basins” examples above, except we use --discharge rather than --outlet.
The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, two time series should be returned: MAR_land_100 and RACMO_land_100. Rather than showing results for every day from 1958 through 2019, we limit to the header and the first 10 days of June, 2012.
python ./discharge.py --base ./freshwater --roi=-50.5,67.2 -q -d | (head -n1; grep -A9 "^2012-06-01")| time | MAR_land | RACMO_land |
|---|---|---|
| 2012-06-01 | 0.043025 | 0.382903 |
| 2012-06-02 | 5.5e-05 | 0.095672 |
| 2012-06-03 | 5e-05 | 0.009784 |
| 2012-06-04 | 9e-06 | -0.007501 |
| 2012-06-05 | 0.008212 | 0.007498 |
| 2012-06-06 | 28.601947 | 0.607345 |
| 2012-06-07 | 0.333926 | 0.05691 |
| 2012-06-08 | 0.489437 | 0.204384 |
| 2012-06-09 | 0.038816 | 0.167325 |
| 2012-06-10 | 5.1e-05 | 0.011415 |
- If we move 10° east to somewhere over the ice we add two columns: One for each of the two RCMs over the ice domain.
- If the
--upstreamflag is set, we add two columns: One for each of the RCMs over the upstream ice domains. Results are summed across outlets per domain. - Results are therefore one of the following
- Two columns: 2 RCM * 1 land domain
- Four columns: 2 RCM * (1 land + 1 ice domain)
- Four columns: 2 RCM * (1 land + 1 upstream ice domain)
- Six columns: 2 RCM * (1 land + 1 ice + 1 upstream ice domain)
When querying using an ROI that covers multiple outlets, discharge is summed by domain. Therefore, even if 100s of outlets are within the ROI, either two columns, eight, eight, or fourteen columns are returned depending on the options.
The python API is similar to the command line interface, but rather than printing results to stdout, returns a GeoPandas GeoDataFrame of outlets, an xarray Dataset of discharge. The discharge is not summed by domain, but instead contains discharge for each outlet.
The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, only one row should be returned.
from discharge import discharge
df = discharge(base="./freshwater", roi="-50.5,67.2", quiet=True).outlets()
The df variable is a Pandas GeoDataFrame.
It includes two geometry columns
outlet- A point for the location of the outlet (also available as the
xandycolumns) basin- A polygon describing this basin
Because the geometry columns do not display well in tabular form, we drop them.
df.drop(columns=["outlet","basin"])
| index | id | lon | lat | x | y | elev | domain | upstream | coast_id | coast_lon | coast_lat | coast_x | coast_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 112448 | -51.2329 | 67.1555 | -272150 | -2491850 | 42 | land | False | -1 | nan | nan | -1 | -1 |
Here a polygon covers several land outlets near the end of a fjord, and several ice outlets of the nearby ice margin. In addition, we request all ice outlets upstream of all selected land basins. Results are shown in tabular form and written to geospatial file formats.
from discharge import discharge
df = discharge(base="./freshwater", roi="-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18", quiet=True, upstream=True).outlets()
View the first few rows, excluding the geometry columns
df.drop(columns=["outlet","basin"]).head()
| index | id | lon | lat | x | y | elev | domain | upstream | coast_id | coast_lon | coast_lat | coast_x | coast_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 113526 | -50.713 | 67.0017 | -251250 | -2511450 | 20 | land | False | -1 | nan | nan | -1 | -1 |
| 1 | 113705 | -50.7346 | 66.9884 | -252350 | -2512850 | 7 | land | False | -1 | nan | nan | -1 | -1 |
| 2 | 113729 | -50.7771 | 66.9849 | -254250 | -2513050 | -1 | land | False | -1 | nan | nan | -1 | -1 |
| 3 | 113767 | -50.8634 | 66.9752 | -258150 | -2513750 | 14 | land | False | -1 | nan | nan | -1 | -1 |
| 4 | 113787 | -50.9575 | 66.9688 | -262350 | -2514050 | 12 | land | False | -1 | nan | nan | -1 | -1 |
View the last few rows:
Note that the domain and upstream columns can be used to subset the table.
df.drop(columns=["outlet","basin"]).tail()
| index | id | lon | lat | x | y | elev | domain | upstream | coast_id | coast_lon | coast_lat | coast_x | coast_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 203 | 67070 | -49.5278 | 66.4399 | -204250 | -2579250 | 761 | ice | True | 114920 | -50.6517 | 66.8677 | -250050 | -2526750 |
| 204 | 67076 | -49.5386 | 66.4387 | -204750 | -2579350 | 758 | ice | True | 114920 | -50.6517 | 66.8677 | -250050 | -2526750 |
| 205 | 67140 | -49.5382 | 66.4254 | -204850 | -2580850 | 794 | ice | True | 114920 | -50.6517 | 66.8677 | -250050 | -2526750 |
| 206 | 67163 | -49.5436 | 66.419 | -205150 | -2581550 | 825 | ice | True | 114920 | -50.6517 | 66.8677 | -250050 | -2526750 |
| 207 | 67211 | -49.5344 | 66.406 | -204850 | -2583050 | 866 | ice | True | 114920 | -50.6517 | 66.8677 | -250050 | -2526750 |
Finally, write data to various file formats. GeoPandas DataFrames can only have one geometry, so we must select one and drop the other before writing the file.
df.drop(columns=["outlet","basin"]).to_csv("outlets.csv")
df.set_geometry("outlet").drop(columns="basin").to_file("outlets.gpkg", driver="GPKG")
df.set_geometry("basin").drop(columns="outlet").to_file("basins.gpkg", driver="GPKG")
The code here is the same as above from the “Outlets and basins” section, but we call discharge() rather than outlets().
The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, only one row should be returned.
from discharge import discharge
ds = discharge(base="./freshwater", roi="-50.5,67.2").discharge()
Print the xarray Dataset:
print(ds)
<xarray.Dataset>
Dimensions: (land: 1, time: 22645)
Coordinates:
* time (time) datetime64[ns] 1958-01-01 1958-01-02 ... 2019-12-31
* land (land) uint64 112448
Data variables:
MAR_land (time, land) float64 nan nan nan ... 7.186e-07 3.928e-07
RACMO_land (time, land) float64 2.132 2.125 2.069 2.064 ... nan nan nan nan
Display the time series. Unlike the command line interface, here the outlets are not merged.
ds.sel(time=slice('2012-06-01','2012-06-10')).to_dataframe()
| MAR_land | RACMO_land | |
|---|---|---|
| (112448, Timestamp(‘2012-06-01 00:00:00’, freq=’D’)) | 0.0430252 | 0.382903 |
| (112448, Timestamp(‘2012-06-02 00:00:00’, freq=’D’)) | 5.47723e-05 | 0.0956719 |
| (112448, Timestamp(‘2012-06-03 00:00:00’, freq=’D’)) | 4.96042e-05 | 0.00978398 |
| (112448, Timestamp(‘2012-06-04 00:00:00’, freq=’D’)) | 9.40224e-06 | -0.00750081 |
| (112448, Timestamp(‘2012-06-05 00:00:00’, freq=’D’)) | 0.00821199 | 0.007498 |
| (112448, Timestamp(‘2012-06-06 00:00:00’, freq=’D’)) | 28.6019 | 0.607345 |
| (112448, Timestamp(‘2012-06-07 00:00:00’, freq=’D’)) | 0.333926 | 0.0569098 |
| (112448, Timestamp(‘2012-06-08 00:00:00’, freq=’D’)) | 0.489437 | 0.204384 |
| (112448, Timestamp(‘2012-06-09 00:00:00’, freq=’D’)) | 0.0388156 | 0.167325 |
| (112448, Timestamp(‘2012-06-10 00:00:00’, freq=’D’)) | 5.11108e-05 | 0.0114149 |
In order to merge the outlets, select all coordinates that are not time and merge them. Also, apply a rolling mean:
dims = [_ for _ in ds.dims.keys() if _ != 'time'] # get all dimensions except the time dimension
ds.sum(dim=dims)\
.rolling(time=7)\
.mean()\
.sel(time=slice('2012-06-01','2012-06-10'))\
.to_dataframe()
| time | MAR_land | RACMO_land |
|---|---|---|
| 2012-06-01 00:00:00 | 7.33495 | 3.46155 |
| 2012-06-02 00:00:00 | 6.63191 | 3.25688 |
| 2012-06-03 00:00:00 | 1.7918 | 1.09451 |
| 2012-06-04 00:00:00 | 0.0179128 | 0.604358 |
| 2012-06-05 00:00:00 | 0.0148878 | 0.281088 |
| 2012-06-06 00:00:00 | 4.09767 | 0.217489 |
| 2012-06-07 00:00:00 | 4.14103 | 0.164659 |
| 2012-06-08 00:00:00 | 4.20481 | 0.139156 |
| 2012-06-09 00:00:00 | 4.21034 | 0.149392 |
| 2012-06-10 00:00:00 | 4.21034 | 0.149625 |
Here a polygon covers several land outlets near the end of a fjord, and several ice outlets of the nearby ice margin. In addition, we request all ice outlets upstream of all selected land basins.
from discharge import discharge
ds = discharge(base="./freshwater", roi="-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18", quiet=True, upstream=True).discharge()
What are the dimensions (i.e. how many outlets in each domain?)
# print(ds.dims)
print(ds)
<xarray.Dataset>
Dimensions: (ice: 36, ice_upstream: 88, land: 84, time: 22645)
Coordinates:
* time (time) datetime64[ns] 1958-01-01 ... 2019-12-31
* land (land) uint64 113526 113705 113729 ... 115309 115334
* ice (ice) uint64 65558 65563 65579 ... 65741 65742 65786
* ice_upstream (ice_upstream) uint64 65540 65546 65548 ... 67163 67211
Data variables:
MAR_land (time, land) float64 nan nan nan ... 5.894e-09 3.539e-08
MAR_ice (time, ice) float64 nan nan nan ... 6.78e-07 0.0
RACMO_land (time, land) float64 1.726 0.3477 0.004691 ... nan nan
RACMO_ice (time, ice) float64 0.0 0.0 0.0 0.0 ... nan nan nan nan
MAR_ice_upstream (time, ice_upstream) float64 nan nan nan ... 0.0 0.0 0.0
RACMO_ice_upstream (time, ice_upstream) float64 0.0 0.0 0.0 ... nan nan nan
With these results:
- Sum all outlets within each domain
- Drop the land discharge and the upstream domains (keep only ice discharge explicitly within our ROI)
- Apply a 5-day rolling mean
- Plot 2012 discharge season
d = [_ for _ in ds.dims.keys() if _ != 'time'] # dims for summing (don't sum time dimension)
v = [_ for _ in ds.data_vars if ('land' in _) | ('_u' in _)] # vars containing '_u'
r = ds.sum(dim=d)\
.drop_vars(v)\
.rolling(time=5).mean()
import matplotlib.pyplot as plt
plt.style.use('seaborn')
for d in r.data_vars: r[d].sel(time=slice('2012-04-01','2012-11-15')).plot(drawstyle='steps', label=d)
_ = legend()
plt.savefig("./fig/api_example.png", bbox_inches='tight')
