Multi-resolution#
On top of a hundred foot pole you linger
Clinging to the first mark of the scale
How do you proceed higher?
It will take more than a leap of faith
Earth Observation and climate projection data can be captured at different levels of detail. In this lesson, we'll work with a multitude of spatial resolutions, learning to respect the ground sampling distance or native resolution of the physical variable being measured, while minimizing memory usage. By the end of the lesson, you should be able to:
- Find low and high spatial resolution climate datasets and load them from Zarr stores
- Stack and subset time-series datasets with different spatial resolutions stored in a hierarchical datatree.DataTree structure
- Slice the multi-resolution dataset along the time-axis into monthly bins
Getting started#
These are the tools you'll need.
import matplotlib.pyplot as plt
import pandas as pd
import torchdata
import torchdata.dataloader2
import xarray as xr
import xpystac
import zen3geo
from datatree import DataTree
Find climate model datasets#
The two datasets we'll be working with are gridded climate projections: one at its original, low spatial resolution, and another at a higher spatial resolution. Specifically, we'll be looking at the maximum temperature (tasmax) variable from one of the Coupled Model Intercomparison Project Phase 6 (CMIP6) global coupled ocean-atmosphere general circulation model (GCM) outputs at low resolution (67.5 arcminute), and a super-resolution product from DeepSD at a higher resolution (15 arcminute).
Note
The following tutorial will mostly use the term super-resolution from Computer Vision instead of downscaling. The term downscaling (going from low to high resolution) can easily be confused with downsampling (going from high to low resolution), whereas super-resolution is unambiguously about going from low to high resolution.
lowres_raw = "https://cpdataeuwest.blob.core.windows.net/cp-cmip/cmip6/ScenarioMIP/MRI/MRI-ESM2-0/ssp585/r1i1p1f1/Amon/tasmax/gn/v20191108"
highres_deepsd = "https://cpdataeuwest.blob.core.windows.net/cp-cmip/version1/data/DeepSD/ScenarioMIP.MRI.MRI-ESM2-0.ssp585.r1i1p1f1.month.DeepSD.tasmax.zarr"
This is what the projected maximum temperature for August 2089 looks like over South Asia for a low-resolution Global Climate Model (left) and a high-resolution downscaled product (right).
# Zarr datasets from https://github.com/carbonplan/research/blob/d05d148fd716ba6304e3833d765069dd890eaf4a/articles/cmip6-downscaling-explainer/components/downscaled-data.js#L97-L122
ds_gcm = xr.open_dataset(
filename_or_obj="https://cmip6downscaling.blob.core.windows.net/vis/article/fig1/regions/india/gcm-tasmax.zarr"
)
ds_gcm -= 273.15 # convert from Kelvin to Celsius
ds_downscaled = xr.open_dataset(
filename_or_obj="https://cmip6downscaling.blob.core.windows.net/vis/article/fig1/regions/india/downscaled-tasmax.zarr"
)
ds_downscaled -= 273.15 # convert from Kelvin to Celsius
# Plot projected maximum temperature over South Asia from GCM and GARD-MV
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 3), sharey=True)
img1 = ds_gcm.tasmax.plot.imshow(
ax=ax[0], cmap="inferno", vmin=16, vmax=48, add_colorbar=False
)
ax[0].set_title("Global Climate Model (67.5 arcminute)")
img2 = ds_downscaled.tasmax.plot.imshow(
ax=ax[1], cmap="inferno", vmin=16, vmax=48, add_colorbar=False
)
ax[1].set_title("Downscaled result (15 arcminute)")
cbar = fig.colorbar(mappable=img1, ax=ax.ravel().tolist(), extend="both")
cbar.set_label(label="Daily Max Near-Surface Air\nTemperature in Aug 2089 (ยฐC)")
plt.show()

Load Zarr stores#
The Zarr stores can be loaded into an xarray.Dataset via zen3geo.datapipes.XpySTACAssetReader (functional name: read_from_xpystac) with the engine="zarr" keyword argument.
dp_lowres = torchdata.datapipes.iter.IterableWrapper(iterable=[lowres_raw])
dp_highres = torchdata.datapipes.iter.IterableWrapper(iterable=[highres_deepsd])
dp_lowres_dataset = dp_lowres.read_from_xpystac(engine="zarr", chunks="auto")
dp_highres_dataset = dp_highres.read_from_xpystac(engine="zarr", chunks="auto")
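If you want to sanity-check one of these Zarr stores outside of the datapipe, it can also be opened directly with xarray. This is roughly what the reader does under the hood (a sketch; reading over HTTPS assumes an fsspec-compatible backend such as aiohttp is installed):
# Direct, lazy read of the low-resolution Zarr store (dask-backed thanks to chunks="auto")
ds_check = xr.open_dataset(filename_or_obj=lowres_raw, engine="zarr", chunks="auto")
print(ds_check.tasmax.sizes)  # e.g. Frozen({'time': 1032, 'lat': 160, 'lon': 320})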
Inspect the climate datasets#
Let's now preview the low-resolution and high-resolution temperature datasets.
it = iter(dp_lowres_dataset)
ds_lowres = next(it)
ds_lowres
<xarray.Dataset>
Dimensions:    (lat: 160, bnds: 2, lon: 320, time: 1032)
Coordinates:
    height     float64 ...
  * lat        (lat) float64 -89.14 -88.03 -86.91 -85.79 ... 86.91 88.03 89.14
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(160, 2), meta=np.ndarray>
  * lon        (lon) float64 0.0 1.125 2.25 3.375 ... 355.5 356.6 357.8 358.9
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(320, 2), meta=np.ndarray>
  * time       (time) datetime64[ns] 2015-01-16T12:00:00 ... 2100-12-16T12:00:00
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(1032, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    tasmax     (time, lat, lon) float32 dask.array<chunksize=(516, 160, 320), meta=np.ndarray>
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            ScenarioMIP
    branch_method:          standard
    branch_time_in_child:   60265.0
    branch_time_in_parent:  60265.0
    cmor_version:           3.4.0
    ...                     ...
    table_info:             Creation Date:(14 December 2018) MD5:b2d32d1a0d9b...
    title:                  MRI-ESM2-0 output prepared for CMIP6
    tracking_id:            hdl:21.14100/421f03b2-8cb7-4473-9d03-0f772c8969c4
    variable_id:            tasmax
    variant_label:          r1i1p1f1
    version_id:             v20191108
it = iter(dp_highres_dataset)
ds_highres = next(it)
ds_highres
<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 1020)
Coordinates:
  * lat      (lat) float64 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float64 -179.9 -179.6 -179.4 -179.1 ... 179.4 179.6 179.9
  * time     (time) datetime64[ns] 2015-01-01 2015-02-01 ... 2099-12-01
Data variables:
    tasmax   (time, lat, lon) float32 dask.array<chunksize=(1020, 144, 144), meta=np.ndarray>
Attributes: (12/17)
    Conventions:                    CF-1.8
    activity_id:                    ScenarioMIP
    cmip6_downscaling_contact:      hello@carbonplan.org
    cmip6_downscaling_explainer:    https://carbonplan.org/research/cmip6-do...
    cmip6_downscaling_institution:  CarbonPlan
    cmip6_downscaling_license:      CC-BY-4.0
    ...                             ...
    institution_id:                 MRI
    member_id:                      r1i1p1f1
    references:                     Eyring, V., Bony, S., Meehl, G. A., Seni...
    source_id:                      MRI-ESM2-0
    timescale:                      day
    variable_id:                    tasmax
Notice that the low-resolution dataset has lon/lat pixels of shape (320, 160), whereas the high-resolution dataset is of shape (1440, 720). So there has been a 4.5x increase in spatial resolution going from the raw GCM grid to the super-resolution DeepSD grid.
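As a quick arithmetic check (a sketch using the two datasets previewed above), the ratio of the longitude grid spacings confirms that factor:
lowres_step = float(ds_lowres.lon.diff(dim="lon").median())    # ~1.125 degrees = 67.5 arcminutes
highres_step = float(ds_highres.lon.diff(dim="lon").median())  # ~0.25 degrees = 15 arcminutes
print(f"Resolution ratio: {lowres_step / highres_step:.1f}x")  # ~4.5x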
Shift from 0-360 to -180-180#
A sharp eye would have noticed that the longitudinal ranges of the low-resolution and high-resolution datasets are offset by 180°, going from 0° to 360° and from -180° to +180° respectively. Let's shift the coordinates of the low-resolution grid from 0-360 to -180-180 using a custom torchdata.datapipes.iter.Mapper (functional name: map) function.
def shift_longitude_360_to_180(ds: xr.Dataset) -> xr.Dataset:
    # Remap longitude values from the 0° to 360° range into -180° to +180°
    ds = ds.assign_coords(lon=(((ds.lon + 180) % 360) - 180))
    # Roll the array by half its width so longitudes stay monotonically increasing
    ds = ds.roll(lon=int(len(ds.lon) / 2), roll_coords=True)
    return ds
dp_lowres_dataset_180 = dp_lowres_dataset.map(fn=shift_longitude_360_to_180)
dp_lowres_dataset_180
MapperIterDataPipe
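As an aside, here is what that modulo formula does to a single longitude value east of the antimeridian (a standalone check, not part of the pipeline):
lon = 358.875  # an example longitude on the 0-360 grid
print(((lon + 180) % 360) - 180)  # -1.125, i.e. just west of the prime meridian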
Double-check that the low-resolution grid's longitude coordinates are now in the -180° to +180° range.
it = iter(dp_lowres_dataset_180)
ds_lowres_180 = next(it)
ds_lowres_180
<xarray.Dataset>
Dimensions:    (lat: 160, bnds: 2, lon: 320, time: 1032)
Coordinates:
    height     float64 ...
  * lat        (lat) float64 -89.14 -88.03 -86.91 -85.79 ... 86.91 88.03 89.14
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(160, 2), meta=np.ndarray>
  * lon        (lon) float64 -180.0 -178.9 -177.8 -176.6 ... 176.6 177.8 178.9
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(320, 2), meta=np.ndarray>
  * time       (time) datetime64[ns] 2015-01-16T12:00:00 ... 2100-12-16T12:00:00
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(1032, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    tasmax     (time, lat, lon) float32 dask.array<chunksize=(516, 160, 320), meta=np.ndarray>
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            ScenarioMIP
    branch_method:          standard
    branch_time_in_child:   60265.0
    branch_time_in_parent:  60265.0
    cmor_version:           3.4.0
    ...                     ...
    table_info:             Creation Date:(14 December 2018) MD5:b2d32d1a0d9b...
    title:                  MRI-ESM2-0 output prepared for CMIP6
    tracking_id:            hdl:21.14100/421f03b2-8cb7-4473-9d03-0f772c8969c4
    variable_id:            tasmax
    variant_label:          r1i1p1f1
    version_id:             v20191108
Spatiotemporal stack and subset#
Following on from Stacking layers, where multiple layers with the same spatial resolution were stacked together into an xarray.DataArray object, this section will teach you about stacking datasets with different spatial resolutions into a datatree.DataTree object that has a nested/hierarchical structure. That datatree.DataTree can then be subsetted to the desired spatial and temporal extent in one go.
Stack multi-resolution datasets#
First, we'll need to combine the low-resolution GCM and high-resolution DeepSD xarray.Dataset objects into a tuple using torchdata.datapipes.iter.Zipper (functional name: zip).
dp_lowres_highres = dp_lowres_dataset_180.zip(dp_highres_dataset)
dp_lowres_highres
ZipperIterDataPipe
Next, use torchdata.datapipes.iter.Collator (functional name: collate) to convert the tuple of xarray.Dataset objects into a datatree.DataTree, similar to what was done in Stacking layers. Note that we'll only take the 'tasmax' (Daily Maximum Near-Surface Air Temperature) xarray.DataArray variable from each of the xarray.Dataset objects.
def multires_collate_fn(lowres_and_highres: tuple) -> DataTree:
    """
    Combine a pair of xarray.Dataset (lowres, highres) inputs into a
    datatree.DataTree with groups named 'lowres' and 'highres'.
    """
    # Turn 2 xr.Dataset objects into 1 DataTree with multiple groups
    ds_lowres, ds_highres = lowres_and_highres
    # Create DataTree with lowres and highres groups holding the tasmax variable
    datatree: DataTree = DataTree.from_dict(
        d={"lowres": ds_lowres.tasmax, "highres": ds_highres.tasmax}
    )
    return datatree
dp_datatree = dp_lowres_highres.collate(collate_fn=multires_collate_fn)
dp_datatree
CollatorIterDataPipe
See the nested structure of the datatree.DataTree. The low-resolution GCM and high-resolution DeepSD outputs have been placed in separate groups.
it = iter(dp_datatree)
datatree = next(it)
datatree
<xarray.DatasetView>
Dimensions:  ()
Data variables:
    *empty*
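The text repr above only shows the (empty) root node. To confirm that the two child groups are there, you can list them and peek at the array shapes directly (a quick check):
print(list(datatree.children))           # ['lowres', 'highres']
print(datatree["lowres/tasmax"].shape)   # (1032, 160, 320) as (time, lat, lon)
print(datatree["highres/tasmax"].shape)  # (1020, 720, 1440) as (time, lat, lon)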
Subset multi-resolution layers#
The climate model outputs above are global, covering a timespan from January 2015 to December 2100. If you're only interested in a particular region or timespan, then the datatree.DataTree will need to be trimmed down. Let's use datatree.DataTree.sel() to subset the multi-resolution data to just the Philippines for the period 2015 to 2030.
def spatiotemporal_subset(dt: DataTree) -> DataTree:
    # Trim the DataTree to a bounding box over the Philippines and the years 2015-2030
    dt_subset = dt.sel(
        lon=slice(116.4375, 126.5625),
        lat=slice(5.607445, 19.065325),
        time=slice("2015-01-01", "2030-12-31"),
    )
    return dt_subset
dp_datatree_subset = dp_datatree.map(fn=spatiotemporal_subset)
dp_datatree_subset
MapperIterDataPipe
Inspect the subsetted climate dataset.
it = iter(dp_datatree_subset)
datatree_subset = next(it)
datatree_subset
<xarray.DatasetView>
Dimensions:  ()
Data variables:
    *empty*
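Again, the root node repr is empty, so here is a quick check of the subsetted array shapes in each group:
print(datatree_subset["lowres/tasmax"].shape)   # (192, 12, 9) as (time, lat, lon)
print(datatree_subset["highres/tasmax"].shape)  # (192, 54, 40) as (time, lat, lon)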
Let's plot the projected temperature for Dec 2030 over the Philippine Archipelago to ensure things look ok.
ds_lowres = (
datatree_subset["lowres/tasmax"]
.sel(time=slice("2030-12-01", "2030-12-31"))
.squeeze()
)
ds_lowres -= 273.15 # convert from Kelvin to Celsius
ds_highres = (
datatree_subset["highres/tasmax"]
.sel(time=slice("2030-12-01", "2030-12-31"))
.squeeze()
)
ds_highres -= 273.15 # convert from Kelvin to Celsius
# Plot projected maximum temperature over the Philippines from GCM and DeepSD
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 8), sharey=True)
img1 = ds_lowres.plot.imshow(
ax=ax[0], cmap="inferno", vmin=22, vmax=33, add_colorbar=False
)
ax[0].set_title("Global Climate Model (67.5 arcminute)")
img2 = ds_highres.plot.imshow(
ax=ax[1], cmap="inferno", vmin=22, vmax=33, add_colorbar=False
)
ax[1].set_title("DeepSD output (15 arcminute)")
cbar = fig.colorbar(mappable=img1, ax=ax.ravel().tolist(), extend="max")
cbar.set_label(label="Daily Max Near-Surface Air\nTemperature in Dec 2030 (ยฐC)")
plt.show()

Important
When slicing different spatial resolution grids, put some thought into the process. Do some math to ensure the coordinates of the bounding box (min/max lon/lat) cut through the pixels exactly at the pixel boundaries whenever possible.
If your multi-resolution layers have spatial resolutions that are round multiples of each other (e.g. 10m, 20m, 60m), it is advisable to align the pixel corners, such that the high-resolution pixels fit within the low-resolution pixels (e.g. one 20m pixel should contain four 10m pixels; see the sketch after this note). This can be done by resampling or interpolating the grid (typically the higher resolution one) onto a new reference frame.
For datasets that come from different sources and need to be reprojected, you can do the reprojection and pixel alignment in a single step. Be extra careful about resampling, as certain datasets (e.g. complex SAR data that has been collected off-nadir) may require special treatment.
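For the round-multiple case, here is a minimal sketch with a made-up 10m grid (not data from this lesson) showing the idea: when the pixel corners align, every 20m pixel aggregates exactly a 2x2 block of 10m pixels.
import numpy as np

# Hypothetical 4x4 grid at 10m spacing (cell centres at 5, 15, 25, 35 metres)
highres_10m = xr.DataArray(
    data=np.arange(16, dtype=float).reshape(4, 4),
    coords={"y": [35.0, 25.0, 15.0, 5.0], "x": [5.0, 15.0, 25.0, 35.0]},
    dims=("y", "x"),
)
# With aligned corners, each 20m pixel is exactly the mean of four 10m pixels
lowres_20m = highres_10m.coarsen(y=2, x=2).mean()
print(lowres_20m.shape)  # (2, 2)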
Time to slice again#
So, we now have a datatree.DataTree with two groups/nodes called 'lowres' and 'highres' that have tensor shapes (lat: 12, lon: 9, time: 192) and (lat: 54, lon: 40, time: 192) respectively. While the time dimension is of the same length, the timestamp values of the low-resolution GCM and high-resolution DeepSD output are different. Specifically, the GCM output has dates at the middle of the month, while the DeepSD output has dates at the start of the month. Let's see how this can be handled.
Slicing by month#
Assuming that the roughly two-week offset between the monthly resolution GCM and DeepSD time-series is negligible, we can split the dataset along the time dimension at the start/end of each month. Let's write a function and use torchdata.datapipes.iter.FlatMapper (functional name: flatmap) for this.
def split_on_month(dt: DataTree, node: str = "highres/tasmax") -> DataTree:
    """
    Yield a slice of data for every month in a datatree.DataTree time-series.
    """
    for t in dt[node].time.to_pandas():
        dt_slice = dt.sel(
            time=slice(t + pd.offsets.MonthBegin(0), t + pd.offsets.MonthEnd(0))
        )
        yield dt_slice.squeeze(dim="time")
dp_datatree_timeslices = dp_datatree_subset.flatmap(fn=split_on_month)
dp_datatree_timeslices
FlatMapperIterDataPipe
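As a quick check of the pandas offsets used in split_on_month: for a start-of-month timestamp (like those on the DeepSD time axis), MonthBegin(0) leaves the date unchanged while MonthEnd(0) rolls forward to the last day of that month, so each slice spans one full calendar month and also captures the mid-month GCM timestamp.
t = pd.Timestamp("2015-01-01")       # a DeepSD-style start-of-month timestamp
print(t + pd.offsets.MonthBegin(0))  # 2015-01-01 00:00:00 (already at the month start)
print(t + pd.offsets.MonthEnd(0))    # 2015-01-31 00:00:00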
The datapipe should yield a datatree.DataTree with just one month's worth of temperature data per iteration.
it = iter(dp_datatree_timeslices)
datatree_timeslice = next(it)
datatree_timeslice
<xarray.DatasetView>
Dimensions:  ()
Data variables:
    *empty*
See also
Those interested in slicing multi-resolution arrays spatially can keep an eye on the ongoing implementation at xarray-contrib/xbatcher#171 and the discussion at xarray-contrib/xbatcher#93. This tutorial will be updated once there's a clean way to generate multi-resolution datatree.DataTree slices in a newer release of xbatcher.
Visualize the final DataPipe graph.
torchdata.datapipes.utils.to_graph(dp=dp_datatree_timeslices)
Into a DataLoader#
Ready to populate the torchdata.dataloader2.DataLoader2!
dataloader = torchdata.dataloader2.DataLoader2(datapipe=dp_datatree_timeslices)
for i, batch in enumerate(dataloader):
    ds_lowres = batch["lowres/tasmax"]
    ds_highres = batch["highres/tasmax"]
    print(f"Batch {i} - lowres: {ds_lowres.shape}, highres: {ds_highres.shape}")
    if i > 8:
        break
Batch 0 - lowres: (12, 9), highres: (54, 40)
Batch 1 - lowres: (12, 9), highres: (54, 40)
Batch 2 - lowres: (12, 9), highres: (54, 40)
Batch 3 - lowres: (12, 9), highres: (54, 40)
Batch 4 - lowres: (12, 9), highres: (54, 40)
Batch 5 - lowres: (12, 9), highres: (54, 40)
Batch 6 - lowres: (12, 9), highres: (54, 40)
Batch 7 - lowres: (12, 9), highres: (54, 40)
Batch 8 - lowres: (12, 9), highres: (54, 40)
Batch 9 - lowres: (12, 9), highres: (54, 40)
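As a hypothetical next step (not covered in this lesson), each DataTree slice could be turned into a pair of PyTorch tensors, for example as the input and target of a super-resolution model:
import torch

# Using the last `batch` from the loop above; .to_numpy() materializes the dask-backed data
lowres_tensor = torch.as_tensor(batch["lowres/tasmax"].to_numpy())
highres_tensor = torch.as_tensor(batch["highres/tasmax"].to_numpy())
print(lowres_tensor.shape, highres_tensor.shape)  # torch.Size([12, 9]) torch.Size([54, 40])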
Do super-resolution, but make no illusion
See also
Credits to CarbonPlan for making the code and data for their CMIP6 downscaling work openly available. Find out more at https://docs.carbonplan.org/cmip6-downscaling!