API Reference

DataPipes

Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.

Datashader

DataPipes for datashader.

zen3geo.datapipes.DatashaderRasterizer: alias of DatashaderRasterizerIterDataPipe

class zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe(source_datapipe, vector_datapipe, agg=None, **kwargs)

Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and rasterizes them using datashader.Canvas to yield an xarray.DataArray raster with the input geometries aggregated into a fixed-sized grid (functional name: rasterize_with_datashader).

Parameters:

source_datapipe (IterDataPipe[datashader.Canvas]) – A DataPipe that contains datashader.Canvas objects with a .crs attribute. This will be the template defining the output raster’s spatial extent and x/y range.
vector_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries with a .crs property.
agg (Optional[datashader.reductions.Reduction]) –
Reduction operation to compute. Default depends on the input vector type:
- For points, default is datashader.reductions.count
- For lines, default is datashader.reductions.any
- For polygons, default is datashader.reductions.any
For more information, refer to the section on Aggregation under datashader’s Pipeline docs.
kwargs (Optional) – Extra keyword arguments to pass to the datashader.Canvas class’s aggregation methods such as datashader.Canvas.points.

Yields:

raster (xarray.DataArray) – An xarray.DataArray object containing the raster data. This raster will have a rioxarray.rioxarray.XRasterBase.crs property and a proper affine transform viewable with rioxarray.rioxarray.XRasterBase.transform().

Raises:

ModuleNotFoundError – If spatialpandas is not installed. Please install it (e.g. via pip install spatialpandas) before using this class.
ValueError – If the length of the vector_datapipe is neither 1 nor equal to the length of the source_datapipe. I.e. the ratio of vector:canvas must be either 1:N or exactly N:N.
AttributeError – If either the canvas in source_datapipe or vector geometry in vector_datapipe is missing a .crs attribute. Please set the coordinate reference system (e.g. using canvas.crs = 'OGC:CRS84' for the datashader.Canvas input or vector = vector.set_crs(crs='OGC:CRS84') for the geopandas.GeoSeries or geopandas.GeoDataFrame input) before passing them into the datapipe.
NotImplementedError – If the input vector geometry type to vector_datapipe is not supported, typically when a shapely.geometry.GeometryCollection is used. Supported types include Point, LineString, and Polygon, plus their multipart equivalents MultiPoint, MultiLineString, and MultiPolygon.
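The length rule behind the ValueError above can be read as a simple pairing step. The sketch below is a plain-Python illustration of that reading, not zen3geo's implementation; the helper name pair_vectors_with_canvases is hypothetical, and plain strings stand in for the GeoDataFrame and datashader.Canvas objects.

```python
from typing import List, Sequence, Tuple, TypeVar

V = TypeVar("V")  # stands in for a GeoSeries/GeoDataFrame
C = TypeVar("C")  # stands in for a datashader.Canvas


def pair_vectors_with_canvases(
    vectors: Sequence[V], canvases: Sequence[C]
) -> List[Tuple[V, C]]:
    """Hypothetical helper: pair vectors with canvases in a 1:N or N:N ratio."""
    if len(vectors) == 1:
        # 1:N, reuse the single vector for every canvas
        return [(vectors[0], canvas) for canvas in canvases]
    if len(vectors) == len(canvases):
        # N:N, pair element-wise in order
        return list(zip(vectors, canvases))
    raise ValueError(
        f"Unmatched lengths: {len(vectors)} vector(s) vs {len(canvases)} canvas(es); "
        "the vector:canvas ratio must be 1:N or N:N."
    )


# 1:N, a single vector is burned onto each of three canvases
print(pair_vectors_with_canvases(["vec"], ["c0", "c1", "c2"]))
# [('vec', 'c0'), ('vec', 'c1'), ('vec', 'c2')]
```

Any other combination, such as 2 vectors against 3 canvases, raises ValueError in this sketch.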

Example

>>> import pytest
>>> datashader = pytest.importorskip("datashader")
>>> pyogrio = pytest.importorskip("pyogrio")
>>> spatialpandas = pytest.importorskip("spatialpandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import DatashaderRasterizer
...
>>> # Read in a vector point data source
>>> geodataframe = pyogrio.read_dataframe(
...     "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg",
...     read_geometry=True,
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Setup blank raster canvas where we will burn vector geometries onto
>>> canvas = datashader.Canvas(
...     plot_width=5,
...     plot_height=6,
...     x_range=(160000.0, 620000.0),
...     y_range=(0.0, 450000.0),
... )
>>> canvas.crs = "EPSG:32631"  # UTM Zone 31N, North of Gulf of Guinea
>>> dp_canvas = IterableWrapper(iterable=[canvas])
...
>>> # Rasterize vector point geometries onto blank canvas
>>> dp_datashader = dp_canvas.rasterize_with_datashader(
...     vector_datapipe=dp_vector
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_datashader)
>>> dataarray = next(it)
>>> dataarray
<xarray.DataArray (y: 6, x: 5)>
array([[0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0]], dtype=uint32)
Coordinates:
  * x            (x) float64 2.094e+05 3.083e+05 4.072e+05 5.06e+05 6.049e+05
  * y            (y) float64 4.157e+05 3.47e+05 2.783e+05 ... 1.41e+05 7.237e+04
    spatial_ref  int64 0
...
>>> dataarray.rio.crs
CRS.from_epsg(32631)
>>> dataarray.rio.transform()
Affine(98871.00388807665, 0.0, 160000.0,
       0.0, -68660.4193667199, 450000.0)
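The default reduction depends on the geometry type, as listed under the agg parameter. A plain-Python sketch of that lookup follows. The table and the function name default_aggregation are hypothetical; the method names points, line and polygons are datashader.Canvas's aggregation methods for each geometry family. Multipart types are assumed to share the defaults of their single-part counterparts, and any other type (e.g. GeometryCollection) raises NotImplementedError, as described under Raises.

```python
from typing import Dict, Tuple

# geometry type -> (datashader.Canvas method, default reduction name)
# Hypothetical lookup table mirroring the defaults listed under `agg`.
_DEFAULTS: Dict[str, Tuple[str, str]] = {
    "Point": ("points", "count"),
    "MultiPoint": ("points", "count"),
    "LineString": ("line", "any"),
    "MultiLineString": ("line", "any"),
    "Polygon": ("polygons", "any"),
    "MultiPolygon": ("polygons", "any"),
}


def default_aggregation(geom_type: str) -> Tuple[str, str]:
    """Return (canvas_method, reduction_name) for a geometry type name."""
    try:
        return _DEFAULTS[geom_type]
    except KeyError:
        raise NotImplementedError(f"Unsupported geometry type: {geom_type}") from None


print(default_aggregation("Point"))  # ('points', 'count')
```

For the point layer in the example above, this lookup gives a count reduction, which is consistent with the uint32 counts in the output raster.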

zen3geo.datapipes.XarrayCanvas: alias of XarrayCanvasIterDataPipe

class zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe(source_datapipe, **kwargs)

Bases: IterDataPipe[Union[DataArray, Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a blank datashader.Canvas based on the spatial extent and coordinates of the input (functional name: canvas_from_xarray).

Parameters:

source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects. These data objects need to have both a .rio.x_dim and a .rio.y_dim attribute, which are present if the original dataset was opened using rioxarray.open_rasterio(), or can be set manually using rioxarray.rioxarray.XRasterBase.set_spatial_dims().
kwargs (Optional) – Extra keyword arguments to pass to datashader.Canvas.

Yields:

canvas (datashader.Canvas) – A datashader.Canvas object representing the same spatial extent and x/y coordinates of the input raster grid. This canvas will also have a .crs attribute that captures the original Coordinate Reference System from the input xarray object’s rioxarray.rioxarray.XRasterBase.crs property.

Raises:

ModuleNotFoundError – If datashader is not installed. Follow install instructions for datashader before using this class.

Example

>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> datashader = pytest.importorskip("datashader")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XarrayCanvas
...
>>> # Create blank canvas from xarray.DataArray using DataPipe
>>> y = np.arange(0, -3, step=-1)
>>> x = np.arange(0, 6)
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.zeros(shape=(1, 3, 6)),
...     coords=dict(band=[1], y=y, x=x),
... )
>>> dataarray = dataarray.rio.set_spatial_dims(x_dim="x", y_dim="y")
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_canvas = dp.canvas_from_xarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_canvas)
>>> canvas = next(it)
>>> print(canvas.raster(source=dataarray))
<xarray.DataArray (band: 1, y: 3, x: 6)>
array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5
  * y        (y) int64 0 -1 -2
  * band     (band) int64 1
...
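How the canvas extent relates to the input grid can be sketched in plain Python. This sketch assumes the canvas ranges are the outer pixel edges of an evenly spaced grid, i.e. pixel centers shifted by half a cell, which is the convention rioxarray's rio.bounds() reports; the function name canvas_kwargs_from_coords is hypothetical, and the exact rule inside XarrayCanvas may differ.

```python
from typing import Dict, Sequence, Tuple, Union


def canvas_kwargs_from_coords(
    xs: Sequence[float], ys: Sequence[float]
) -> Dict[str, Union[int, Tuple[float, float]]]:
    """Hypothetical: derive datashader.Canvas-style keyword arguments from
    1-D pixel-center coordinates (at least 2 per axis, evenly spaced)."""
    x_res = abs(xs[1] - xs[0])
    y_res = abs(ys[1] - ys[0])
    # Shift pixel centers outward by half a cell to reach the grid edges
    x_range = (min(xs) - x_res / 2, max(xs) + x_res / 2)
    y_range = (min(ys) - y_res / 2, max(ys) + y_res / 2)
    return dict(
        plot_width=len(xs), plot_height=len(ys), x_range=x_range, y_range=y_range
    )


# Same grid as the example above: x = 0..5, y = 0, -1, -2
print(canvas_kwargs_from_coords(xs=[0, 1, 2, 3, 4, 5], ys=[0, -1, -2]))
# {'plot_width': 6, 'plot_height': 3, 'x_range': (-0.5, 5.5), 'y_range': (-2.5, 0.5)}
```

With six pixels spanning (-0.5, 5.5), the pixel centers land back on 0, 1, ..., 5, which is consistent with the x coordinates printed by canvas.raster() in the example.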

Geopandas

DataPipes for geopandas.

zen3geo.datapipes.GeoPandasRectangleClipper: alias of GeoPandasRectangleClipperIterDataPipe

class zen3geo.datapipes.geopandas.GeoPandasRectangleClipperIterDataPipe(source_datapipe, mask_datapipe, **kwargs)

Bases: IterDataPipe

Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and clips them with the rectangular extent of an xarray.DataArray or xarray.Dataset grid to yield tuples of spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vectors and the corresponding xarray.DataArray or xarray.Dataset raster object used as the clip mask (functional name: clip_vector_with_rectangle).

Uses the rectangular clip algorithm of geopandas.clip(), with the bounding box rectangle (minx, miny, maxx, maxy) derived from the input raster mask’s bounding box extent.

Note

If the input vector’s coordinate reference system (crs) differs from the raster mask’s coordinate reference system (rio.crs), the vector will be reprojected using geopandas.GeoDataFrame.to_crs() to match the raster’s coordinate reference system.

Parameters:

source_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries with a .crs property.
mask_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects with a .rio.crs property and .rio.bounds method.
kwargs (Optional) – Extra keyword arguments to pass to geopandas.clip().

Yields:

paired_obj (Tuple[geopandas.GeoDataFrame, xarray.DataArray]) – A tuple consisting of the spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vector, and the corresponding xarray.DataArray or xarray.Dataset raster used as the clip mask.

Raises:

ModuleNotFoundError – If geopandas is not installed. See install instructions for geopandas (e.g. via pip install geopandas) before using this class.
NotImplementedError – If the length of the vector source_datapipe is not 1. Currently, all of the vector geometries have to be merged into a single geopandas.GeoSeries or geopandas.GeoDataFrame. Refer to the section on Appending under geopandas’ Merging data docs.

Example

>>> import pytest
>>> import rioxarray
>>> gpd = pytest.importorskip("geopandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import GeoPandasRectangleClipper
...
>>> # Read in a vector polygon data source
>>> geodataframe = gpd.read_file(
...     filename="https://github.com/geopandas/geopandas/raw/v0.11.1/geopandas/tests/data/overlay/polys/df1.geojson",
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Get list of raster grids to cut up the vector polygon later
>>> dataarray = rioxarray.open_rasterio(
...     filename="https://github.com/rasterio/rasterio/raw/1.3.2/tests/data/world.byte.tif"
... )
>>> assert dataarray.rio.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_raster = IterableWrapper(
...     iterable=[
...         dataarray.sel(x=slice(0, 2)),  # longitude 0 to 2 degrees
...         dataarray.sel(x=slice(2, 4)),  # longitude 2 to 4 degrees
...     ]
... )
...
>>> # Clip vector polygon geometries based on raster masks
>>> dp_clipped = dp_vector.clip_vector_with_rectangle(
...     mask_datapipe=dp_raster
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_clipped)
>>> geodataframe0, raster0 = next(it)
>>> geodataframe0
   col1                                           geometry
0     1  POLYGON ((0.00000 0.00000, 0.00000 2.00000, 2....
>>> raster0
<xarray.DataArray (band: 1, y: 1200, x: 16)>
array([[[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ...,
        [1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 0.0625 0.1875 0.3125 0.4375 ... 1.688 1.812 1.938
  * y            (y) float64 74.94 74.81 74.69 74.56 ... -74.69 -74.81 -74.94
    spatial_ref  int64 0
...
>>> geodataframe1, raster1 = next(it)
>>> geodataframe1
   col1                                           geometry
1     2  POLYGON ((2.00000 2.00000, 2.00000 4.00000, 4....
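The rectangle clip can be illustrated without geopandas by reducing both the vector geometries and the raster masks to axis-aligned boxes (minx, miny, maxx, maxy). This is a simplification, not geopandas.clip(): it only handles rectangles, and it drops any result with zero area. It assumes the two example polygons are the squares (0, 0)-(2, 2) and (2, 2)-(4, 4), as their printed first vertices suggest, and takes the mask extents from the printed raster coordinates (0.125 degree pixels). The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersect_boxes(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if it has no area."""
    minx, miny = max(a[0], b[0]), max(a[1], b[1])
    maxx, maxy = min(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None  # disjoint, or touching only along an edge or corner
    return (minx, miny, maxx, maxy)


def clip_boxes(boxes: Dict[int, Box], mask: Box) -> Dict[int, Box]:
    """Hypothetical: keep only the parts of each box inside the mask."""
    clipped = {}
    for index, box in boxes.items():
        overlap = intersect_boxes(box, mask)
        if overlap is not None:
            clipped[index] = overlap
    return clipped


# Assumed bounding boxes of the two example polygons
squares = {0: (0.0, 0.0, 2.0, 2.0), 1: (2.0, 2.0, 4.0, 4.0)}
# Raster mask extents: longitude 0 to 2 and 2 to 4, latitude -75 to 75
for mask in [(0.0, -75.0, 2.0, 75.0), (2.0, -75.0, 4.0, 75.0)]:
    print(clip_boxes(squares, mask))
# {0: (0.0, 0.0, 2.0, 2.0)}
# {1: (2.0, 2.0, 4.0, 4.0)}
```

Each mask keeps one square and drops the other, which only touches it along the line x = 2; this matches the single row returned per mask in the example above.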

Pyogrio

DataPipes for pyogrio.

zen3geo.datapipes.PyogrioReader: alias of PyogrioReaderIterDataPipe

class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)

Bases: IterDataPipe[StreamWrapper]

Takes vector files (e.g. FlatGeoBuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).

Based on pytorch/data

Parameters:

source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeoBuf, GeoPackage, GeoJSON, etc.
kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().

Yields:

stream_obj (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data.

Raises:

ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.

Example

>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> geodataframe = next(it)
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)

[4 rows x 12 columns]>
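Like other iterable-style DataPipes, read_from_pyogrio() is chainable and lazy. Calling it only wraps the source DataPipe, and files are read one at a time as the stream is iterated, with kwargs forwarded to the reader. The toy sketch below imitates that pattern in plain Python. Pipe, functional and ReaderPipe are hypothetical stand-ins for torchdata's IterDataPipe class and functional_datapipe decorator; fake_read replaces pyogrio.read_dataframe(), and layer="points" is only an illustrative keyword.

```python
from typing import Any, Callable, Iterable, Iterator, List


class Pipe:
    """Toy stand-in for torchdata's IterDataPipe (hypothetical)."""

    def __init__(self, iterable: Iterable[Any]):
        self.iterable = iterable

    def __iter__(self) -> Iterator[Any]:
        return iter(self.iterable)


def functional(name: str) -> Callable[[type], type]:
    """Register a Pipe subclass as a chainable method, similar in spirit
    to torchdata's functional_datapipe decorator."""

    def decorator(cls: type) -> type:
        def method(self: Pipe, *args: Any, **kwargs: Any) -> Any:
            return cls(self, *args, **kwargs)

        setattr(Pipe, name, method)
        return cls

    return decorator


@functional("read_with")
class ReaderPipe(Pipe):
    """Lazily apply a reader function to each path, forwarding kwargs."""

    def __init__(self, source: Pipe, read_fn: Callable[..., Any], **kwargs: Any):
        super().__init__(source)
        self.read_fn = read_fn
        self.kwargs = kwargs

    def __iter__(self) -> Iterator[Any]:
        for path in self.iterable:
            yield self.read_fn(path, **self.kwargs)


calls: List[str] = []


def fake_read(path: str, **kwargs: Any) -> dict:
    """Hypothetical reader standing in for pyogrio.read_dataframe()."""
    calls.append(path)
    return {"path": path, **kwargs}


dp = Pipe(["a.gpkg", "b.gpkg"]).read_with(fake_read, layer="points")
assert calls == []  # building the pipeline reads nothing
it = iter(dp)
print(next(it))  # {'path': 'a.gpkg', 'layer': 'points'}
print(calls)  # ['a.gpkg']
```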

PySTAC

DataPipes for pystac.

zen3geo.datapipes.PySTACItemReader: alias of PySTACItemReaderIterDataPipe

class zen3geo.datapipes.pystac.PySTACItemReaderIterDataPipe(source_datapipe, **kwargs)

Bases: IterDataPipe

Takes files from local disk or URLs (as long as they can be read by pystac) and yields pystac.Item objects (functional name: read_to_pystac_item).

Parameters:

source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to STAC items.
kwargs (Optional) – Extra keyword arguments to pass to pystac.Item.from_file().

Yields:

stac_item (pystac.Item) – A pystac.Item object containing the specific pystac.STACObject implementation class represented in a JSON format.

Raises:

ModuleNotFoundError – If pystac is not installed. See install instructions for pystac (e.g. via pip install pystac) before using this class.

Example

>>> import pytest
>>> pystac = pytest.importorskip("pystac")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PySTACItemReader
...
>>> # Read in STAC Item using DataPipe
>>> item_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20220115T032101_R118_T48NUG_20220115T170435"
>>> dp = IterableWrapper(iterable=[item_url])
>>> dp_pystac = dp.read_to_pystac_item()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pystac)
>>> stac_item = next(it)
>>> stac_item.bbox
[103.20205689, 0.81602476, 104.18934086, 1.8096362]
>>> stac_item.properties  
{'datetime': '2022-01-15T03:21:01.024000Z',
 'platform': 'Sentinel-2A',
 'proj:epsg': 32648,
 'instruments': ['msi'],
 's2:mgrs_tile': '48NUG',
 'constellation': 'Sentinel 2',
 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_ESRI_20220115T170436_A034292_T48NUG_N03.00',
 'eo:cloud_cover': 17.352597,
 's2:datatake_id': 'GS2A_20220115T032101_034292_N03.00',
 's2:product_uri': 'S2A_MSIL2A_20220115T032101_N0300_R118_T48NUG_20220115T170435.SAFE',
 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_ESRI_20220115T170436_S20220115T033502_N03.00',
 's2:product_type': 'S2MSI2A',
 'sat:orbit_state': 'descending',
...

PySTAC Client

DataPipes for pystac-client.

zen3geo.datapipes.PySTACAPISearcher: alias of PySTACAPISearcherIterDataPipe

class zen3geo.datapipes.pystac_client.PySTACAPISearcherIterDataPipe(source_datapipe, catalog_url, **kwargs)

Takes dictionaries containing a STAC API query (as long as the parameters are understood by pystac_client.Client.search()) and yields pystac_client.ItemSearch objects (functional name: search_for_pystac_item).

Parameters:

source_datapipe (IterDataPipe[dict]) –
A DataPipe that contains STAC API query parameters in the form of a Python dictionary to pass to pystac_client.Client.search(). For example:
- bbox - A list, tuple, or iterator representing a bounding box of 2D or 3D coordinates. Results will be filtered to only those intersecting the bounding box.
- datetime - Either a single datetime or a datetime range used to filter results. You may express a single datetime using a datetime.datetime instance, an RFC 3339-compliant timestamp, or a simple date string.
- collections - List of one or more Collection IDs or pystac.Collection instances. Only Items in one of the provided Collections will be searched.
catalog_url (str) – The URL of a STAC Catalog.
kwargs (Optional) –
Extra keyword arguments to pass to pystac_client.Client.open(). For example:
- headers - A dictionary of additional headers to use in all requests made to any part of this Catalog/API.
- parameters - Optional dictionary of query string parameters to include in all requests.
- modifier - A callable that modifies the children collection and items returned by this Client. This can be useful for injecting authentication parameters into child assets to access data from non-public sources.

Yields:

item_search (pystac_client.ItemSearch) – A pystac_client.ItemSearch object instance that represents a deferred query to a STAC search endpoint as described in the STAC API - Item Search spec.

Raises:

ModuleNotFoundError – If pystac_client is not installed. See install instructions for pystac-client (e.g. via pip install pystac-client) before using this class.

Example

>>> import pytest
>>> pystac_client = pytest.importorskip("pystac_client")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PySTACAPISearcher
...
>>> # Perform STAC API query using DataPipe
>>> query = dict(
...     bbox=[174.5, -41.37, 174.9, -41.19],  # xmin, ymin, xmax, ymax
...     datetime=["2012-02-20T00:00:00Z", "2022-12-22T00:00:00Z"],
...     collections=["cop-dem-glo-30"],
... )
>>> dp = IterableWrapper(iterable=[query])
>>> dp_pystac_client = dp.search_for_pystac_item(
...     catalog_url="https://planetarycomputer.microsoft.com/api/stac/v1",
...     # modifier=planetary_computer.sign_inplace,
... )
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pystac_client)
>>> stac_item_search = next(it)
>>> stac_items = list(stac_item_search.items())
>>> stac_items
[<Item id=Copernicus_DSM_COG_10_S42_00_E174_00_DEM>]
>>> stac_items[0].properties  
{'gsd': 30,
 'datetime': '2021-04-22T00:00:00Z',
 'platform': 'TanDEM-X',
 'proj:epsg': 4326,
 'proj:shape': [3600, 3600],
 'proj:transform': [0.0002777777777777778,
  0.0,
  173.9998611111111,
  0.0,
  -0.0002777777777777778,
  -40.99986111111111]}
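A query dictionary like the one in the example can also be built programmatically. The sketch below uses a hypothetical helper, build_stac_query (not part of zen3geo or pystac-client), that formats the datetime range as a single RFC 3339 "start/end" interval string. The STAC API Item Search spec accepts this form, and so does pystac_client.Client.search(), alongside the two-element list used above. The helper only checks the latitude order, because under the STAC spec a bbox whose xmin is greater than its xmax describes a box crossing the antimeridian.

```python
from datetime import datetime, timezone
from typing import Dict, List, Sequence, Union


def to_rfc3339(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if dt.tzinfo is None:
        raise ValueError("Use timezone-aware datetimes to avoid ambiguity")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_stac_query(
    bbox: Sequence[float],
    start: datetime,
    end: datetime,
    collections: Sequence[str],
) -> Dict[str, Union[List[float], str, List[str]]]:
    """Hypothetical helper assembling keyword arguments for Client.search()."""
    xmin, ymin, xmax, ymax = bbox
    if ymin > ymax:
        raise ValueError("bbox must be ordered (xmin, ymin, xmax, ymax)")
    return dict(
        bbox=[xmin, ymin, xmax, ymax],
        datetime=f"{to_rfc3339(start)}/{to_rfc3339(end)}",
        collections=list(collections),
    )


query = build_stac_query(
    bbox=[174.5, -41.37, 174.9, -41.19],
    start=datetime(2012, 2, 20, tzinfo=timezone.utc),
    end=datetime(2022, 12, 22, tzinfo=timezone.utc),
    collections=["cop-dem-glo-30"],
)
print(query["datetime"])  # 2012-02-20T00:00:00Z/2022-12-22T00:00:00Z
```

The resulting dictionary can be wrapped in an IterableWrapper in the same way as query in the example.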

zen3geo.datapipes.PySTACAPIItemLister: alias of PySTACAPIItemListerIterDataPipe

class zen3geo.datapipes.pystac_client.PySTACAPIItemListerIterDataPipe(source_datapipe)

Bases: IterDataPipe

Lists the pystac.Item objects that match the provided STAC API search parameters (functional name: list_pystac_items_by_search).

Parameters:

source_datapipe (IterDataPipe[pystac_client.ItemSearch]) –

A DataPipe that contains pystac_client.ItemSearch object instances, each representing a deferred query to a STAC search endpoint as described in the STAC API - Item Search spec.

Yields:

stac_item (pystac.Item) – A pystac.Item object containing the specific pystac.STACObject implementation class represented in a JSON format.

Raises:

ModuleNotFoundError – If pystac_client is not installed. See install instructions for pystac-client (e.g. via pip install pystac-client) before using this class.

Example

>>> import pytest
>>> pystac_client = pytest.importorskip("pystac_client")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PySTACAPIItemLister
...
>>> # List STAC Items from a STAC API query
>>> catalog = pystac_client.Client.open(
...     url="https://explorer.digitalearth.africa/stac/"
... )
>>> search = catalog.search(
...     bbox=[57.2, -20.6, 57.9, -19.9],  # xmin, ymin, xmax, ymax
...     datetime=["2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z"],
...     collections=["s2_l2a"],
... )
>>> dp = IterableWrapper(iterable=[search])
>>> dp_pystac_item_list = dp.list_pystac_items_by_search()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pystac_item_list)
>>> stac_item = next(it)
>>> stac_item
<Item id=ec16dbf6-9729-5a8f-9d72-5e83a8b9f30d>
>>> stac_item.properties  
{'title': 'S2B_MSIL2A_20230103T062449_N0509_R091_T40KED_20230103T075000',
 'gsd': 10,
 'proj:epsg': 32740,
 'platform': 'sentinel-2b',
 'view:off_nadir': 0,
 'instruments': ['msi'],
 'eo:cloud_cover': 0.02,
 'odc:file_format': 'GeoTIFF',
 'odc:region_code': '40KED',
 'constellation': 'sentinel-2',
 'sentinel:sequence': '0',
 'sentinel:utm_zone': 40,
 'sentinel:product_id': 'S2B_MSIL2A_20230103T062449_N0509_R091_T40KED_20230103T075000',
 'sentinel:grid_square': 'ED',
 'sentinel:data_coverage': 28.61,
 'sentinel:latitude_band': 'K',
 'created': '2023-01-03T06:24:53Z',
 'sentinel:valid_cloud_cover': True,
 'sentinel:boa_offset_applied': True,
 'sentinel:processing_baseline': '05.09',
 'proj:shape': [10980, 10980],
 'proj:transform': [10.0, 0.0, 499980.0, 0.0, -10.0, 7900000.0, 0.0, 0.0, 1.0],
 'cubedash:region_code': '40KED',
 'datetime': '2023-01-03T06:24:53Z'}
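PySTACAPIItemLister can be read as a one-to-many step: each ItemSearch in the input may expand into zero or more pystac.Item objects, which are streamed one after another. A plain-Python sketch of that flattening follows. FakeItemSearch is a hypothetical stand-in that mimics only the items() method of pystac_client.ItemSearch (also used in the PySTACAPISearcher example), and strings stand in for pystac.Item objects.

```python
from typing import Iterable, Iterator, List


class FakeItemSearch:
    """Hypothetical stand-in for pystac_client.ItemSearch."""

    def __init__(self, item_ids: List[str]):
        self.item_ids = item_ids

    def items(self) -> Iterator[str]:
        # The real method lazily pages through results from the STAC API
        yield from self.item_ids


def list_items_by_search(searches: Iterable[FakeItemSearch]) -> Iterator[str]:
    """Flatten a stream of searches into a single stream of items."""
    for search in searches:
        yield from search.items()


searches = [FakeItemSearch(["a", "b"]), FakeItemSearch([]), FakeItemSearch(["c"])]
print(list(list_items_by_search(searches)))  # ['a', 'b', 'c']
```

A search that matches nothing contributes no items, so the output stream can be shorter or longer than the input stream.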

Rioxarray

DataPipes for rioxarray.

zen3geo.datapipes.RioXarrayReader: alias of RioXarrayReaderIterDataPipe

class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)

Bases: IterDataPipe[StreamWrapper]

Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).

Based on pytorch/data

Parameters:

source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.
kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().

Yields:

stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...

Stackstac

DataPipes for stackstac.

zen3geo.datapipes.StackSTACMosaicker: alias of StackSTACMosaickerIterDataPipe

class zen3geo.datapipes.stackstac.StackSTACMosaickerIterDataPipe(source_datapipe, **kwargs)

Takes xarray.DataArray objects and flattens a dimension by picking the first valid pixel along it, to yield mosaicked xarray.DataArray objects (functional name: mosaic_dataarray).

Parameters:

source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray objects, with e.g. dimensions (“time”, “band”, “y”, “x”).
kwargs (Optional) – Extra keyword arguments to pass to stackstac.mosaic().

Yields:

dataarray (xarray.DataArray) – An xarray.DataArray that has been mosaicked with e.g. dimensions (“band”, “y”, “x”).

Raises:

ModuleNotFoundError – If stackstac is not installed. See install instructions for stackstac (e.g. via pip install stackstac) before using this class.

Example

>>> import pytest
>>> import xarray as xr
>>> pystac = pytest.importorskip("pystac")
>>> stackstac = pytest.importorskip("stackstac")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import StackSTACMosaicker
...
>>> # Get list of ALOS DEM tiles to mosaic together later
>>> item_urls = [
...     "https://planetarycomputer.microsoft.com/api/stac/v1/collections/alos-dem/items/ALPSMLC30_N022E113_DSM",
...     "https://planetarycomputer.microsoft.com/api/stac/v1/collections/alos-dem/items/ALPSMLC30_N022E114_DSM",
... ]
>>> stac_items = [pystac.Item.from_file(href=url) for url in item_urls]
>>> dataarray = stackstac.stack(items=stac_items)
>>> assert dataarray.sizes == {'time': 2, 'band': 1, 'y': 3600, 'x': 7200}
...
>>> # Mosaic different tiles in an xarray.DataArray using DataPipe
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_mosaic = dp.mosaic_dataarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_mosaic)
>>> dataarray = next(it)
>>> print(dataarray.sizes)
Frozen({'band': 1, 'y': 3600, 'x': 7200})
>>> print(dataarray.coords)
Coordinates:
  * band         (band) <U4 'data'
  * x            (x) float64 113.0 113.0 113.0 113.0 ... 115.0 115.0 115.0 115.0
  * y            (y) float64 23.0 23.0 23.0 23.0 23.0 ... 22.0 22.0 22.0 22.0
...
>>> print(dataarray.attrs["spec"])
RasterSpec(epsg=4326, bounds=(113.0, 22.0, 115.0, 23.0), resolutions_xy=(0.0002777777777777778, 0.0002777777777777778))
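The "first valid pixel" rule can be illustrated on a tiny stack in plain Python. In this sketch, which is not stackstac's implementation, stack[t][i] is pixel i at step t of the dimension being flattened, NaN marks nodata, and earlier steps take priority over later ones. The function name mosaic_first_valid is hypothetical.

```python
import math
from typing import List, Sequence


def mosaic_first_valid(stack: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical: collapse the first axis, keeping the first non-NaN value."""
    if not stack:
        raise ValueError("Need at least one layer to mosaic")
    mosaic = [math.nan] * len(stack[0])
    for layer in stack:
        for i, value in enumerate(layer):
            # Fill a pixel only if it is still empty and this layer has data
            if math.isnan(mosaic[i]) and not math.isnan(value):
                mosaic[i] = value
    return mosaic


nan = math.nan
# Two overlapping layers along "time", each with gaps (NaN)
print(mosaic_first_valid([[nan, 5.0, nan], [1.0, 7.0, nan]]))  # [1.0, 5.0, nan]
```

Where both layers have data (the middle pixel), the earlier layer wins; a pixel with no valid value in any layer stays NaN.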

zen3geo.datapipes.StackSTACStacker: alias of StackSTACStackerIterDataPipe

class zen3geo.datapipes.stackstac.StackSTACStackerIterDataPipe(source_datapipe, **kwargs)

Bases: IterDataPipe[DataArray]

Takes pystac.Item objects, reprojects them to the same grid and stacks them along time, to yield xarray.DataArray objects (functional name: stack_stac_items).

Parameters:

source_datapipe (IterDataPipe[pystac.Item]) – A DataPipe that contains pystac.Item objects.
kwargs (Optional) – Extra keyword arguments to pass to stackstac.stack().

Yields:

datacube (xarray.DataArray) – An xarray.DataArray backed by a dask.array.Array containing the time-series datacube. The dimensions will be (“time”, “band”, “y”, “x”).

Raises:

ModuleNotFoundError – If stackstac is not installed. See install instructions for stackstac (e.g. via pip install stackstac) before using this class.

Example

>>> import pytest
>>> pystac = pytest.importorskip("pystac")
>>> stackstac = pytest.importorskip("stackstac")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import StackSTACStacker
...
>>> # Stack different bands in a STAC Item using DataPipe
>>> item_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220914T093226_20220914T093252_044999_056053"
>>> stac_item = pystac.Item.from_file(href=item_url)
>>> dp = IterableWrapper(iterable=[stac_item])
>>> dp_stackstac = dp.stack_stac_items(
...     assets=["vh", "vv"], epsg=32652, resolution=10
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_stackstac)
>>> dataarray = next(it)
>>> print(dataarray.sizes)
Frozen({'time': 1, 'band': 2, 'y': 20686, 'x': 28043})
>>> print(dataarray.coords)
Coordinates:
  * time                                   (time) datetime64[ns] 2022-09-14T0...
    id                                     (time) <U62 'S1A_IW_GRDH_1SDV_2022...
  * band                                   (band) <U2 'vh' 'vv'
  * x                                      (x) float64 1.354e+05 ... 4.158e+05
  * y                                      (y) float64 4.305e+06 ... 4.098e+06
...
>>> print(dataarray.attrs["spec"])
RasterSpec(epsg=32652, bounds=(135370, 4098080, 415800, 4304940), resolutions_xy=(10, 10))
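The output sizes follow from the RasterSpec: the number of columns is the x extent divided by the x resolution, and the number of rows is the y extent divided by the y resolution. A quick check in Python using the bounds printed in the examples; grid_shape is a hypothetical helper, and rounding guards against floating-point error for fractional resolutions such as 1/3600 degree.

```python
from typing import Tuple


def grid_shape(
    bounds: Tuple[float, float, float, float], resolutions_xy: Tuple[float, float]
) -> Tuple[int, int]:
    """Return (height, width) in pixels for a grid covering `bounds`."""
    minx, miny, maxx, maxy = bounds
    xres, yres = resolutions_xy
    width = round((maxx - minx) / xres)
    height = round((maxy - miny) / yres)
    return height, width


# Sentinel-1 example above: 10 m pixels in EPSG:32652
print(grid_shape((135370, 4098080, 415800, 4304940), (10, 10)))  # (20686, 28043)
# ALOS DEM mosaic example: 1 arcsecond pixels in EPSG:4326
print(grid_shape((113.0, 22.0, 115.0, 23.0), (1 / 3600, 1 / 3600)))  # (3600, 7200)
```

Both results match the y and x sizes printed by dataarray.sizes in the respective examples.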

Xbatcher

DataPipes for xbatcher.

zen3geo.datapipes.XbatcherSlicer: alias of XbatcherSlicerIterDataPipe

class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)

Bases: IterDataPipe[Union[DataArray, Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).

Parameters:

source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.
input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.
kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator.

Yields:

chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.

Raises:

ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.

Example

>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 64, 64)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 2, "x": 2})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.DataArray 'foo' (band: 3, y: 2, x: 2)>
array([[[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]]])
Dimensions without coordinates: band, y, x
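The chipping can be pictured as a grid of non-overlapping windows over the dimensions named in input_dims; in the example above, the band dimension is kept whole in every chip. The sketch below is a simplification of xbatcher.BatchGenerator, which also supports options such as overlapping windows (input_overlap). window_slices is a hypothetical name, and this version simply drops any incomplete window at the far edge.

```python
from itertools import product
from typing import Dict, Iterator, List


def window_slices(
    sizes: Dict[str, int], input_dims: Dict[str, int]
) -> Iterator[Dict[str, slice]]:
    """Hypothetical: yield one {dim: slice} mapping per non-overlapping window."""
    per_dim: List[List[slice]] = []
    for dim, window in input_dims.items():
        n_windows = sizes[dim] // window  # drop any incomplete edge window
        per_dim.append([slice(i * window, (i + 1) * window) for i in range(n_windows)])
    for combo in product(*per_dim):
        yield dict(zip(input_dims, combo))


# Same shape as the example: band=3, y=64, x=64, chipped into 2x2 windows
chips = list(window_slices({"band": 3, "y": 64, "x": 64}, {"y": 2, "x": 2}))
print(len(chips))  # 1024
print(chips[0])  # {'y': slice(0, 2, None), 'x': slice(0, 2, None)}
```

Each mapping could be passed to xarray's DataArray.isel() (e.g. dataarray.isel(chips[0])) to cut out one chip of shape (band: 3, y: 2, x: 2).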

XpySTAC

DataPipes for xpystac.

zen3geo.datapipes.XpySTACAssetReader: alias of XpySTACAssetReaderIterDataPipe

class zen3geo.datapipes.xpystac.XpySTACAssetReaderIterDataPipe(source_datapipe, engine='stac', **kwargs)

Bases: IterDataPipe[StreamWrapper]

Takes pystac.Asset objects containing n-dimensional data (e.g. Zarr, NetCDF, Cloud-Optimized GeoTIFF, etc.) from local disk or URLs (as long as they can be read by xpystac) and yields xarray.Dataset objects (functional name: read_from_xpystac).

Based on pytorch/data

Parameters:

source_datapipe (IterDataPipe[pystac.Asset]) – A DataPipe that contains pystac.Asset objects to n-dimensional files such as Zarr, NetCDF, Cloud-Optimized GeoTIFF, etc.
engine (str or xarray.backends.BackendEntrypoint) – Engine to use when reading files. If not provided, the default engine will be the “stac” backend from xpystac. Alternatively, set engine=None to let xarray choose the default engine based on available dependencies, with a preference for “netcdf4”. See also xarray.open_dataset() for details about other engine options.
kwargs (Optional) – Extra keyword arguments to pass to xarray.open_dataset().

Yields:

stream_obj (xarray.Dataset) – An xarray.Dataset object containing the n-dimensional data.

Raises:

ModuleNotFoundError – If xpystac is not installed. See install instructions for xpystac (e.g. via pip install xpystac) before using this class.

Example

>>> import pytest
>>> pystac = pytest.importorskip("pystac")
>>> xpystac = pytest.importorskip("xpystac")
>>> zarr = pytest.importorskip("zarr")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XpySTACAssetReader
...
>>> # Read in STAC Asset using DataPipe
>>> collection_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/nasa-nex-gddp-cmip6"
>>> asset: pystac.Asset = pystac.Collection.from_file(href=collection_url).assets[
...     "ACCESS-CM2.historical"
... ]
>>> dp = IterableWrapper(iterable=[asset])
>>> dp_xpystac = dp.read_from_xpystac()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xpystac)
>>> dataset = next(it)
>>> dataset.sizes
Frozen({'time': 23741, 'lat': 600, 'lon': 1440})
>>> print(dataset.data_vars)
Data variables:
    hurs     (time, lat, lon) float32 ...
    huss     (time, lat, lon) float32 ...
    pr       (time, lat, lon) float32 ...
    rlds     (time, lat, lon) float32 ...
    rsds     (time, lat, lon) float32 ...
    sfcWind  (time, lat, lon) float32 ...
    tas      (time, lat, lon) float32 ...
    tasmax   (time, lat, lon) float32 ...
    tasmin   (time, lat, lon) float32 ...
>>> dataset.attrs  
{'Conventions': 'CF-1.7',
 'activity': 'NEX-GDDP-CMIP6',
 'cmip6_institution_id': 'CSIRO-ARCCSS',
 'cmip6_license': 'CC-BY-SA 4.0',
 'cmip6_source_id': 'ACCESS-CM2',
 ...
 'history': '2021-10-04T13:59:21.654137+00:00: install global attributes',
 'institution': 'NASA Earth Exchange, NASA Ames Research Center, ...
 'product': 'output',
 'realm': 'atmos',
 'references': 'BCSD method: Thrasher et al., 2012, ...
 'resolution_id': '0.25 degree',
 'scenario': 'historical',
 'source': 'BCSD',
 'title': 'ACCESS-CM2, r1i1p1f1, historical, global downscaled CMIP6 ...
 'tracking_id': '16d27564-470f-41ea-8077-f4cc3efa5bfe',
 'variant_label': 'r1i1p1f1',
 'version': '1.0'}
