API Reference


Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.


DataPipes for datashader.


alias of zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe

class zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe(source_datapipe, vector_datapipe, agg=None, **kwargs)

Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and rasterizes them using datashader.Canvas to yield an xarray.DataArray raster with the input geometries aggregated into a fixed-sized grid (functional name: rasterize_with_datashader).


Yields

raster (xarray.DataArray) – An xarray.DataArray object containing the raster data. This raster will have a rioxarray.rioxarray.XRasterBase.crs property and a proper affine transform viewable with rioxarray.rioxarray.XRasterBase.transform().

Raises

  • ModuleNotFoundError – If spatialpandas is not installed. Please install it (e.g. via pip install spatialpandas) before using this class.

  • ValueError – If the length of the vector_datapipe is neither 1 nor equal to the length of the source_datapipe. I.e. the ratio of vector:canvas must be either 1:N or exactly N:N.

  • AttributeError – If either the canvas in source_datapipe or vector geometry in vector_datapipe is missing a .crs attribute. Please set the coordinate reference system (e.g. using canvas.crs = 'OGC:CRS84' for the datashader.Canvas input or vector = vector.set_crs(crs='OGC:CRS84') for the geopandas.GeoSeries or geopandas.GeoDataFrame input) before passing them into the datapipe.

  • NotImplementedError – If the input vector geometry type to vector_datapipe is not supported, typically when a shapely.geometry.GeometryCollection is used. Supported types include Point, LineString, and Polygon, plus their multipart equivalents MultiPoint, MultiLineString, and MultiPolygon.
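The 1:N and N:N pairing rule behind the ValueError can be illustrated with a small pure-Python sketch. The helper name pair_canvases_with_vectors is hypothetical and is not part of zen3geo; it only mirrors the length check described above and makes no claim about how the class implements it.

```python
from typing import List, Sequence, Tuple, TypeVar

C = TypeVar("C")
V = TypeVar("V")


def pair_canvases_with_vectors(
    canvases: Sequence[C], vectors: Sequence[V]
) -> List[Tuple[C, V]]:
    """Hypothetical helper illustrating the 1:N and N:N pairing rule."""
    if len(vectors) == 1:
        # 1:N case: the single vector is reused for every canvas
        return [(canvas, vectors[0]) for canvas in canvases]
    if len(vectors) == len(canvases):
        # N:N case: canvases and vectors are paired element by element
        return list(zip(canvases, vectors))
    raise ValueError(
        f"Got {len(vectors)} vector(s) for {len(canvases)} canvas(es); "
        "the vector:canvas ratio must be 1:N or N:N"
    )


one_to_many = pair_canvases_with_vectors(["c1", "c2", "c3"], ["v"])
many_to_many = pair_canvases_with_vectors(["c1", "c2"], ["v1", "v2"])
```

Under this sketch, three canvases with two vectors would be rejected, because the lengths neither match nor reduce to a single shared vector.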


>>> import pytest
>>> datashader = pytest.importorskip("datashader")
>>> pyogrio = pytest.importorskip("pyogrio")
>>> spatialpandas = pytest.importorskip("spatialpandas")
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import DatashaderRasterizer
>>> # Read in a vector point data source
>>> geodataframe = pyogrio.read_dataframe(
...     "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg",
...     read_geometry=True,
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
>>> # Setup blank raster canvas where we will burn vector geometries onto
>>> canvas = datashader.Canvas(
...     plot_width=5,
...     plot_height=6,
...     x_range=(160000.0, 620000.0),
...     y_range=(0.0, 450000.0),
... )
>>> canvas.crs = "EPSG:32631"  # UTM Zone 31N, North of Gulf of Guinea
>>> dp_canvas = IterableWrapper(iterable=[canvas])
>>> # Rasterize vector point geometries onto blank canvas
>>> dp_datashader = dp_canvas.rasterize_with_datashader(
...     vector_datapipe=dp_vector
... )
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_datashader)
>>> dataarray = next(it)
>>> dataarray
<xarray.DataArray (y: 6, x: 5)>
array([[0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0]], dtype=uint32)
Coordinates:
  * x            (x) float64 2.094e+05 3.083e+05 4.072e+05 5.06e+05 6.049e+05
  * y            (y) float64 4.157e+05 3.47e+05 2.783e+05 ... 1.41e+05 7.237e+04
    spatial_ref  int64 0
>>> dataarray.rio.crs
>>> dataarray.rio.transform()
Affine(98871.00388807665, 0.0, 160000.0,
       0.0, -68660.4193667199, 450000.0)

alias of zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe

class zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe(source_datapipe, **kwargs)

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a blank datashader.Canvas based on the spatial extent and coordinates of the input (functional name: canvas_from_xarray).


Yields

canvas (datashader.Canvas) – A datashader.Canvas object representing the same spatial extent and x/y coordinates of the input raster grid. This canvas will also have a .crs attribute that captures the original Coordinate Reference System from the input xarray object’s rioxarray.rioxarray.XRasterBase.crs property.

Raises

ModuleNotFoundError – If datashader is not installed. Follow install instructions for datashader before using this class.


>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> datashader = pytest.importorskip("datashader")
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XarrayCanvas
>>> # Create blank canvas from xarray.DataArray using DataPipe
>>> y = np.arange(0, -3, step=-1)
>>> x = np.arange(0, 6)
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.zeros(shape=(1, 3, 6)),
...     coords=dict(band=[1], y=y, x=x),
... )
>>> dataarray = dataarray.rio.set_spatial_dims(x_dim="x", y_dim="y")
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_canvas = dp.canvas_from_xarray()
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_canvas)
>>> canvas = next(it)
>>> print(canvas.raster(source=dataarray))
<xarray.DataArray (band: 1, y: 3, x: 6)>
array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5
  * y        (y) int64 0 -1 -2
  * band     (band) int64 1


DataPipes for geopandas.


alias of zen3geo.datapipes.geopandas.GeoPandasRectangleClipperIterDataPipe

class zen3geo.datapipes.geopandas.GeoPandasRectangleClipperIterDataPipe(source_datapipe, mask_datapipe, **kwargs)

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe

Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and clips them with the rectangular extent of an xarray.DataArray or xarray.Dataset grid to yield tuples of spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vectors and the corresponding xarray.DataArray or xarray.Dataset raster object used as the clip mask (functional name: clip_vector_with_rectangle).

Uses the rectangular clip algorithm of geopandas.clip(), with the bounding box rectangle (minx, miny, maxx, maxy) derived from input raster mask’s bounding box extent.
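As a rough illustration of how a (minx, miny, maxx, maxy) rectangle relates to a raster grid, the pure-Python sketch below derives the outer extent from cell-centre coordinates by padding half a cell on each side. The helper bounds_from_centres is hypothetical, and the half-cell padding is an assumption about how the extent is measured rather than a statement of what the class does internally. The grid values are chosen to match the coordinates printed for the first raster tile in the example further down.

```python
from typing import Sequence, Tuple


def bounds_from_centres(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (minx, miny, maxx, maxy) for a regular grid of cell centres.

    Hypothetical helper: assumes evenly spaced coordinates and at least
    two cells along each axis.
    """
    half_dx = abs(xs[1] - xs[0]) / 2  # half a cell width
    half_dy = abs(ys[1] - ys[0]) / 2  # half a cell height
    return (
        min(xs) - half_dx,
        min(ys) - half_dy,
        max(xs) + half_dx,
        max(ys) + half_dy,
    )


# 16 columns of 0.125 degrees, centres from 0.0625 to 1.9375
xs = [0.0625 + 0.125 * i for i in range(16)]
# 1200 rows of 0.125 degrees, centres from 74.9375 down to -74.9375
ys = [74.9375 - 0.125 * j for j in range(1200)]

bounds = bounds_from_centres(xs, ys)
print(bounds)  # (0.0, -75.0, 2.0, 75.0)
```

A rectangle like this spans longitude 0 to 2 degrees, which is consistent with the first clipped polygon in the example starting at (0, 0).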


If the input vector’s coordinate reference system (crs) is different to the raster mask’s coordinate reference system (rio.crs), the vector will be reprojected using geopandas.GeoDataFrame.to_crs() to match the raster’s coordinate reference system.


Yields

paired_obj (Tuple[geopandas.GeoDataFrame, xarray.DataArray]) – A tuple consisting of the spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vector, and the corresponding xarray.DataArray or xarray.Dataset raster used as the clip mask.


>>> import pytest
>>> import rioxarray
>>> gpd = pytest.importorskip("geopandas")
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import GeoPandasRectangleClipper
>>> # Read in a vector polygon data source
>>> geodataframe = gpd.read_file(
...     filename="https://github.com/geopandas/geopandas/raw/v0.11.1/geopandas/tests/data/overlay/polys/df1.geojson",
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
>>> # Get list of raster grids to cut up the vector polygon later
>>> dataarray = rioxarray.open_rasterio(
...     filename="https://github.com/rasterio/rasterio/raw/1.3.2/tests/data/world.byte.tif"
... )
>>> assert dataarray.rio.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_raster = IterableWrapper(
...     iterable=[
...         dataarray.sel(x=slice(0, 2)),  # longitude 0 to 2 degrees
...         dataarray.sel(x=slice(2, 4)),  # longitude 2 to 4 degrees
...     ]
... )
>>> # Clip vector polygon geometries based on raster masks
>>> dp_clipped = dp_vector.clip_vector_with_rectangle(
...     mask_datapipe=dp_raster
... )
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_clipped)
>>> geodataframe0, raster0 = next(it)
>>> geodataframe0
   col1                                           geometry
0     1  POLYGON ((0.00000 0.00000, 0.00000 2.00000, 2....
>>> raster0
<xarray.DataArray (band: 1, y: 1200, x: 16)>
array([[[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        [1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 0.0625 0.1875 0.3125 0.4375 ... 1.688 1.812 1.938
  * y            (y) float64 74.94 74.81 74.69 74.56 ... -74.69 -74.81 -74.94
    spatial_ref  int64 0
>>> geodataframe1, raster1 = next(it)
>>> geodataframe1
   col1                                           geometry
1     2  POLYGON ((2.00000 2.00000, 2.00000 4.00000, 4....


DataPipes for pyogrio.


alias of zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe

class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]

Takes vector files (e.g. FlatGeoBuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).

Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/iopath.py#L42-L97

Parameters

  • source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeoBuf, GeoPackage, GeoJSON, etc.

  • kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().
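The general pattern of a reader DataPipe, iterating over a source of paths and forwarding the same keyword arguments to each read call, can be sketched in plain Python. ReaderSketch and fake_reader are hypothetical stand-ins so that the snippet runs without pyogrio installed; neither is part of zen3geo, and the layer keyword is used purely as an example of a forwarded argument.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


class ReaderSketch:
    """Hypothetical stand-in for a reader DataPipe (not the zen3geo class)."""

    def __init__(
        self, source: Iterable[str], reader: Callable[..., Any], **kwargs: Any
    ) -> None:
        self.source = source
        self.reader = reader
        self.kwargs = kwargs  # forwarded unchanged to every reader call

    def __iter__(self) -> Iterator[Any]:
        for path in self.source:
            # One object is yielded per filepath or URL in the source
            yield self.reader(path, **self.kwargs)


def fake_reader(path: str, layer: str = "default") -> Dict[str, str]:
    # Placeholder for a real reader function such as pyogrio.read_dataframe
    return {"path": path, "layer": layer}


pipe = ReaderSketch(["a.gpkg", "b.gpkg"], fake_reader, layer="roads")
results = list(pipe)
```

Each element of results records the path it came from together with the forwarded layer value, showing that the keyword arguments apply to every file in the stream.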


Yields

stream_obj (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data.

Raises

ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.


>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> geodataframe = next(it)
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)

[4 rows x 12 columns]>


DataPipes for rioxarray.


alias of zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe

class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]

Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).

Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/online.py#L55-L96

Parameters

  • source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.

  • kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().


Yields

stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.


>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0


DataPipes for xbatcher.


alias of zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe

class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).

Parameters

  • source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.

  • input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.

  • kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator().
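To make the role of input_dims concrete, the sketch below enumerates the index windows that a 128 × 128 grid would be cut into with 64 × 64 chips. The helper window_slices is hypothetical and does not call xbatcher. It assumes non-overlapping windows and drops any trailing partial window; options passed through kwargs may change that behaviour in the real class.

```python
from itertools import product
from typing import Dict, Iterator


def window_slices(
    shape: Dict[str, int], input_dims: Dict[str, int]
) -> Iterator[Dict[str, slice]]:
    """Yield non-overlapping windows along the dimensions in input_dims.

    Hypothetical helper: dimensions not named in input_dims (here "band")
    are left whole, and incomplete trailing windows are skipped.
    """
    starts = [
        range(0, shape[dim] - size + 1, size) for dim, size in input_dims.items()
    ]
    for offsets in product(*starts):
        yield {
            dim: slice(offset, offset + input_dims[dim])
            for dim, offset in zip(input_dims, offsets)
        }


windows = list(
    window_slices(shape={"band": 3, "y": 128, "x": 128}, input_dims={"y": 64, "x": 64})
)
print(len(windows))  # 4
```

Each dictionary of slices selects one chip by position, so the four windows cover the 128 × 128 grid as a 2 × 2 arrangement of 64 × 64 tiles.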


Yields

chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.

Raises

ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.


>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 128, 128)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 64, "x": 64})
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.Dataset>
Dimensions:  (band: 3, y: 64, x: 64)
Dimensions without coordinates: band, y, x
Data variables:
    foo      (band, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0