API Reference#

DataPipes#

Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.

Datashader#

DataPipes for datashader.

zen3geo.datapipes.DatashaderRasterizer#

alias of zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe

class zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe(source_datapipe, vector_datapipe, agg=None, **kwargs)[source]#

Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and rasterizes them using datashader.Canvas to yield an xarray.DataArray raster with the input geometries aggregated into a fixed-sized grid (functional name: rasterize_with_datashader).

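Parameters
  • source_datapipe (IterDataPipe[datashader.Canvas]) – A DataPipe that contains blank datashader.Canvas objects, each with a .crs attribute set.

  • vector_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries, each with a .crs attribute set. Its length must be 1 or equal to the length of source_datapipe.

  • agg (Optional) – Aggregation operation used when rasterizing the geometries. Defaults to None.

  • kwargs (Optional) – Extra keyword arguments for the underlying datashader rasterization call.

Yields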

raster (xarray.DataArray) – An xarray.DataArray object containing the raster data. This raster will have a rioxarray.rioxarray.XRasterBase.crs property and a proper affine transform viewable with rioxarray.rioxarray.XRasterBase.transform().
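
To make the link between the affine transform and the raster coordinates concrete, here is a minimal sketch that recomputes pixel-center x/y values from the coefficients printed by dataarray.rio.transform() in the example further down. The helper name pixel_centers is hypothetical, plain Python is used instead of the affine package, and a north-up grid without rotation terms is assumed.

```python
def pixel_centers(a, c, e, f, width, height):
    """Return (xs, ys) pixel-center coordinates for a north-up affine grid.

    a and e are the pixel width and the (negative) pixel height; c and f are
    the x and y coordinates of the outer corner of the top-left pixel.
    """
    xs = [c + a * (col + 0.5) for col in range(width)]
    ys = [f + e * (row + 0.5) for row in range(height)]
    return xs, ys


# Coefficients copied from the Affine(...) output in the example on this page
xs, ys = pixel_centers(
    a=98871.00388807665,
    c=160000.0,
    e=-68660.4193667199,
    f=450000.0,
    width=5,
    height=6,
)
print(xs[0], ys[0])
```

The first x value comes out at about 2.094e+05 and the first y value at about 4.157e+05, which agrees with the x and y coordinates shown in the example output.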

Raises
  • ModuleNotFoundError – If spatialpandas is not installed. Please install it (e.g. via pip install spatialpandas) before using this class.

  • ValueError – If the length of vector_datapipe is neither 1 nor equal to the length of source_datapipe, i.e. the ratio of vector:canvas must be either 1:N or exactly N:N.

  • AttributeError – If either the canvas in source_datapipe or the vector geometry in vector_datapipe is missing a .crs attribute. Please set the coordinate reference system (e.g. using canvas.crs = 'EPSG:4326' for the datashader.Canvas input or vector = vector.set_crs(epsg=4326) for the geopandas.GeoSeries or geopandas.GeoDataFrame input) before passing them into the datapipe.

  • NotImplementedError – If the input vector geometry type to vector_datapipe is not supported, typically when a shapely.geometry.GeometryCollection is used. Supported types include Point, LineString, and Polygon, plus their multipart equivalents MultiPoint, MultiLineString, and MultiPolygon.
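
The length and CRS rules listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the zen3geo implementation: check_inputs is a hypothetical helper, and types.SimpleNamespace objects stand in for real datashader.Canvas and geopandas objects. The geometry-type check is left out.

```python
from types import SimpleNamespace


def check_inputs(canvases, vectors):
    """Validate the vector:canvas pairing rules (hypothetical helper)."""
    # The vector:canvas ratio must be 1:N or N:N
    if len(vectors) != 1 and len(vectors) != len(canvases):
        raise ValueError(
            f"Unmatched lengths: {len(vectors)} vector(s) for "
            f"{len(canvases)} canvas(es); the ratio must be 1:N or N:N"
        )
    # Every canvas and every vector object needs a .crs attribute
    for obj in [*canvases, *vectors]:
        if not hasattr(obj, "crs"):
            raise AttributeError(f"Missing .crs attribute on {obj!r}")


# Stand-in objects that only carry a .crs attribute
canvas = SimpleNamespace(crs="EPSG:32631")
vector = SimpleNamespace(crs="EPSG:4326")

check_inputs([canvas, canvas, canvas], [vector])  # 1:N is accepted
check_inputs([canvas, canvas], [vector, vector])  # N:N is accepted
try:
    check_inputs([canvas, canvas, canvas], [vector, vector])  # 2:3 is rejected
except ValueError as err:
    print(f"ValueError: {err}")
```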

Return type

None

Example

>>> import pytest
>>> datashader = pytest.importorskip("datashader")
>>> pyogrio = pytest.importorskip("pyogrio")
>>> spatialpandas = pytest.importorskip("spatialpandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import DatashaderRasterizer
...
>>> # Read in a vector point data source
>>> geodataframe = pyogrio.read_dataframe(
...     "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg",
...     read_geometry=True,
... )
>>> assert geodataframe.crs == "EPSG:4326"  # longitude/latitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Setup blank raster canvas where we will burn vector geometries onto
>>> canvas = datashader.Canvas(
...     plot_width=5,
...     plot_height=6,
...     x_range=(160000.0, 620000.0),
...     y_range=(0.0, 450000.0),
... )
>>> canvas.crs = "EPSG:32631"  # UTM Zone 31N, North of Gulf of Guinea
>>> dp_canvas = IterableWrapper(iterable=[canvas])
...
>>> # Rasterize vector point geometries onto blank canvas
>>> dp_datashader = dp_canvas.rasterize_with_datashader(
...     vector_datapipe=dp_vector
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_datashader)
>>> dataarray = next(it)
>>> dataarray
<xarray.DataArray (y: 6, x: 5)>
array([[0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0]], dtype=uint32)
Coordinates:
  * x            (x) float64 2.094e+05 3.083e+05 4.072e+05 5.06e+05 6.049e+05
  * y            (y) float64 4.157e+05 3.47e+05 2.783e+05 ... 1.41e+05 7.237e+04
    spatial_ref  int64 0
...
>>> dataarray.rio.crs
CRS.from_epsg(32631)
>>> dataarray.rio.transform()
Affine(98871.00388807665, 0.0, 160000.0,
       0.0, -68660.4193667199, 450000.0)

zen3geo.datapipes.XarrayCanvas#

alias of zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe

class zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe(source_datapipe, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a blank datashader.Canvas based on the spatial extent and coordinates of the input (functional name: canvas_from_xarray).

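Parameters
  • source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects with their spatial dimensions set (e.g. via rioxarray's set_spatial_dims()).

  • kwargs (Optional) – Extra keyword arguments.

Yields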

canvas (datashader.Canvas) – A datashader.Canvas object representing the same spatial extent and x/y coordinates of the input raster grid. This canvas will also have a .crs attribute that captures the original Coordinate Reference System from the input xarray object's rioxarray.rioxarray.XRasterBase.crs property.
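
As a rough illustration of what "same spatial extent and x/y coordinates" means, the sketch below derives canvas-style parameters from pixel-center coordinates. The helper canvas_params is hypothetical, and treating the extent as the outer pixel edges (half a pixel beyond the first and last centers) is an assumption of this sketch rather than a statement about how zen3geo computes it. The key names follow the datashader.Canvas arguments used elsewhere on this page.

```python
def canvas_params(x, y):
    """Derive canvas size and extent from pixel-center coordinates.

    Assumes evenly spaced coordinates with at least two values per axis.
    """
    x_res = abs(x[1] - x[0])
    y_res = abs(y[1] - y[0])
    return {
        "plot_width": len(x),
        "plot_height": len(y),
        # Extend by half a pixel so the range covers the outer pixel edges
        "x_range": (min(x) - x_res / 2, max(x) + x_res / 2),
        "y_range": (min(y) - y_res / 2, max(y) + y_res / 2),
    }


# Same coordinates as the example below: x = 0..5 and y = 0, -1, -2
params = canvas_params(x=list(range(0, 6)), y=[0, -1, -2])
print(params)
```

For these coordinates the sketch gives plot_width=6, plot_height=3, x_range=(-0.5, 5.5) and y_range=(-2.5, 0.5).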

Raises

ModuleNotFoundError – If datashader is not installed. Follow install instructions for datashader before using this class.

Return type

None

Example

>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> datashader = pytest.importorskip("datashader")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XarrayCanvas
...
>>> # Create blank canvas from xarray.DataArray using DataPipe
>>> y = np.arange(0, -3, step=-1)
>>> x = np.arange(0, 6)
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.zeros(shape=(1, 3, 6)),
...     coords=dict(band=[1], y=y, x=x),
... )
>>> dataarray = dataarray.rio.set_spatial_dims(x_dim="x", y_dim="y")
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_canvas = dp.canvas_from_xarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_canvas)
>>> canvas = next(it)
>>> print(canvas.raster(source=dataarray))
<xarray.DataArray (band: 1, y: 3, x: 6)>
array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5
  * y        (y) int64 0 -1 -2
  * band     (band) int64 1
...

Pyogrio#

DataPipes for pyogrio.

zen3geo.datapipes.PyogrioReader#

alias of zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe

class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]

Takes vector files (e.g. FlatGeobuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).

Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/iopath.py#L42-L97

Parameters
  • source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeobuf, GeoPackage, GeoJSON, etc.

  • kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().

Yields

stream_obj (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data.
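
The reader pattern described above (iterate over paths, call a reader with the extra keyword arguments, yield each result lazily) can be sketched without torchdata or pyogrio installed. Both read_each and fake_reader are hypothetical names; fake_reader merely stands in for a real reader such as pyogrio.read_dataframe() and returns a dictionary instead of a GeoDataFrame.

```python
from typing import Any, Callable, Iterable, Iterator


def read_each(
    paths: Iterable[str], reader: Callable[..., Any], **kwargs: Any
) -> Iterator[Any]:
    """Lazily call reader(path, **kwargs) for each path and yield the result."""
    for path in paths:
        yield reader(path, **kwargs)


def fake_reader(path: str, **kwargs: Any) -> dict:
    """Hypothetical stand-in for a reader such as pyogrio.read_dataframe()."""
    return {"path": path, "options": kwargs}


stream = read_each(["a.gpkg", "b.fgb"], fake_reader, read_geometry=True)
first_obj = next(stream)  # nothing is read until next() is called
print(first_obj)
```

Because the function is a generator, each file is only read when the consumer asks for the next item, which mirrors how an iterable-style DataPipe is consumed with iter() and next().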

Raises

ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.

Return type

None

Example

>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> geodataframe = next(it)
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)

[4 rows x 12 columns]>

Rioxarray#

DataPipes for rioxarray.

zen3geo.datapipes.RioXarrayReader#

alias of zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe

class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]

Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).

Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/online.py#L55-L96

Parameters
  • source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.

  • kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().

Yields

stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.

Return type

None

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...

Xbatcher#

DataPipes for xbatcher.

zen3geo.datapipes.XbatcherSlicer#

alias of zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe

class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).

Parameters
  • source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.

  • input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.

  • kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator().

Yields

chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.
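
To illustrate how input_dims determines the chips, the following sketch enumerates the index windows for non-overlapping slices in plain Python. The helper chip_slices is hypothetical and is not how xbatcher is implemented; it assumes no overlap between chips and drops partial chips at the edges.

```python
from itertools import product


def chip_slices(sizes, input_dims):
    """Yield one {dim: slice} mapping per non-overlapping chip.

    sizes maps each dimension name to its full length, and input_dims maps
    each dimension to slice along to the chip length. Dimensions not listed
    in input_dims (such as band) are left unsliced.
    """
    starts = {
        dim: range(0, sizes[dim] - length + 1, length)
        for dim, length in input_dims.items()
    }
    for combo in product(*starts.values()):
        yield {
            dim: slice(start, start + input_dims[dim])
            for dim, start in zip(starts, combo)
        }


# A (band: 3, y: 128, x: 128) array cut into 64 x 64 chips, as in the example
chips = list(chip_slices({"band": 3, "y": 128, "x": 128}, {"y": 64, "x": 64}))
print(len(chips))
print(chips[0])
```

With a 128 x 128 grid and 64 x 64 chips, the sketch produces four windows, the first covering rows 0 to 63 and columns 0 to 63.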

Raises

ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.

Return type

None

Example

>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 128, 128)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 64, "x": 64})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.Dataset>
Dimensions:  (band: 3, y: 64, x: 64)
Dimensions without coordinates: band, y, x
Data variables:
    foo      (band, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0