API Reference#

DataPipes#

Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.

Rioxarray#

DataPipes for rioxarray.

zen3geo.datapipes.RioXarrayReader#

alias of zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe

class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]

Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).

Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/online.py#L55-L96

Parameters
  • source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.

  • kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().

Yields

stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.

Return type

None

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...

Pyogrio#

DataPipes for pyogrio.

zen3geo.datapipes.PyogrioReader#

alias of zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe

class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Tuple[str, torch.utils.data.datapipes.utils.common.StreamWrapper]]

Takes vector files (e.g. FlatGeoBuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields tuples of filename and geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).

Based on https://github.com/pytorch/data/blob/v0.3.0/torchdata/datapipes/iter/load/iopath.py#L37-L83

Parameters
  • source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeoBuf, GeoPackage, GeoJSON, etc.

  • kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().

Yields

stream_obj (Tuple[str, geopandas.GeoDataFrame]) – A tuple consisting of the filename that was passed in, and a geopandas.GeoDataFrame object containing the vector data.

Raises

ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.

Return type

None

Example

>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0a1/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> filename, geodataframe = next(it)
>>> filename
'https://github.com/geopandas/pyogrio/raw/v0.4.0a1/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg'
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)

[4 rows x 12 columns]>

Xbatcher#

DataPipes for xbatcher.

zen3geo.datapipes.XbatcherSlicer#

alias of zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe

class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)[source]#

Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]

Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).

Parameters
  • source_datapipe (IterDataPipe[xr.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.

  • input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.

  • kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator().

Yields

chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.

Raises

ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.

Return type

None

Example

>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 128, 128)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 64, "x": 64})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.Dataset>
Dimensions:  (band: 3, y: 64, x: 64)
Dimensions without coordinates: band, y, x
Data variables:
    foo      (band, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0