API Reference
DataPipes
Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.
Rioxarray
DataPipes for rioxarray.
- zen3geo.datapipes.RioXarrayReader
alias of zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe
- class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]
Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).
Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/online.py#L55-L96
- Parameters
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.
kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().
- Yields
stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.
- Return type
None
Example
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...
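The call dp.read_from_rioxarray() in the example works because the class registers a functional name on IterDataPipe; in torchdata this is done with the functional_datapipe decorator. The stdlib-only sketch below imitates that pattern under stated assumptions: MiniPipe, register_functional, ReaderPipe and fake_open are hypothetical names (not part of zen3geo or torchdata), and fake_open merely stands in for rioxarray.open_rasterio(). It shows how a functional name becomes a method on the upstream pipe, and how extra **kwargs are forwarded unchanged to the opener, one path at a time, only when the stream is consumed.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


class MiniPipe:
    """Hypothetical stand-in for torchdata's IterDataPipe base class."""

    def __init__(self, iterable: Iterable[Any]) -> None:
        self.iterable = iterable

    def __iter__(self) -> Iterator[Any]:
        return iter(self.iterable)


def register_functional(name: str) -> Callable[[type], type]:
    """Attach a pipe class to MiniPipe as a method called `name`.

    Hypothetical imitation of torchdata's `functional_datapipe` decorator.
    """

    def decorator(cls: type) -> type:
        def method(self: MiniPipe, **kwargs: Any) -> MiniPipe:
            # The upstream pipe becomes the new pipe's source_datapipe.
            return cls(self, **kwargs)

        setattr(MiniPipe, name, method)
        return cls

    return decorator


def fake_open(path: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical opener standing in for rioxarray.open_rasterio()."""
    return {"source": path, "options": kwargs}


@register_functional("read_with_fake_opener")
class ReaderPipe(MiniPipe):
    def __init__(self, source_datapipe: MiniPipe, **kwargs: Any) -> None:
        self.source_datapipe = source_datapipe
        self.kwargs = kwargs  # forwarded unchanged to the opener

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # Each file is opened only when the consumer asks for the next item.
        for path in self.source_datapipe:
            yield fake_open(path, **self.kwargs)


dp = MiniPipe(["a.tif", "b.tif"])
dp_reader = dp.read_with_fake_opener(masked=True)
first = next(iter(dp_reader))
print(first)  # {'source': 'a.tif', 'options': {'masked': True}}
```

Here the keyword argument is only recorded, not interpreted; in the real class such keywords are handed to rioxarray/rasterio.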
Pyogrio
DataPipes for pyogrio.
- zen3geo.datapipes.PyogrioReader
alias of zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe
- class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Tuple[str, torch.utils.data.datapipes.utils.common.StreamWrapper]]
Takes vector files (e.g. FlatGeoBuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields tuples of filename and geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).
Based on https://github.com/pytorch/data/blob/v0.3.0/torchdata/datapipes/iter/load/iopath.py#L37-L83
- Parameters
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeoBuf, GeoPackage, GeoJSON, etc.
kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().
- Yields
stream_obj (Tuple[str, geopandas.GeoDataFrame]) – A tuple consisting of the filename that was passed in, and a geopandas.GeoDataFrame object containing the vector data.
- Raises
ModuleNotFoundError – If pyogrio is not installed. See the install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.
- Return type
None
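The ModuleNotFoundError listed above is the usual guard for an optional dependency. A minimal sketch of such a check follows, assuming it runs when the DataPipe is constructed and uses the standard library's importlib.util.find_spec; require_optional and VectorReaderSketch are hypothetical names, and the message wording is illustrative rather than zen3geo's actual text.

```python
import importlib.util


def require_optional(module_name: str, install_hint: str) -> None:
    """Raise ModuleNotFoundError if an optional dependency is missing.

    Hypothetical helper. For a top-level module name, find_spec returns
    None when the module cannot be found, without importing it.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ModuleNotFoundError(
            f"Package `{module_name}` is required to use this class. "
            f"Please install it, e.g. via `{install_hint}`."
        )


class VectorReaderSketch:
    """Hypothetical reader that refuses to build without its backends."""

    def __init__(self, source: list, **kwargs) -> None:
        # Fail early, before any file is touched.
        require_optional("pyogrio", "pip install pyogrio[geopandas]")
        require_optional("geopandas", "pip install geopandas")
        self.source = source
        self.kwargs = kwargs
```

Checking at construction time means a missing backend is reported when the pipeline is built, not midway through iterating over it.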
Example
>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0a1/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> filename, geodataframe = next(it)
>>> filename
'https://github.com/geopandas/pyogrio/raw/v0.4.0a1/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg'
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)
[4 rows x 12 columns]>
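Because PyogrioReader yields (filename, GeoDataFrame) tuples rather than bare frames, the source path travels alongside the data, which helps when several files are mixed in one stream. The sketch below shows that output shape, with json.loads from the standard library standing in for pyogrio.read_dataframe(); read_with_names and the in-memory documents mapping are hypothetical, used only so the example runs without files on disk.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def read_with_names(
    paths: Iterable[str], documents: Dict[str, str]
) -> Iterator[Tuple[str, dict]]:
    """Yield (path, parsed_object) pairs, one per input path.

    Hypothetical: `documents` maps each path to its file contents, and
    json.loads stands in for pyogrio.read_dataframe().
    """
    for path in paths:
        yield path, json.loads(documents[path])


documents = {
    "a.geojson": '{"type": "FeatureCollection", "features": []}',
    "b.geojson": '{"type": "FeatureCollection", "features": [{"id": 1}]}',
}
it = read_with_names(["a.geojson", "b.geojson"], documents)
filename, collection = next(it)  # unpack exactly as in the example above
print(filename)  # a.geojson
```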
Xbatcher
DataPipes for xbatcher.
- zen3geo.datapipes.XbatcherSlicer
alias of zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe
- class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]
Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).
- Parameters
source_datapipe (IterDataPipe[xr.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.
input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.
kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator().
- Yields
chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.
- Raises
ModuleNotFoundError – If xbatcher is not installed. Follow the install instructions for xbatcher before using this class.
- Return type
None
Example
>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 128, 128)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 64, "x": 64})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.Dataset>
Dimensions:  (band: 3, y: 64, x: 64)
Dimensions without coordinates: band, y, x
Data variables:
    foo      (band, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
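The chips in the example come from cutting each dimension named in input_dims into consecutive windows of the requested size. For the 128 by 128 array with input_dims={"y": 64, "x": 64}, that gives 128 / 64 = 2 windows along y and 2 along x, so 2 * 2 = 4 chips, each keeping all 3 bands. The pure-Python sketch below computes those window positions, assuming no overlap and that a trailing window which would run past the array edge is dropped; window_slices is a hypothetical helper, and xbatcher's own options (such as overlaps) are not modelled.

```python
from itertools import product
from typing import Dict, Iterator


def window_slices(
    sizes: Dict[str, int], input_dims: Dict[str, int]
) -> Iterator[Dict[str, slice]]:
    """Yield one {dim: slice} mapping per non-overlapping window.

    Dimensions not named in input_dims are left whole. A partial window
    at the end of a dimension is skipped (an assumption of this sketch).
    """
    per_dim = []
    for dim, size in input_dims.items():
        # Start offsets for every full-size window along this dimension.
        starts = range(0, sizes[dim] - size + 1, size)
        per_dim.append([(dim, slice(start, start + size)) for start in starts])
    # Every combination of one window per dimension is one chip.
    for combo in product(*per_dim):
        yield dict(combo)


chips = list(window_slices({"band": 3, "y": 128, "x": 128}, {"y": 64, "x": 64}))
print(len(chips))  # 4
print(chips[1])  # {'y': slice(0, 64, None), 'x': slice(64, 128, None)}
```

Each mapping could be passed to xarray.DataArray.isel() to cut out the corresponding chip.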