API Reference
DataPipes#
Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.
Rioxarray#
DataPipes for rioxarray.
- zen3geo.datapipes.RioXarrayReader#
alias of zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe
- class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]
Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).
Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/online.py#L55-L96
- Parameters
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.
kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().
- Yields
stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.
- Return type
None
Example
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...
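The kwargs parameter described above is forwarded to the underlying opener for every file in the stream. A minimal, standard-library-only sketch of that pattern follows. ReaderSketch and fake_open are hypothetical stand-ins invented for illustration; this is not zen3geo's actual implementation, and fake_open only mimics the shape of a call such as rioxarray.open_rasterio().

```python
from typing import Any, Callable, Iterable, Iterator


class ReaderSketch:
    """Hypothetical stand-in for a reader DataPipe (not zen3geo's code)."""

    def __init__(
        self, source: Iterable[str], opener: Callable[..., Any], **kwargs: Any
    ) -> None:
        self.source = source
        self.opener = opener
        self.kwargs = kwargs  # stored once, reused for every file

    def __iter__(self) -> Iterator[Any]:
        # Lazily open one file per iteration step, forwarding the same kwargs
        for path in self.source:
            yield self.opener(path, **self.kwargs)


def fake_open(path: str, masked: bool = False) -> dict:
    # Placeholder for a real opener; it just records what it was called with
    return {"source": path, "masked": masked}


results = list(ReaderSketch(["a.tif", "b.tif"], opener=fake_open, masked=True))
```

Each element of results records the path and the forwarded masked=True flag, which is the behaviour one would expect from passing extra keyword arguments to read_from_rioxarray, assuming it forwards them unchanged.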
Pyogrio#
DataPipes for pyogrio.
- zen3geo.datapipes.PyogrioReader#
alias of zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe
- class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Tuple[str, torch.utils.data.datapipes.utils.common.StreamWrapper]]
Takes vector files (e.g. FlatGeoBuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields tuples of filename and geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).
Based on https://github.com/pytorch/data/blob/v0.3.0/torchdata/datapipes/iter/load/iopath.py#L37-L83
- Parameters
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeoBuf, GeoPackage, GeoJSON, etc.
kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().
- Yields
stream_obj (Tuple[str, geopandas.GeoDataFrame]) – A tuple consisting of the filename that was passed in, and a geopandas.GeoDataFrame object containing the vector data.
- Raises
ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.
- Return type
None
Example
>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0a1/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> filename, geodataframe = next(it)
>>> filename
'https://github.com/geopandas/pyogrio/raw/v0.4.0a1/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg'
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)

[4 rows x 12 columns]>
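The Raises entry above describes an optional dependency: the class only works when pyogrio is installed. One common way to get that behaviour is an import guard, sketched below with the standard library only. require_optional and the module name zen3geo_sketch_missing_dependency are hypothetical names chosen for illustration, and this is an assumption about the general pattern rather than a copy of zen3geo's code.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise ModuleNotFoundError with a hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
        raise ModuleNotFoundError(
            f"Package `{module_name}` is required but was not found. {install_hint}"
        ) from err


# An installed module is returned as-is
json_module = require_optional("json", install_hint="")

# A missing module (hypothetical name) produces a helpful error message
try:
    require_optional(
        "zen3geo_sketch_missing_dependency",
        install_hint="Install it before using this class.",
    )
except ModuleNotFoundError as err:
    message = str(err)
```

In the doctest above, pytest.importorskip("pyogrio") plays a related role on the testing side: it skips the example instead of failing when pyogrio is unavailable.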
Xbatcher#
DataPipes for xbatcher.
- zen3geo.datapipes.XbatcherSlicer#
alias of zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe
- class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]
Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).
- Parameters
source_datapipe (IterDataPipe[xr.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.
input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.
kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator().
- Yields
chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.
- Raises
ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.
- Return type
None
Example
>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 128, 128)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 64, "x": 64})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.Dataset>
Dimensions:  (band: 3, y: 64, x: 64)
Dimensions without coordinates: band, y, x
Data variables:
    foo      (band, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
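To make the effect of input_dims more concrete, the sketch below enumerates the index ranges of each chip for the 128 x 128 example above, using only the standard library. It assumes non-overlapping windows and that partial windows at the edges are dropped. window_slices is a hypothetical helper written for illustration, not part of xbatcher or zen3geo, and the real ordering and edge handling may differ.

```python
from itertools import product


def window_slices(sizes: dict, input_dims: dict) -> list:
    """Return one {dim: slice} mapping per non-overlapping window."""
    per_dim = []
    for dim, width in input_dims.items():
        # Assumption: partial windows at the edge are dropped
        n_windows = sizes[dim] // width
        per_dim.append(
            [(dim, slice(i * width, (i + 1) * width)) for i in range(n_windows)]
        )
    # Cartesian product of the per-dimension windows gives every chip
    return [dict(combo) for combo in product(*per_dim)]


chips = window_slices(
    sizes={"band": 3, "y": 128, "x": 128},
    input_dims={"y": 64, "x": 64},
)
```

Under these assumptions the 128 x 128 array splits into 2 x 2 = 4 chips of 64 x 64 pixels each. The band dimension is not listed in input_dims, so it is never sliced, which matches the (band: 3, y: 64, x: 64) shape shown in the example output.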