API Reference#
DataPipes#
Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.
Datashader#
DataPipes for datashader.
- zen3geo.datapipes.DatashaderRasterizer#
alias of zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe
- class zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe(source_datapipe, vector_datapipe, agg=None, **kwargs)[source]#
Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and rasterizes them using datashader.Canvas to yield an xarray.DataArray raster with the input geometries aggregated into a fixed-sized grid (functional name: rasterize_with_datashader).
- Parameters
source_datapipe (IterDataPipe[datashader.Canvas]) – A DataPipe that contains datashader.Canvas objects with a .crs attribute. This will be the template defining the output raster’s spatial extent and x/y range.
vector_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries with a .crs property.
agg (Optional[datashader.reductions.Reduction]) – Reduction operation to compute. Default depends on the input vector type:
For points, default is datashader.reductions.count
For lines, default is datashader.reductions.any
For polygons, default is datashader.reductions.any
For more information, refer to the section on Aggregation under datashader’s Pipeline docs.
kwargs (Optional) – Extra keyword arguments to pass to the datashader.Canvas class’s aggregation methods such as datashader.Canvas.points.
- Yields
raster (xarray.DataArray) – An xarray.DataArray object containing the raster data. This raster will have a rioxarray.rioxarray.XRasterBase.crs property and a proper affine transform viewable with rioxarray.rioxarray.XRasterBase.transform().
- Raises
ModuleNotFoundError – If spatialpandas is not installed. Please install it (e.g. via pip install spatialpandas) before using this class.
ValueError – If the length of the vector_datapipe is neither 1 nor equal to the length of the source_datapipe, i.e. the ratio of vector:canvas must be either 1:N or exactly N:N.
AttributeError – If either the canvas in source_datapipe or the vector geometry in vector_datapipe is missing a .crs attribute. Please set the coordinate reference system (e.g. using canvas.crs = 'OGC:CRS84' for the datashader.Canvas input or vector = vector.set_crs(crs='OGC:CRS84') for the geopandas.GeoSeries or geopandas.GeoDataFrame input) before passing them into the datapipe.
NotImplementedError – If the input vector geometry type to vector_datapipe is not supported, typically when a shapely.geometry.GeometryCollection is used. Supported types include Point, LineString, and Polygon, plus their multipart equivalents MultiPoint, MultiLineString, and MultiPolygon.
- Return type
None
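The vector:canvas ratio rule listed under Raises (1:N or exactly N:N) can be illustrated with a small pure-Python sketch. This is not the zen3geo implementation; the helper name pair_canvas_with_vector and the use of plain strings in place of canvases and geometries are assumptions made for illustration only.

```python
from itertools import repeat
from typing import Iterable, Iterator, List, Tuple, TypeVar

C = TypeVar("C")  # stands in for a datashader.Canvas
V = TypeVar("V")  # stands in for a GeoSeries/GeoDataFrame


def pair_canvas_with_vector(
    canvases: Iterable[C], vectors: Iterable[V]
) -> Iterator[Tuple[C, V]]:
    """Hypothetical helper: pair canvases with vectors under the 1:N or N:N rule."""
    canvas_list: List[C] = list(canvases)
    vector_list: List[V] = list(vectors)

    if len(vector_list) == 1:
        # 1:N case, the single vector is reused for every canvas
        return zip(canvas_list, repeat(vector_list[0]))
    if len(vector_list) == len(canvas_list):
        # N:N case, canvases and vectors are matched one-to-one
        return zip(canvas_list, vector_list)
    raise ValueError(
        f"Unmatched lengths: {len(canvas_list)} canvases and "
        f"{len(vector_list)} vectors; the ratio must be 1:N or N:N"
    )


one_to_many = list(pair_canvas_with_vector(["c0", "c1", "c2"], ["v"]))
one_to_one = list(pair_canvas_with_vector(["c0", "c1"], ["v0", "v1"]))
```

Passing three canvases with two vectors would raise the ValueError, since that ratio is neither 1:N nor N:N.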
Example
>>> import pytest
>>> datashader = pytest.importorskip("datashader")
>>> pyogrio = pytest.importorskip("pyogrio")
>>> spatialpandas = pytest.importorskip("spatialpandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import DatashaderRasterizer
...
>>> # Read in a vector point data source
>>> geodataframe = pyogrio.read_dataframe(
...     "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg",
...     read_geometry=True,
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Setup blank raster canvas where we will burn vector geometries onto
>>> canvas = datashader.Canvas(
...     plot_width=5,
...     plot_height=6,
...     x_range=(160000.0, 620000.0),
...     y_range=(0.0, 450000.0),
... )
>>> canvas.crs = "EPSG:32631"  # UTM Zone 31N, North of Gulf of Guinea
>>> dp_canvas = IterableWrapper(iterable=[canvas])
...
>>> # Rasterize vector point geometries onto blank canvas
>>> dp_datashader = dp_canvas.rasterize_with_datashader(
...     vector_datapipe=dp_vector
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_datashader)
>>> dataarray = next(it)
>>> dataarray
<xarray.DataArray (y: 6, x: 5)>
array([[0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0]], dtype=uint32)
Coordinates:
  * x            (x) float64 2.094e+05 3.083e+05 4.072e+05 5.06e+05 6.049e+05
  * y            (y) float64 4.157e+05 3.47e+05 2.783e+05 ... 1.41e+05 7.237e+04
    spatial_ref  int64 0
...
>>> dataarray.rio.crs
CRS.from_epsg(32631)
>>> dataarray.rio.transform()
Affine(98871.00388807665, 0.0, 160000.0,
       0.0, -68660.4193667199, 450000.0)
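To relate the affine transform printed above to the x and y coordinates of the output raster, here is a minimal sketch in plain Python. It assumes the usual convention that the transform's origin is the top-left corner of the grid and that coordinates label pixel centers (half a pixel in from each edge). The helper name pixel_centers is hypothetical.

```python
from typing import List


def pixel_centers(scale: float, origin: float, size: int) -> List[float]:
    """Hypothetical helper: pixel-center coordinates along one axis of an affine grid."""
    # Each center sits half a pixel past the leading edge of its pixel
    return [origin + scale * (index + 0.5) for index in range(size)]


# Values copied from the Affine(...) output in the example above
x_centers = pixel_centers(scale=98871.00388807665, origin=160000.0, size=5)
y_centers = pixel_centers(scale=-68660.4193667199, origin=450000.0, size=6)

# Format with 4 significant digits, similar to the xarray repr
x_labels = [f"{value:.4g}" for value in x_centers]
y_labels = [f"{value:.4g}" for value in y_centers]
print(x_labels)
print(y_labels)
```

Under these assumptions the formatted values line up with the x and y coordinates shown in the DataArray repr, e.g. 2.094e+05 for the first column and 7.237e+04 for the last row.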
- zen3geo.datapipes.XarrayCanvas#
alias of zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe
- class zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]
Takes an xarray.DataArray or xarray.Dataset and creates a blank datashader.Canvas based on the spatial extent and coordinates of the input (functional name: canvas_from_xarray).
- Parameters
source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects. These data objects need to have both a .rio.x_dim and a .rio.y_dim attribute, which are present if the original dataset was opened using rioxarray.open_rasterio(), or can be set manually using rioxarray.rioxarray.XRasterBase.set_spatial_dims().
kwargs (Optional) – Extra keyword arguments to pass to datashader.Canvas.
- Yields
canvas (datashader.Canvas) – A datashader.Canvas object representing the same spatial extent and x/y coordinates of the input raster grid. This canvas will also have a .crs attribute that captures the original Coordinate Reference System from the input xarray object’s rioxarray.rioxarray.XRasterBase.crs property.
- Raises
ModuleNotFoundError – If datashader is not installed. Follow install instructions for datashader before using this class.
- Return type
None
Example
>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> datashader = pytest.importorskip("datashader")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XarrayCanvas
...
>>> # Create blank canvas from xarray.DataArray using DataPipe
>>> y = np.arange(0, -3, step=-1)
>>> x = np.arange(0, 6)
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.zeros(shape=(1, 3, 6)),
...     coords=dict(band=[1], y=y, x=x),
... )
>>> dataarray = dataarray.rio.set_spatial_dims(x_dim="x", y_dim="y")
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_canvas = dp.canvas_from_xarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_canvas)
>>> canvas = next(it)
>>> print(canvas.raster(source=dataarray))
<xarray.DataArray (band: 1, y: 3, x: 6)>
array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5
  * y        (y) int64 0 -1 -2
  * band     (band) int64 1
...
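As a rough illustration of what "the same spatial extent and x/y coordinates" could mean for the grid in the example above, the sketch below derives a pixel count and an outer-edge range from regularly spaced pixel-center coordinates. This is an assumption about one plausible derivation, not a description of zen3geo's internals. The helper extent_from_centers is hypothetical, while the names plot_width, plot_height, x_range and y_range are the datashader.Canvas keyword arguments used elsewhere on this page.

```python
from typing import Sequence, Tuple


def extent_from_centers(centers: Sequence[float]) -> Tuple[Tuple[float, float], int]:
    """Hypothetical helper: outer-edge (min, max) range and pixel count for one axis.

    Assumes regularly spaced pixel-center coordinates (ascending or descending).
    """
    if len(centers) < 2:
        raise ValueError("Need at least two coordinates to infer the pixel size")
    # Pixel size inferred from the first and last centers
    resolution = (centers[-1] - centers[0]) / (len(centers) - 1)
    half_pixel = abs(resolution) / 2
    # Pad by half a pixel so the range covers the outer pixel edges
    return (min(centers) - half_pixel, max(centers) + half_pixel), len(centers)


# Coordinates from the example above: x = 0..5, y = 0, -1, -2
x_range, plot_width = extent_from_centers([0, 1, 2, 3, 4, 5])
y_range, plot_height = extent_from_centers([0, -1, -2])
print(plot_width, plot_height, x_range, y_range)
```

With these inputs the sketch gives a 6 by 3 grid covering x from -0.5 to 5.5 and y from -2.5 to 0.5.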
Geopandas#
DataPipes for geopandas.
- zen3geo.datapipes.GeoPandasRectangleClipper#
alias of zen3geo.datapipes.geopandas.GeoPandasRectangleClipperIterDataPipe
- class zen3geo.datapipes.geopandas.GeoPandasRectangleClipperIterDataPipe(source_datapipe, mask_datapipe, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe
Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and clips them with the rectangular extent of an xarray.DataArray or xarray.Dataset grid to yield tuples of spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vectors and the corresponding xarray.DataArray or xarray.Dataset raster object used as the clip mask (functional name: clip_vector_with_rectangle).
Uses the rectangular clip algorithm of geopandas.clip(), with the bounding box rectangle (minx, miny, maxx, maxy) derived from the input raster mask’s bounding box extent.
Note
If the input vector’s coordinate reference system (crs) is different to the raster mask’s coordinate reference system (rio.crs), the vector will be reprojected using geopandas.GeoDataFrame.to_crs() to match the raster’s coordinate reference system.
- Parameters
source_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries with a .crs property.
mask_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects with a .rio.crs property and .rio.bounds method.
kwargs (Optional) – Extra keyword arguments to pass to geopandas.clip().
- Yields
paired_obj (Tuple[geopandas.GeoDataFrame, xarray.DataArray]) – A tuple consisting of the spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vector, and the corresponding xarray.DataArray or xarray.Dataset raster used as the clip mask.
- Raises
ModuleNotFoundError – If geopandas is not installed. See install instructions for geopandas (e.g. via pip install geopandas) before using this class.
NotImplementedError – If the length of the vector source_datapipe is not 1. Currently, all of the vector geometries have to be merged into a single geopandas.GeoSeries or geopandas.GeoDataFrame. Refer to the section on Appending under geopandas’ Merging Data docs.
- Return type
None
Example
>>> import pytest
>>> import rioxarray
>>> gpd = pytest.importorskip("geopandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import GeoPandasRectangleClipper
...
>>> # Read in a vector polygon data source
>>> geodataframe = gpd.read_file(
...     filename="https://github.com/geopandas/geopandas/raw/v0.11.1/geopandas/tests/data/overlay/polys/df1.geojson",
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Get list of raster grids to cut up the vector polygon later
>>> dataarray = rioxarray.open_rasterio(
...     filename="https://github.com/rasterio/rasterio/raw/1.3.2/tests/data/world.byte.tif"
... )
>>> assert dataarray.rio.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_raster = IterableWrapper(
...     iterable=[
...         dataarray.sel(x=slice(0, 2)),  # longitude 0 to 2 degrees
...         dataarray.sel(x=slice(2, 4)),  # longitude 2 to 4 degrees
...     ]
... )
...
>>> # Clip vector polygon geometries based on raster masks
>>> dp_clipped = dp_vector.clip_vector_with_rectangle(
...     mask_datapipe=dp_raster
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_clipped)
>>> geodataframe0, raster0 = next(it)
>>> geodataframe0
   col1                                           geometry
0     1  POLYGON ((0.00000 0.00000, 0.00000 2.00000, 2....
>>> raster0
<xarray.DataArray (band: 1, y: 1200, x: 16)>
array([[[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ...,
        [1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 0.0625 0.1875 0.3125 0.4375 ... 1.688 1.812 1.938
  * y            (y) float64 74.94 74.81 74.69 74.56 ... -74.69 -74.81 -74.94
    spatial_ref  int64 0
...
>>> geodataframe1, raster1 = next(it)
>>> geodataframe1
   col1                                           geometry
1     2  POLYGON ((2.00000 2.00000, 2.00000 4.00000, 4....
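The shape of the output stream, one (subset, mask) tuple per raster mask with the single vector input reused each time, can be sketched without geopandas. The sketch below is a simplified stand-in: it filters plain (x, y) points against a (minx, miny, maxx, maxy) rectangle with inclusive bounds, whereas geopandas.clip() also cuts lines and polygons at the rectangle edges. The helper name clip_points_with_rectangles and the sample coordinates are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def clip_points_with_rectangles(
    points: List[Point], masks: Iterable[Bounds]
) -> Iterator[Tuple[List[Point], Bounds]]:
    """Hypothetical helper: yield (points inside mask, mask) for every mask rectangle."""
    for minx, miny, maxx, maxy in masks:
        # Keep only the points that fall inside this rectangle (edges included)
        subset = [
            (x, y) for x, y in points if minx <= x <= maxx and miny <= y <= maxy
        ]
        yield subset, (minx, miny, maxx, maxy)


# One vector input (three points) paired with two mask rectangles
points = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
masks = [(0.0, -75.0, 2.0, 75.0), (2.0, -75.0, 4.0, 75.0)]
clipped = list(clip_points_with_rectangles(points, masks))
print(clipped)
```

The point at (5.0, 5.0) lies outside both rectangles, so it appears in neither subset.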
Pyogrio#
DataPipes for pyogrio.
- zen3geo.datapipes.PyogrioReader#
alias of zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe
- class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]
Takes vector files (e.g. FlatGeobuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).
Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/iopath.py#L42-L97
- Parameters
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeobuf, GeoPackage, GeoJSON, etc.
kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().
- Yields
stream_obj (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data.
- Raises
ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.
- Return type
None
Example
>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> geodataframe = next(it)
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)

[4 rows x 12 columns]>
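The reader pattern used here (iterate over file paths, lazily load one object per path, forward extra keyword arguments to the loader) can be sketched with the standard library alone. In the sketch below, json.load() stands in for pyogrio.read_dataframe() and a plain generator stands in for the DataPipe. The function name read_geojson_files and the sample file contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def read_geojson_files(
    filepaths: Iterable[str], **kwargs: Any
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a reader DataPipe: lazily load one file per path."""
    for filepath in filepaths:
        with open(filepath, encoding="utf-8") as file:
            # Extra keyword arguments are forwarded to the underlying loader
            yield json.load(file, **kwargs)


# A tiny GeoJSON FeatureCollection, written to disk so the sketch runs offline
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "sample"},
            "geometry": {"type": "Point", "coordinates": [1.5, 2.5]},
        }
    ],
}

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = Path(tmpdir) / "points.geojson"
    filepath.write_text(json.dumps(feature_collection), encoding="utf-8")
    # Nothing is read until the generator is consumed
    loaded = list(read_geojson_files([str(filepath)]))

print(loaded[0]["features"][0]["geometry"])
```

Because the function is a generator, each file is opened only when the consumer asks for the next item, which mirrors the iterable-style behaviour of a DataPipe.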
Rioxarray#
DataPipes for rioxarray.
- zen3geo.datapipes.RioXarrayReader#
alias of zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe
- class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[torch.utils.data.datapipes.utils.common.StreamWrapper]
Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).
Based on https://github.com/pytorch/data/blob/v0.4.0/torchdata/datapipes/iter/load/online.py#L55-L96
- Parameters
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.
kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().
- Yields
stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.
- Return type
None
Example
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...
Xbatcher#
DataPipes for xbatcher.
- zen3geo.datapipes.XbatcherSlicer#
alias of zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe
- class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)[source]#
Bases: torch.utils.data.datapipes.datapipe.IterDataPipe[Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]]
Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).
- Parameters
source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.
input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.
kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator().
- Yields
chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.
- Raises
ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.
- Return type
None
Example
>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 128, 128)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 64, "x": 64})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.Dataset>
Dimensions:  (band: 3, y: 64, x: 64)
Dimensions without coordinates: band, y, x
Data variables:
    foo      (band, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
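To see how many chips the input_dims in the example above would produce, the following sketch enumerates non-overlapping windows over the sliced dimensions in plain Python. It assumes no overlap between windows and that any partial window at the edge is dropped. The helper window_slices is hypothetical and makes no claim about the order in which xbatcher yields its batches.

```python
from itertools import product
from typing import Dict, Iterator


def window_slices(
    sizes: Dict[str, int], input_dims: Dict[str, int]
) -> Iterator[Dict[str, slice]]:
    """Hypothetical helper: non-overlapping window slices along the given dimensions."""
    # Start offsets of every full window along each sliced dimension
    starts_per_dim = [
        range(0, sizes[dim] - length + 1, length)
        for dim, length in input_dims.items()
    ]
    for starts in product(*starts_per_dim):
        yield {
            dim: slice(start, start + length)
            for (dim, length), start in zip(input_dims.items(), starts)
        }


# Array shape and input_dims from the example above; "band" is left unsliced
windows = list(
    window_slices(
        sizes={"band": 3, "y": 128, "x": 128},
        input_dims={"y": 64, "x": 64},
    )
)
print(len(windows), windows[0], windows[-1])
```

A 128 by 128 grid cut into 64 by 64 windows gives four chips, each keeping all three bands, which is consistent with the (band: 3, y: 64, x: 64) dimensions shown in the output above.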