API Reference#
DataPipes#
Iterable-style DataPipes for geospatial raster 🌈 and vector 🚏 data.
Datashader#
DataPipes for datashader.
- zen3geo.datapipes.DatashaderRasterizer#
alias of
DatashaderRasterizerIterDataPipe
- class zen3geo.datapipes.datashader.DatashaderRasterizerIterDataPipe(source_datapipe, vector_datapipe, agg=None, **kwargs)[source]#
Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and rasterizes them using datashader.Canvas to yield an xarray.DataArray raster with the input geometries aggregated into a fixed-sized grid (functional name: rasterize_with_datashader).
- Parameters:
source_datapipe (IterDataPipe[datashader.Canvas]) – A DataPipe that contains datashader.Canvas objects with a .crs attribute. This will be the template defining the output raster’s spatial extent and x/y range.
vector_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries with a .crs property.
agg (Optional[datashader.reductions.Reduction]) – Reduction operation to compute. Default depends on the input vector type:
For points, default is datashader.reductions.count
For lines, default is datashader.reductions.any
For polygons, default is datashader.reductions.any
For more information, refer to the section on Aggregation under datashader’s Pipeline docs.
kwargs (Optional) – Extra keyword arguments to pass to the datashader.Canvas class’s aggregation methods such as datashader.Canvas.points.
- Yields:
raster (xarray.DataArray) – An xarray.DataArray object containing the raster data. This raster will have a rioxarray.rioxarray.XRasterBase.crs property and a proper affine transform viewable with rioxarray.rioxarray.XRasterBase.transform().
- Raises:
ModuleNotFoundError – If spatialpandas is not installed. Please install it (e.g. via pip install spatialpandas) before using this class.
ValueError – If the length of the vector_datapipe is neither 1 nor equal to the length of the source_datapipe, i.e. the ratio of vector:canvas must be 1:N or exactly N:N.
AttributeError – If either the canvas in source_datapipe or the vector geometry in vector_datapipe is missing a .crs attribute. Please set the coordinate reference system (e.g. using canvas.crs = 'OGC:CRS84' for the datashader.Canvas input or vector = vector.set_crs(crs='OGC:CRS84') for the geopandas.GeoSeries or geopandas.GeoDataFrame input) before passing them into the datapipe.
NotImplementedError – If the input vector geometry type to vector_datapipe is not supported, typically when a shapely.geometry.GeometryCollection is used. Supported types include Point, LineString, and Polygon, plus their multipart equivalents MultiPoint, MultiLineString, and MultiPolygon.
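The 1:N or N:N pairing rule behind the ValueError above can be sketched in plain Python. This is only an illustration of the rule as documented, not zen3geo's implementation; the helper name pair_canvas_with_vector is hypothetical, and strings stand in for the canvas and vector objects.

```python
from itertools import repeat
from typing import Iterator, Sequence, Tuple, TypeVar

C = TypeVar("C")  # stands in for datashader.Canvas
V = TypeVar("V")  # stands in for geopandas.GeoSeries / GeoDataFrame


def pair_canvas_with_vector(
    canvases: Sequence[C], vectors: Sequence[V]
) -> Iterator[Tuple[C, V]]:
    """Hypothetical helper: pair each canvas with a vector (1:N or N:N)."""
    if len(vectors) == 1:
        # 1:N, the single vector is reused for every canvas
        return zip(canvases, repeat(vectors[0]))
    if len(vectors) == len(canvases):
        # N:N, canvases and vectors are matched by position
        return zip(canvases, vectors)
    raise ValueError(
        f"Expected 1 or {len(canvases)} vector(s), got {len(vectors)}"
    )


one_to_many = list(pair_canvas_with_vector(["canvas0", "canvas1"], ["vector"]))
many_to_many = list(
    pair_canvas_with_vector(["canvas0", "canvas1"], ["vector0", "vector1"])
)
```

Any other combination, such as two vectors for three canvases, raises ValueError at call time.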
Example
>>> import pytest
>>> datashader = pytest.importorskip("datashader")
>>> pyogrio = pytest.importorskip("pyogrio")
>>> spatialpandas = pytest.importorskip("spatialpandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import DatashaderRasterizer
...
>>> # Read in a vector point data source
>>> geodataframe = pyogrio.read_dataframe(
...     "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg",
...     read_geometry=True,
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Setup blank raster canvas where we will burn vector geometries onto
>>> canvas = datashader.Canvas(
...     plot_width=5,
...     plot_height=6,
...     x_range=(160000.0, 620000.0),
...     y_range=(0.0, 450000.0),
... )
>>> canvas.crs = "EPSG:32631"  # UTM Zone 31N, North of Gulf of Guinea
>>> dp_canvas = IterableWrapper(iterable=[canvas])
...
>>> # Rasterize vector point geometries onto blank canvas
>>> dp_datashader = dp_canvas.rasterize_with_datashader(
...     vector_datapipe=dp_vector
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_datashader)
>>> dataarray = next(it)
>>> dataarray
<xarray.DataArray (y: 6, x: 5)>
array([[0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0]], dtype=uint32)
Coordinates:
  * x            (x) float64 2.094e+05 3.083e+05 4.072e+05 5.06e+05 6.049e+05
  * y            (y) float64 4.157e+05 3.47e+05 2.783e+05 ... 1.41e+05 7.237e+04
    spatial_ref  int64 0
...
>>> dataarray.rio.crs
CRS.from_epsg(32631)
>>> dataarray.rio.transform()
Affine(98871.00388807665, 0.0, 160000.0,
       0.0, -68660.4193667199, 450000.0)
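To see how the affine transform relates to the x and y coordinates printed in the example, here is a small plain-Python sketch. It assumes the six coefficients follow the usual (a, b, c, d, e, f) order, where x = a*col + b*row + c and y = d*col + e*row + f, and that coordinates refer to pixel centers; the function name pixel_center is made up for illustration.

```python
from typing import Tuple

# Coefficients (a, b, c, d, e, f) copied from the Affine(...) output in the example
TRANSFORM = (98871.00388807665, 0.0, 160000.0, 0.0, -68660.4193667199, 450000.0)


def pixel_center(
    col: int, row: int, transform: Tuple[float, ...] = TRANSFORM
) -> Tuple[float, float]:
    """Map a (col, row) pixel index to the x/y coordinate of its center."""
    a, b, c, d, e, f = transform
    # Adding 0.5 moves from the pixel's top-left corner to its center
    x = a * (col + 0.5) + b * (row + 0.5) + c
    y = d * (col + 0.5) + e * (row + 0.5) + f
    return x, y


xs = [pixel_center(col, 0)[0] for col in range(5)]  # 5 columns (plot_width)
ys = [pixel_center(0, row)[1] for row in range(6)]  # 6 rows (plot_height)
```

The first and last values of xs and ys round to the 2.094e+05 ... 6.049e+05 and 4.157e+05 ... 7.237e+04 coordinates shown in the DataArray repr.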
- zen3geo.datapipes.XarrayCanvas#
alias of
XarrayCanvasIterDataPipe
- class zen3geo.datapipes.datashader.XarrayCanvasIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: IterDataPipe[Union[DataArray, Dataset]]
Takes an xarray.DataArray or xarray.Dataset and creates a blank datashader.Canvas based on the spatial extent and coordinates of the input (functional name: canvas_from_xarray).
- Parameters:
source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects. These data objects need to have both .rio.x_dim and .rio.y_dim attributes, which are present if the original dataset was opened using rioxarray.open_rasterio(), or can be set manually using rioxarray.rioxarray.XRasterBase.set_spatial_dims().
kwargs (Optional) – Extra keyword arguments to pass to datashader.Canvas.
- Yields:
canvas (datashader.Canvas) – A datashader.Canvas object representing the same spatial extent and x/y coordinates of the input raster grid. This canvas will also have a .crs attribute that captures the original Coordinate Reference System from the input xarray object’s rioxarray.rioxarray.XRasterBase.crs property.
- Raises:
ModuleNotFoundError – If datashader is not installed. Follow install instructions for datashader before using this class.
Example
>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> datashader = pytest.importorskip("datashader")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XarrayCanvas
...
>>> # Create blank canvas from xarray.DataArray using DataPipe
>>> y = np.arange(0, -3, step=-1)
>>> x = np.arange(0, 6)
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.zeros(shape=(1, 3, 6)),
...     coords=dict(band=[1], y=y, x=x),
... )
>>> dataarray = dataarray.rio.set_spatial_dims(x_dim="x", y_dim="y")
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_canvas = dp.canvas_from_xarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_canvas)
>>> canvas = next(it)
>>> print(canvas.raster(source=dataarray))
<xarray.DataArray (band: 1, y: 3, x: 6)>
array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5
  * y        (y) int64 0 -1 -2
  * band     (band) int64 1
...
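As a rough sketch of how a canvas definition could be derived from raster coordinates, the plain-Python snippet below turns lists of pixel-center coordinates into a width, height and x/y range. The helper canvas_kwargs_from_coords is hypothetical, and the half-pixel padding is an assumption about how pixel centers map to an outer extent; the library may derive the extent differently.

```python
from typing import Dict, Sequence, Tuple, Union

CanvasKwargs = Dict[str, Union[int, Tuple[float, float]]]


def canvas_kwargs_from_coords(x: Sequence[float], y: Sequence[float]) -> CanvasKwargs:
    """Hypothetical helper: derive canvas size and extent from pixel centers.

    Assumes regularly spaced coordinates with at least two values per axis.
    """
    xres = abs(x[1] - x[0])  # pixel width
    yres = abs(y[1] - y[0])  # pixel height
    return dict(
        plot_width=len(x),
        plot_height=len(y),
        # Pad by half a pixel so the range covers pixel edges, not centers
        x_range=(min(x) - xres / 2, max(x) + xres / 2),
        y_range=(min(y) - yres / 2, max(y) + yres / 2),
    )


# Same coordinates as the 3x6 grid in the example
canvas_kwargs = canvas_kwargs_from_coords(x=[0, 1, 2, 3, 4, 5], y=[0, -1, -2])
```

The resulting dictionary uses the same keyword names as the datashader.Canvas call shown in the DatashaderRasterizer example (plot_width, plot_height, x_range, y_range).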
Geopandas#
DataPipes for geopandas.
- zen3geo.datapipes.GeoPandasRectangleClipper#
- class zen3geo.datapipes.geopandas.GeoPandasRectangleClipperIterDataPipe(source_datapipe, mask_datapipe, **kwargs)[source]#
Bases: IterDataPipe
Takes vector geopandas.GeoSeries or geopandas.GeoDataFrame geometries and clips them with the rectangular extent of an xarray.DataArray or xarray.Dataset grid to yield tuples of spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vectors and the corresponding xarray.DataArray or xarray.Dataset raster object used as the clip mask (functional name: clip_vector_with_rectangle).
Uses the rectangular clip algorithm of geopandas.clip(), with the bounding box rectangle (minx, miny, maxx, maxy) derived from the input raster mask’s bounding box extent.
Note
If the input vector’s coordinate reference system (crs) is different to the raster mask’s coordinate reference system (rio.crs), the vector will be reprojected using geopandas.GeoDataFrame.to_crs() to match the raster’s coordinate reference system.
- Parameters:
source_datapipe (IterDataPipe[geopandas.GeoDataFrame]) – A DataPipe that contains geopandas.GeoSeries or geopandas.GeoDataFrame vector geometries with a .crs property.
mask_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects with a .rio.crs property and .rio.bounds method.
kwargs (Optional) – Extra keyword arguments to pass to geopandas.clip().
- Yields:
paired_obj (Tuple[geopandas.GeoDataFrame, xarray.DataArray]) – A tuple consisting of the spatially subsetted geopandas.GeoSeries or geopandas.GeoDataFrame vector, and the corresponding xarray.DataArray or xarray.Dataset raster used as the clip mask.
- Raises:
ModuleNotFoundError – If geopandas is not installed. See install instructions for geopandas (e.g. via pip install geopandas) before using this class.
NotImplementedError – If the length of the vector source_datapipe is not 1. Currently, all of the vector geometries have to be merged into a single geopandas.GeoSeries or geopandas.GeoDataFrame. Refer to the section on Appending under geopandas’ Merging data docs.
Example
>>> import pytest
>>> import rioxarray
>>> gpd = pytest.importorskip("geopandas")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import GeoPandasRectangleClipper
...
>>> # Read in a vector polygon data source
>>> geodataframe = gpd.read_file(
...     filename="https://github.com/geopandas/geopandas/raw/v0.11.1/geopandas/tests/data/overlay/polys/df1.geojson",
... )
>>> assert geodataframe.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_vector = IterableWrapper(iterable=[geodataframe])
...
>>> # Get list of raster grids to cut up the vector polygon later
>>> dataarray = rioxarray.open_rasterio(
...     filename="https://github.com/rasterio/rasterio/raw/1.3.2/tests/data/world.byte.tif"
... )
>>> assert dataarray.rio.crs == "EPSG:4326"  # latitude/longitude coords
>>> dp_raster = IterableWrapper(
...     iterable=[
...         dataarray.sel(x=slice(0, 2)),  # longitude 0 to 2 degrees
...         dataarray.sel(x=slice(2, 4)),  # longitude 2 to 4 degrees
...     ]
... )
...
>>> # Clip vector polygon geometries based on raster masks
>>> dp_clipped = dp_vector.clip_vector_with_rectangle(
...     mask_datapipe=dp_raster
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_clipped)
>>> geodataframe0, raster0 = next(it)
>>> geodataframe0
   col1                                           geometry
0     1  POLYGON ((0.00000 0.00000, 0.00000 2.00000, 2....
>>> raster0
<xarray.DataArray (band: 1, y: 1200, x: 16)>
array([[[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ...,
        [1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 0.0625 0.1875 0.3125 0.4375 ... 1.688 1.812 1.938
  * y            (y) float64 74.94 74.81 74.69 74.56 ... -74.69 -74.81 -74.94
    spatial_ref  int64 0
...
>>> geodataframe1, raster1 = next(it)
>>> geodataframe1
   col1                                           geometry
1     2  POLYGON ((2.00000 2.00000, 2.00000 4.00000, 4....
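The pairing of one vector with several rectangular masks can be illustrated without any geospatial dependencies. The sketch below is a simplification under stated assumptions: it handles points only and keeps those falling inside each (minx, miny, maxx, maxy) rectangle, whereas geopandas.clip() also cuts lines and polygons at the rectangle edges. The function clip_points_with_rectangles and the sample coordinates are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def clip_points_with_rectangles(
    points: Sequence[Point], bboxes: Sequence[BBox]
) -> Iterator[Tuple[List[Point], BBox]]:
    """Yield (points inside the rectangle, rectangle) for every rectangle."""
    for minx, miny, maxx, maxy in bboxes:
        inside = [
            (x, y) for x, y in points if minx <= x <= maxx and miny <= y <= maxy
        ]
        yield inside, (minx, miny, maxx, maxy)


# One set of vector points, two masks covering longitudes 0-2 and 2-4 degrees
points = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
bboxes = [(0.0, -75.0, 2.0, 75.0), (2.0, -75.0, 4.0, 75.0)]
clipped = list(clip_points_with_rectangles(points, bboxes))
```

As in the DataPipe, a single vector input produces one output tuple per mask, and geometries outside every mask (here the point at 5, 5) do not appear in any output.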
Pyogrio#
DataPipes for pyogrio.
- zen3geo.datapipes.PyogrioReader#
alias of
PyogrioReaderIterDataPipe
- class zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: IterDataPipe[StreamWrapper]
Takes vector files (e.g. FlatGeobuf, GeoPackage, GeoJSON) from local disk or URLs (as long as they can be read by pyogrio) and yields geopandas.GeoDataFrame objects (functional name: read_from_pyogrio).
Based on pytorch/data
- Parameters:
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to vector files such as FlatGeobuf, GeoPackage, GeoJSON, etc.
kwargs (Optional) – Extra keyword arguments to pass to pyogrio.read_dataframe().
- Yields:
stream_obj (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data.
- Raises:
ModuleNotFoundError – If pyogrio is not installed. See install instructions for pyogrio, and ensure that geopandas is installed too (e.g. via pip install pyogrio[geopandas]) before using this class.
Example
>>> import pytest
>>> pyogrio = pytest.importorskip("pyogrio")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PyogrioReader
...
>>> # Read in GeoPackage data using DataPipe
>>> file_url: str = "https://github.com/geopandas/pyogrio/raw/v0.4.0/pyogrio/tests/fixtures/test_gpkg_nulls.gpkg"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_pyogrio = dp.read_from_pyogrio()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pyogrio)
>>> geodataframe = next(it)
>>> geodataframe
StreamWrapper<   col_bool  col_int8  ...  col_float64                 geometry
0       1.0       1.0  ...          1.5  POINT (0.00000 0.00000)
1       0.0       2.0  ...          2.5  POINT (1.00000 1.00000)
2       1.0       3.0  ...          3.5  POINT (2.00000 2.00000)
3       NaN       NaN  ...          NaN  POINT (4.00000 4.00000)
[4 rows x 12 columns]>
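The reader DataPipes on this page share one pattern: iterate over paths, call a reader function with the same keyword arguments each time, and yield the result. A minimal standard-library sketch of that pattern follows. The names read_each and read_geojson are hypothetical, and json stands in for pyogrio.read_dataframe() purely so the snippet runs without extra dependencies.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator


def read_each(
    paths: Iterable[str], reader: Callable[..., Any], **kwargs: Any
) -> Iterator[Any]:
    """Lazily apply reader(path, **kwargs) to every path in the stream."""
    for path in paths:
        yield reader(path, **kwargs)


def read_geojson(path: str, encoding: str = "utf-8") -> dict:
    """Stand-in reader using only the stdlib (not the pyogrio reader)."""
    return json.loads(Path(path).read_text(encoding=encoding))


feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"col_int8": 1},
            "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        }
    ],
}

with tempfile.TemporaryDirectory() as tmpdir:
    geojson_path = Path(tmpdir) / "points.geojson"
    geojson_path.write_text(json.dumps(feature_collection), encoding="utf-8")
    # Extra keyword arguments are forwarded to the reader, like **kwargs above
    results = list(read_each([str(geojson_path)], reader=read_geojson, encoding="utf-8"))
```

Nothing is read until the generator is iterated, which mirrors the lazy, streaming behaviour of an iterable-style DataPipe.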
PySTAC#
DataPipes for pystac.
- zen3geo.datapipes.PySTACItemReader#
alias of
PySTACItemReaderIterDataPipe
- class zen3geo.datapipes.pystac.PySTACItemReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: IterDataPipe
Takes files from local disk or URLs (as long as they can be read by pystac) and yields pystac.Item objects (functional name: read_to_pystac_item).
- Parameters:
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to STAC items.
kwargs (Optional) – Extra keyword arguments to pass to pystac.Item.from_file().
- Yields:
stac_item (pystac.Item) – A pystac.Item object containing the specific STACObject implementation class represented in a JSON format.
- Raises:
ModuleNotFoundError – If pystac is not installed. See install instructions for pystac (e.g. via pip install pystac) before using this class.
Example
>>> import pytest
>>> pystac = pytest.importorskip("pystac")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PySTACItemReader
...
>>> # Read in STAC Item using DataPipe
>>> item_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20220115T032101_R118_T48NUG_20220115T170435"
>>> dp = IterableWrapper(iterable=[item_url])
>>> dp_pystac = dp.read_to_pystac_item()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pystac)
>>> stac_item = next(it)
>>> stac_item.bbox
[103.20205689, 0.81602476, 104.18934086, 1.8096362]
>>> stac_item.properties
{'datetime': '2022-01-15T03:21:01.024000Z',
 'platform': 'Sentinel-2A',
 'proj:epsg': 32648,
 'instruments': ['msi'],
 's2:mgrs_tile': '48NUG',
 'constellation': 'Sentinel 2',
 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_ESRI_20220115T170436_A034292_T48NUG_N03.00',
 'eo:cloud_cover': 17.352597,
 's2:datatake_id': 'GS2A_20220115T032101_034292_N03.00',
 's2:product_uri': 'S2A_MSIL2A_20220115T032101_N0300_R118_T48NUG_20220115T170435.SAFE',
 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_ESRI_20220115T170436_S20220115T033502_N03.00',
 's2:product_type': 'S2MSI2A',
 'sat:orbit_state': 'descending',
...
PySTAC Client#
DataPipes for pystac-client.
- zen3geo.datapipes.PySTACAPISearcher#
alias of
PySTACAPISearcherIterDataPipe
- class zen3geo.datapipes.pystac_client.PySTACAPISearcherIterDataPipe(source_datapipe, catalog_url, **kwargs)[source]#
Bases: IterDataPipe
Takes dictionaries containing a STAC API query (as long as the parameters are understood by pystac_client.Client.search()) and yields pystac_client.ItemSearch objects (functional name: search_for_pystac_item).
- Parameters:
source_datapipe (IterDataPipe[dict]) – A DataPipe that contains STAC API query parameters in the form of a Python dictionary to pass to pystac_client.Client.search(). For example:
bbox - A list, tuple, or iterator representing a bounding box of 2D or 3D coordinates. Results will be filtered to only those intersecting the bounding box.
datetime - Either a single datetime or datetime range used to filter results. You may express a single datetime using a datetime.datetime instance, an RFC 3339-compliant timestamp, or a simple date string.
collections - List of one or more Collection IDs or pystac.Collection instances. Only Items in one of the provided Collections will be searched.
catalog_url (str) – The URL of a STAC Catalog.
kwargs (Optional) – Extra keyword arguments to pass to pystac_client.Client.open(). For example:
headers - A dictionary of additional headers to use in all requests made to any part of this Catalog/API.
parameters - Optional dictionary of query string parameters to include in all requests.
modifier - A callable that modifies the children collection and items returned by this Client. This can be useful for injecting authentication parameters into child assets to access data from non-public sources.
- Yields:
item_search (pystac_client.ItemSearch) – A pystac_client.ItemSearch object instance that represents a deferred query to a STAC search endpoint as described in the STAC API - Item Search spec.
- Raises:
ModuleNotFoundError – If pystac_client is not installed. See install instructions for pystac-client (e.g. via pip install pystac-client) before using this class.
Example
>>> import pytest
>>> pystac_client = pytest.importorskip("pystac_client")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import PySTACAPISearcher
...
>>> # Perform STAC API query using DataPipe
>>> query = dict(
...     bbox=[174.5, -41.37, 174.9, -41.19],
...     datetime=["2012-02-20T00:00:00Z", "2022-12-22T00:00:00Z"],
...     collections=["cop-dem-glo-30"],
... )
>>> dp = IterableWrapper(iterable=[query])
>>> dp_pystac_client = dp.search_for_pystac_item(
...     catalog_url="https://planetarycomputer.microsoft.com/api/stac/v1",
...     # modifier=planetary_computer.sign_inplace,
... )
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_pystac_client)
>>> stac_item_search = next(it)
>>> stac_items = list(stac_item_search.items())
>>> stac_items
[<Item id=Copernicus_DSM_COG_10_S42_00_E174_00_DEM>]
>>> stac_items[0].properties
{'gsd': 30,
 'datetime': '2021-04-22T00:00:00Z',
 'platform': 'TanDEM-X',
 'proj:epsg': 4326,
 'proj:shape': [3600, 3600],
 'proj:transform': [0.0002777777777777778, 0.0, 173.9998611111111, 0.0, -0.0002777777777777778, -40.99986111111111]}
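To make the bbox filter concrete, the sketch below checks in plain Python whether the query bounding box from the example intersects the extent of the returned DEM tile. The tile extent is reconstructed from the proj:transform and proj:shape values printed above, assuming the transform is ordered (pixel width, 0, left, 0, negative pixel height, top) and the shape is (rows, columns). The function bbox_intersects is a hypothetical helper for 2D boxes only; the actual filtering happens server-side in the STAC API.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Return True if two 2D bounding boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    return (
        a_minx <= b_maxx
        and b_minx <= a_maxx
        and a_miny <= b_maxy
        and b_miny <= a_maxy
    )


query_bbox: BBox = (174.5, -41.37, 174.9, -41.19)

# Tile extent rebuilt from 'proj:transform' and 'proj:shape' in the example output
pixel_size = 0.0002777777777777778
left, top = 173.9998611111111, -40.99986111111111
n_rows, n_cols = 3600, 3600
tile_bbox: BBox = (left, top - n_rows * pixel_size, left + n_cols * pixel_size, top)

tile_matches = bbox_intersects(query_bbox, tile_bbox)
far_away_matches = bbox_intersects((170.0, -41.37, 171.0, -41.19), tile_bbox)
```

The tile spans roughly 174 to 175 degrees east and 42 to 41 degrees south, so the query box falls inside it, while a box shifted to 170 to 171 degrees east does not intersect.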
Rioxarray#
DataPipes for rioxarray.
- zen3geo.datapipes.RioXarrayReader#
alias of
RioXarrayReaderIterDataPipe
- class zen3geo.datapipes.rioxarray.RioXarrayReaderIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: IterDataPipe[StreamWrapper]
Takes raster files (e.g. GeoTIFFs) from local disk or URLs (as long as they can be read by rioxarray and/or rasterio) and yields xarray.DataArray objects (functional name: read_from_rioxarray).
Based on pytorch/data
- Parameters:
source_datapipe (IterDataPipe[str]) – A DataPipe that contains filepaths or URL links to raster files such as GeoTIFFs.
kwargs (Optional) – Extra keyword arguments to pass to rioxarray.open_rasterio() and/or rasterio.open().
- Yields:
stream_obj (xarray.DataArray) – An xarray.DataArray object containing the raster data.
Example
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import RioXarrayReader
...
>>> # Read in GeoTIFF data using DataPipe
>>> file_url: str = "https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif"
>>> dp = IterableWrapper(iterable=[file_url])
>>> dp_rioxarray = dp.read_from_rioxarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_rioxarray)
>>> dataarray = next(it)
>>> dataarray.encoding["source"]
'https://github.com/GenericMappingTools/gmtserver-admin/raw/master/cache/earth_day_HD.tif'
>>> dataarray
StreamWrapper<<xarray.DataArray (band: 1, y: 960, x: 1920)>
[1843200 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.9 -179.7 -179.5 -179.3 ... 179.5 179.7 179.9
  * y            (y) float64 89.91 89.72 89.53 89.34 ... -89.53 -89.72 -89.91
    spatial_ref  int64 0
...
Stackstac#
DataPipes for stackstac.
- zen3geo.datapipes.StackSTACMosaicker#
alias of
StackSTACMosaickerIterDataPipe
- class zen3geo.datapipes.stackstac.StackSTACMosaickerIterDataPipe(source_datapipe, **kwargs)[source]#
Takes xarray.DataArray objects, flattens a dimension by picking the first valid pixel, to yield mosaicked xarray.DataArray objects (functional name: mosaic_dataarray).
- Parameters:
source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray objects, with e.g. dimensions (“time”, “band”, “y”, “x”).
kwargs (Optional) – Extra keyword arguments to pass to stackstac.mosaic().
- Yields:
dataarray (xarray.DataArray) – An xarray.DataArray that has been mosaicked with e.g. dimensions (“band”, “y”, “x”).
- Raises:
ModuleNotFoundError – If stackstac is not installed. See install instructions for stackstac (e.g. via pip install stackstac) before using this class.
Example
>>> import pytest
>>> import xarray as xr
>>> pystac = pytest.importorskip("pystac")
>>> stackstac = pytest.importorskip("stackstac")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import StackSTACMosaicker
...
>>> # Get list of ALOS DEM tiles to mosaic together later
>>> item_urls = [
...     "https://planetarycomputer.microsoft.com/api/stac/v1/collections/alos-dem/items/ALPSMLC30_N022E113_DSM",
...     "https://planetarycomputer.microsoft.com/api/stac/v1/collections/alos-dem/items/ALPSMLC30_N022E114_DSM",
... ]
>>> stac_items = [pystac.Item.from_file(href=url) for url in item_urls]
>>> dataarray = stackstac.stack(items=stac_items)
>>> assert dataarray.sizes == {'time': 2, 'band': 1, 'y': 3600, 'x': 7200}
...
>>> # Mosaic different tiles in an xarray.DataArray using DataPipe
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_mosaic = dp.mosaic_dataarray()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_mosaic)
>>> dataarray = next(it)
>>> print(dataarray.sizes)
Frozen({'band': 1, 'y': 3600, 'x': 7200})
>>> print(dataarray.coords)
Coordinates:
  * band     (band) <U4 'data'
  * x        (x) float64 113.0 113.0 113.0 113.0 ... 115.0 115.0 115.0 115.0
  * y        (y) float64 23.0 23.0 23.0 23.0 23.0 ... 22.0 22.0 22.0 22.0
...
>>> print(dataarray.attrs["spec"])
RasterSpec(epsg=4326, bounds=(113.0, 22.0, 115.0, 23.0), resolutions_xy=(0.0002777777777777778, 0.0002777777777777778))
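The idea of "picking the first valid pixel" along a dimension can be shown on tiny nested lists. This is a sketch under the assumption that NaN marks invalid pixels and that layers are visited in order; mosaic_first_valid is a hypothetical name, and the real work in the DataPipe is delegated to stackstac.mosaic().

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def mosaic_first_valid(layers: Sequence[Grid]) -> Grid:
    """Flatten a stack of equally sized grids, keeping the first non-NaN value."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    mosaic = [[math.nan] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            for layer in layers:  # earlier layers take priority
                value = layer[row][col]
                if not math.isnan(value):
                    mosaic[row][col] = value
                    break
    return mosaic


nan = math.nan
west_tile = [[1.0, 2.0, nan, nan]]  # valid data on the left half
east_tile = [[nan, 9.0, 3.0, 4.0]]  # valid data on the right, one overlapping pixel
mosaic = mosaic_first_valid([west_tile, east_tile])
all_missing = mosaic_first_valid([[[nan]], [[nan]]])
```

Where both tiles have data (the second pixel), the value from the first layer wins, and a pixel that is NaN in every layer stays NaN.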
- zen3geo.datapipes.StackSTACStacker#
alias of
StackSTACStackerIterDataPipe
- class zen3geo.datapipes.stackstac.StackSTACStackerIterDataPipe(source_datapipe, **kwargs)[source]#
Bases: IterDataPipe[DataArray]
Takes pystac.Item objects, reprojects them to the same grid and stacks them along time, to yield xarray.DataArray objects (functional name: stack_stac_items).
- Parameters:
source_datapipe (IterDataPipe[pystac.Item]) – A DataPipe that contains pystac.Item objects.
kwargs (Optional) – Extra keyword arguments to pass to stackstac.stack().
- Yields:
datacube (xarray.DataArray) – An xarray.DataArray backed by a dask.array.Array containing the time-series datacube. The dimensions will be (“time”, “band”, “y”, “x”).
- Raises:
ModuleNotFoundError – If stackstac is not installed. See install instructions for stackstac (e.g. via pip install stackstac) before using this class.
Example
>>> import pytest
>>> pystac = pytest.importorskip("pystac")
>>> stackstac = pytest.importorskip("stackstac")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import StackSTACStacker
...
>>> # Stack different bands in a STAC Item using DataPipe
>>> item_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220914T093226_20220914T093252_044999_056053"
>>> stac_item = pystac.Item.from_file(href=item_url)
>>> dp = IterableWrapper(iterable=[stac_item])
>>> dp_stackstac = dp.stack_stac_items(
...     assets=["vh", "vv"], epsg=32652, resolution=10
... )
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_stackstac)
>>> dataarray = next(it)
>>> print(dataarray.sizes)
Frozen({'time': 1, 'band': 2, 'y': 20686, 'x': 28043})
>>> print(dataarray.coords)
Coordinates:
  * time     (time) datetime64[ns] 2022-09-14T0...
    id       (time) <U62 'S1A_IW_GRDH_1SDV_2022...
  * band     (band) <U2 'vh' 'vv'
  * x        (x) float64 1.354e+05 ... 4.158e+05
  * y        (y) float64 4.305e+06 ... 4.098e+06
...
>>> print(dataarray.attrs["spec"])
RasterSpec(epsg=32652, bounds=(135370, 4098080, 415800, 4304940), resolutions_xy=(10, 10))
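The y and x sizes in the example follow directly from the RasterSpec bounds and resolution, which the short calculation below reproduces. The helper grid_shape is hypothetical and assumes square pixels and bounds given as (minx, miny, maxx, maxy); it is a consistency check on the printed output, not a description of how stackstac snaps or rounds bounds internally.

```python
from typing import Tuple


def grid_shape(
    bounds: Tuple[float, float, float, float], resolution: float
) -> Tuple[int, int]:
    """Return (height, width) in pixels for an extent at a given pixel size."""
    minx, miny, maxx, maxy = bounds
    width = round((maxx - minx) / resolution)
    height = round((maxy - miny) / resolution)
    return height, width


# Values taken from the RasterSpec printed in the example
height, width = grid_shape(bounds=(135370, 4098080, 415800, 4304940), resolution=10)
```

The result matches the 'y': 20686 and 'x': 28043 sizes of the stacked DataArray.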
Xbatcher#
DataPipes for xbatcher.
- zen3geo.datapipes.XbatcherSlicer#
alias of
XbatcherSlicerIterDataPipe
- class zen3geo.datapipes.xbatcher.XbatcherSlicerIterDataPipe(source_datapipe, input_dims, **kwargs)[source]#
Bases: IterDataPipe[Union[DataArray, Dataset]]
Takes an xarray.DataArray or xarray.Dataset and creates a sliced window view (also known as a chip or tile) of the n-dimensional array (functional name: slice_with_xbatcher).
- Parameters:
source_datapipe (IterDataPipe[xarray.DataArray]) – A DataPipe that contains xarray.DataArray or xarray.Dataset objects.
input_dims (dict) – A dictionary specifying the size of the inputs in each dimension to slice along, e.g. {'lon': 64, 'lat': 64}. These are the dimensions the machine learning library will see. All other dimensions will be stacked into one dimension called batch.
kwargs (Optional) – Extra keyword arguments to pass to xbatcher.BatchGenerator.
- Yields:
chip (xarray.DataArray) – An xarray.DataArray or xarray.Dataset object containing the sliced raster data, with the size/shape defined by the input_dims parameter.
- Raises:
ModuleNotFoundError – If xbatcher is not installed. Follow install instructions for xbatcher before using this class.
Example
>>> import pytest
>>> import numpy as np
>>> import xarray as xr
>>> xbatcher = pytest.importorskip("xbatcher")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XbatcherSlicer
...
>>> # Sliced window view of xarray.DataArray using DataPipe
>>> dataarray: xr.DataArray = xr.DataArray(
...     data=np.ones(shape=(3, 64, 64)),
...     name="foo",
...     dims=["band", "y", "x"]
... )
>>> dp = IterableWrapper(iterable=[dataarray])
>>> dp_xbatcher = dp.slice_with_xbatcher(input_dims={"y": 2, "x": 2})
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xbatcher)
>>> dataarray_chip = next(it)
>>> dataarray_chip
<xarray.DataArray 'foo' (band: 3, y: 2, x: 2)>
array([[[1., 1.],
        [1., 1.]],
       [[1., 1.],
        [1., 1.]],
       [[1., 1.],
        [1., 1.]]])
Dimensions without coordinates: band, y, x
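To get a feel for how many chips a given input_dims produces, the following plain-Python sketch enumerates window slices over the sliced dimensions. It assumes non-overlapping windows and that incomplete windows at the edges are dropped; window_slices is a hypothetical helper, and xbatcher itself offers further options that are not modelled here.

```python
from itertools import product
from typing import Dict, Iterator


def window_slices(
    sizes: Dict[str, int], input_dims: Dict[str, int]
) -> Iterator[Dict[str, slice]]:
    """Yield one dict of slices per non-overlapping window along input_dims."""
    dims = list(input_dims)
    # Start offsets per dimension; windows that would run past the edge are skipped
    starts = [
        range(0, sizes[dim] - input_dims[dim] + 1, input_dims[dim]) for dim in dims
    ]
    for offsets in product(*starts):
        yield {
            dim: slice(start, start + input_dims[dim])
            for dim, start in zip(dims, offsets)
        }


# Same array shape and window size as the example: (band=3, y=64, x=64), 2x2 chips
windows = list(
    window_slices(sizes={"band": 3, "y": 64, "x": 64}, input_dims={"y": 2, "x": 2})
)
```

Under these assumptions a 64 by 64 grid with 2 by 2 windows gives 32 * 32 = 1024 chips, and the band dimension is left untouched because it is not listed in input_dims.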
XpySTAC#
DataPipes for xpystac.
- zen3geo.datapipes.XpySTACAssetReader#
alias of
XpySTACAssetReaderIterDataPipe
- class zen3geo.datapipes.xpystac.XpySTACAssetReaderIterDataPipe(source_datapipe, engine='stac', **kwargs)[source]#
Bases: IterDataPipe[StreamWrapper]
Takes a pystac.Asset object containing n-dimensional data (e.g. Zarr, NetCDF, Cloud-Optimized GeoTIFF, etc) from local disk or URLs (as long as they can be read by xpystac) and yields xarray.Dataset objects (functional name: read_from_xpystac).
Based on pytorch/data
- Parameters:
source_datapipe (IterDataPipe[pystac.Asset]) – A DataPipe that contains pystac.Asset objects to n-dimensional files such as Zarr, NetCDF, Cloud-Optimized GeoTIFF, etc.
engine (str or xarray.backends.BackendEntrypoint) – Engine to use when reading files. If not provided, the default engine will be the “stac” backend from xpystac. Alternatively, set engine=None to let xarray choose the default engine based on available dependencies, with a preference for “netcdf4”. See also xarray.open_dataset() for details about other engine options.
kwargs (Optional) – Extra keyword arguments to pass to xarray.open_dataset().
- Yields:
stream_obj (xarray.Dataset) – An xarray.Dataset object containing the n-dimensional data.
- Raises:
ModuleNotFoundError – If xpystac is not installed. See install instructions for xpystac (e.g. via pip install xpystac) before using this class.
Example
>>> import pytest
>>> pystac = pytest.importorskip("pystac")
>>> xpystac = pytest.importorskip("xpystac")
>>> zarr = pytest.importorskip("zarr")
...
>>> from torchdata.datapipes.iter import IterableWrapper
>>> from zen3geo.datapipes import XpySTACAssetReader
...
>>> # Read in STAC Asset using DataPipe
>>> collection_url: str = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/nasa-nex-gddp-cmip6"
>>> asset: pystac.Asset = pystac.Collection.from_file(href=collection_url).assets[
...     "ACCESS-CM2.historical"
... ]
>>> dp = IterableWrapper(iterable=[asset])
>>> dp_xpystac = dp.read_from_xpystac()
...
>>> # Loop or iterate over the DataPipe stream
>>> it = iter(dp_xpystac)
>>> dataset = next(it)
>>> dataset.sizes
Frozen({'time': 23741, 'lat': 600, 'lon': 1440})
>>> print(dataset.data_vars)
Data variables:
    hurs     (time, lat, lon) float32 ...
    huss     (time, lat, lon) float32 ...
    pr       (time, lat, lon) float32 ...
    rlds     (time, lat, lon) float32 ...
    rsds     (time, lat, lon) float32 ...
    sfcWind  (time, lat, lon) float32 ...
    tas      (time, lat, lon) float32 ...
    tasmax   (time, lat, lon) float32 ...
    tasmin   (time, lat, lon) float32 ...
>>> dataset.attrs
{'Conventions': 'CF-1.7',
 'activity': 'NEX-GDDP-CMIP6',
 'cmip6_institution_id': 'CSIRO-ARCCSS',
 'cmip6_license': 'CC-BY-SA 4.0',
 'cmip6_source_id': 'ACCESS-CM2',
 ...
 'history': '2021-10-04T13:59:21.654137+00:00: install global attributes',
 'institution': 'NASA Earth Exchange, NASA Ames Research Center, ...
 'product': 'output',
 'realm': 'atmos',
 'references': 'BCSD method: Thrasher et al., 2012, ...
 'resolution_id': '0.25 degree',
 'scenario': 'historical',
 'source': 'BCSD',
 'title': 'ACCESS-CM2, r1i1p1f1, historical, global downscaled CMIP6 ...
 'tracking_id': '16d27564-470f-41ea-8077-f4cc3efa5bfe',
 'variant_label': 'r1i1p1f1',
 'version': '1.0'}