Vector segmentation masks#
Clouds float by, water flows on; in movement there is no grasping, in Chan there is no settling
For supervised machine learning, labels are needed in addition to the input image. Here, we'll step through an example workflow on matching vector label data (points, lines, polygons) to Earth Observation data inputs. Specifically, this tutorial will cover:
Reading shapefiles directly from the web via pyogrio
Rasterizing vector polygons from a geopandas.GeoDataFrame to an xarray.DataArray
Pairing satellite images with the rasterized label masks and feeding them into a DataLoader
Getting started#
These are the tools you'll need.
import matplotlib.pyplot as plt
import numpy as np
import planetary_computer
import pyogrio
import pystac
import torch
import torchdata
import xarray as xr
import zen3geo
0. Find cloud-hosted raster and vector data#
In this case study, we'll look at the flood water extent over Johor, Malaysia on 15 Dec 2019 that was digitized by UNITAR-UNOSAT's rapid mapping service over Synthetic Aperture Radar (SAR) images. Specifically, we'll be using the Sentinel-1 Ground Range Detected (GRD) product's VV polarization channel.
Links:
To start, let's get the satellite scene we'll be using for this tutorial.
item_url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20191215T224757_20191215T224822_030365_037955"
# Load the individual item metadata and sign the assets
item = pystac.Item.from_file(item_url)
signed_item = planetary_computer.sign(item)
signed_item
Item: S1A_IW_GRDH_1SDV_20191215T224757_20191215T224822_030365_037955
id: S1A_IW_GRDH_1SDV_20191215T224757_20191215T224822_030365_037955
bbox: [102.75135746, 0.86301199, 105.26988478, 2.84097401]
datetime: 2019-12-15T22:48:10.319601Z
platform: SENTINEL-1A
s1:shape: [25349, 16815]
end_datetime: 2019-12-15 22:48:22.818436+00:00
constellation: Sentinel-1
s1:resolution: high
s1:datatake_id: 227669
start_datetime: 2019-12-15 22:47:57.820767+00:00
s1:orbit_source: RESORB
s1:slice_number: 3
s1:total_slices: 8
sar:looks_range: 5
sat:orbit_state: descending
sar:product_type: GRD
sar:looks_azimuth: 1
sar:polarizations: ['VV', 'VH']
sar:frequency_band: C
sat:absolute_orbit: 30365
sat:relative_orbit: 18
s1:processing_level: 1
sar:instrument_mode: IW
sar:center_frequency: 5.405
sar:resolution_range: 20
s1:product_timeliness: Fast-24h
sar:resolution_azimuth: 22
sar:pixel_spacing_range: 10
sar:observation_direction: right
sar:pixel_spacing_azimuth: 10
sar:looks_equivalent_number: 4.4
s1:instrument_configuration_ID: 6
sat:platform_international_designator: 2014-016A
STAC Extensions
https://stac-extensions.github.io/sar/v1.0.0/schema.json
https://stac-extensions.github.io/sat/v1.0.0/schema.json
https://stac-extensions.github.io/eo/v1.0.0/schema.json
Assets (asset hrefs are time-limited signed URLs and are omitted here)
VH: vertical transmit, horizontal receive (image/tiff; cloud-optimized GeoTIFF): amplitude of signal transmitted with vertical polarization and received with horizontal polarization, with radiometric terrain correction applied
VV: vertical transmit, vertical receive (image/tiff; cloud-optimized GeoTIFF): amplitude of signal transmitted with vertical polarization and received with vertical polarization, with radiometric terrain correction applied
Preview Image (image/png): an averaged, decimated preview image
Manifest File (application/xml): general product metadata
Noise Schema for iw-vh and iw-vv (application/xml): estimated thermal noise look-up tables
Product Schema for iw-vh and iw-vv (application/xml): main characteristics of the band, such as platform state during acquisition, image properties, Doppler information, and geographic location
Calibration Schema for iw-vh and iw-vv (application/xml): calibration metadata and look-up tables for absolute product calibration
TileJSON with default rendering (application/json)
Rendered preview (image/png)
Links
self: https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20191215T224757_20191215T224822_030365_037955
collection: https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd
parent: https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd
root: https://planetarycomputer.microsoft.com/api/stac/v1/
license: https://sentinel.esa.int/documents/247904/690755/Sentinel_Data_Legal_Notice
preview (Map of item): https://planetarycomputer.microsoft.com/api/data/v1/item/map?collection=sentinel-1-grd&item=S1A_IW_GRDH_1SDV_20191215T224757_20191215T224822_030365_037955
This is how the Sentinel-1 image looks over Johor in Peninsular Malaysia on 15 Dec 2019.
Load and reproject image data#
To keep things simple, we'll load just the VV channel into a DataPipe via zen3geo.datapipes.RioXarrayReader (functional name: read_from_rioxarray).
url = signed_item.assets["vv"].href
dp = torchdata.datapipes.iter.IterableWrapper(iterable=[url])
# Reading lower resolution grid using overview_level=3
dp_rioxarray = dp.read_from_rioxarray(overview_level=3)
dp_rioxarray
RioXarrayReaderIterDataPipe
The Sentinel-1 image from Planetary Computer comes in longitude/latitude geographic coordinates by default (OGC:CRS84). To make the pixels more equal in area, we can project it to a local projected coordinate system instead.
def reproject_to_local_utm(dataarray: xr.DataArray, resolution: float = 80.0) -> xr.DataArray:
    """
    Reproject an xarray.DataArray grid from OGC:CRS84 to a local UTM coordinate
    reference system.
    """
    # Estimate the UTM coordinate reference system from a single pixel
    pixel = dataarray.isel(y=slice(0, 1), x=slice(0, 1))
    new_crs = pixel.rio.reproject(dst_crs="OGC:CRS84").rio.estimate_utm_crs()
    return dataarray.rio.reproject(dst_crs=new_crs, resolution=resolution)
dp_reprojected = dp_rioxarray.map(fn=reproject_to_local_utm)
Note
Universal Transverse Mercator (UTM) isn't actually an equal-area projection system. However, Sentinel-1 satellite scenes from Copernicus are usually distributed in a UTM coordinate reference system, and UTM is typically a close enough approximation to the local geographic area, or at least it won't matter much when we're looking at spatial resolutions over several tens of metres.
Hint
For those wondering what OGC:CRS84 is, it is the longitude/latitude version of EPSG:4326 (latitude/longitude). I.e., it's a matter of axis order, with OGC:CRS84 being x/y and EPSG:4326 being y/x.
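The axis-order difference can be inspected directly with pyproj (which rioxarray uses under the hood). A minimal sketch, assuming pyproj is installed:

```python
from pyproj import CRS

crs84 = CRS.from_user_input("OGC:CRS84")
epsg4326 = CRS.from_epsg(4326)

# OGC:CRS84 lists longitude (east) first, EPSG:4326 lists latitude (north) first
print([axis.direction for axis in crs84.axis_info])
print([axis.direction for axis in epsg4326.axis_info])
```

Both describe the same WGS 84 datum; only the order in which the coordinate axes are listed differs.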
References:
Transform and visualize raster data#
Let's visualize the Sentinel-1 image, but before that, we'll transform the VV data from linear to decibel scale.
def linear_to_decibel(dataarray: xr.DataArray) -> xr.DataArray:
    """
    Transform the input xarray.DataArray's VV or VH values from linear to
    decibel scale using the formula ``10 * log_10(x)``.
    """
    # Mask out areas with 0 so that np.log10 is not undefined
    da_linear = dataarray.where(cond=dataarray != 0)
    da_decibel = 10 * np.log10(da_linear)
    return da_decibel
dp_decibel = dp_reprojected.map(fn=linear_to_decibel)
dp_decibel
MapperIterDataPipe
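As a quick numeric sanity check of the ``10 * log_10(x)`` formula, here is a standalone sketch with plain NumPy values, independent of the datapipe:

```python
import numpy as np

# Same masking trick as linear_to_decibel: zeros become NaN instead of -inf
linear = np.array([0.0, 1.0, 100.0])
masked = np.where(linear != 0, linear, np.nan)
decibel = 10 * np.log10(masked)
print(decibel)  # NaN, 0.0 and 20.0: an amplitude of 100 maps to 20 dB
```
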
As an aside, we'll be using the Sentinel-1 image datapipe twice later, once as a template to create a blank canvas, and another time by itself. This requires forking the DataPipe into two branches, which can be achieved using torchdata.datapipes.iter.Forker (functional name: fork).
dp_decibel_canvas, dp_decibel_image = dp_decibel.fork(num_instances=2)
dp_decibel_canvas, dp_decibel_image
(_ChildDataPipe, _ChildDataPipe)
Now to visualize the transformed Sentinel-1 image. Let's zoom in to one of the analysis extent areas we'll be working on later.
it = iter(dp_decibel_image)
dataarray = next(it)
da_clip = dataarray.rio.clip_box(minx=371483, miny=190459, maxx=409684, maxy=229474)
da_clip.isel(band=0).plot.imshow(figsize=(11.5, 9), cmap="Blues_r", vmin=18, vmax=26)
<matplotlib.image.AxesImage at 0x7f91c72a2d90>

Notice how the darker blue areas tend to correlate with water features like the meandering rivers and the sea in the northeast. This is because the side-looking SAR signal reflects off flat water bodies like a mirror, with little energy scattered back directly to the sensor (hence why those areas look darker).
Load and visualize cloud-hosted vector files#
Let's now load some vector data from the web. These are polygons of the segmented water extent digitized by UNOSAT's AI Based Rapid Mapping Service. We'll be converting these vector polygons to raster masks later.
Links:
# https://gdal.org/user/virtual_file_systems.html#vsizip-zip-archives
shape_url = "/vsizip/vsicurl/https://unosat-maps.web.cern.ch/MY/FL20191217MYS/FL20191217MYS_SHP.zip/ST1_20191215_WaterExtent_Johor_AOI2.shp"
This is a shapefile containing polygons of the mapped water extent. Let's put it into a DataPipe called zen3geo.datapipes.PyogrioReader (functional name: read_from_pyogrio).
dp_shapes = torchdata.datapipes.iter.IterableWrapper(iterable=[shape_url])
dp_pyogrio = dp_shapes.read_from_pyogrio()
dp_pyogrio
PyogrioReaderIterDataPipe
This will take care of loading the shapefile into a geopandas.GeoDataFrame object. Let's take a look at the data table to see what attributes are inside.
it = iter(dp_pyogrio)
geodataframe = next(it)
geodataframe.dropna(axis="columns")
| | Water_Clas | Sensor_ID | Sensor_Dat | Area_m2 | Area_ha | StaffID | EventCode | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | Flood Water | Sentinel-1 | 2019-12-15 | 2.178154e+07 | 2178.154127 | JT | FL20191217MYS | MULTIPOLYGON (((104.01442 1.72298, 104.01335 1... |
Cool, and we can also visualize the polygons on a 2D map. To align the coordinates with the Sentinel-1 image above, we'll first use geopandas.GeoDataFrame.to_crs() to reproject the vector from EPSG:9707 (WGS 84 + EGM96 height, latitude/longitude) to EPSG:32648 (UTM Zone 48N).
print(f"Original bounds in EPSG:9707:\n{geodataframe.bounds}")
gdf = geodataframe.to_crs(crs="EPSG:32648")
print(f"New bounds in EPSG:32648:\n{gdf.bounds}")
Original bounds in EPSG:9707:
minx miny maxx maxy
0 103.844605 1.722825 104.187997 2.075693
New bounds in EPSG:32648:
minx miny maxx maxy
0 371483.597954 190459.328366 409683.869979 229473.310892
Plot it with geopandas.GeoDataFrame.plot(). This vector map should correspond to the zoomed-in Sentinel-1 image plotted earlier above.
gdf.plot(figsize=(11.5, 9))
<Axes: >

Tip
Make sure to understand your raster and vector datasets well first! Open the files up in your favourite Geographic Information System (GIS) tool and see how they actually look spatially. Then you'll have a better idea of how to design your data pipeline. The zen3geo way puts you, the master, in control.
1. Create a canvas to paint on#
In this section, we'll work on converting the flood water polygons above from a vector to a raster format, i.e. rasterization. This will be done in two steps:
Defining a blank canvas
Painting the polygons onto this blank canvas
For this, we'll be using tools from zen3geo.datapipes.datashader.
Letโs see how this can be done.
Blank canvas from template raster#
A canvas represents a 2D area with a height and a width. For us, we'll be using a datashader.Canvas, which also defines the range of y-values (ymin to ymax) and x-values (xmin to xmax), essentially coordinates for every unit of height and width.
Since we already have a Sentinel-1 raster grid with defined height/width and y/x coordinates, let's use it as a template to define our canvas. This is done via zen3geo.datapipes.XarrayCanvas (functional name: canvas_from_xarray).
dp_canvas = dp_decibel_canvas.canvas_from_xarray()
dp_canvas
XarrayCanvasIterDataPipe
Cool, and here's a quick inspection of the canvas dimensions and metadata.
it = iter(dp_canvas)
canvas = next(it)
print(f"Canvas height: {canvas.plot_height}, width: {canvas.plot_width}")
print(f"Y-range: {canvas.y_range}")
print(f"X-range: {canvas.x_range}")
print(f"Coordinate reference system: {canvas.crs}")
Canvas height: 2743, width: 3538
Y-range: (95200.47858355992, 314640.4785835599)
X-range: (247773.19092160053, 530813.1909216006)
Coordinate reference system: EPSG:32648
This information should match the template Sentinel-1 dataarray.
print(f"Dimensions: {dict(dataarray.sizes)}")
print(f"Affine transform: {dataarray.rio.transform()}")
print(f"Bounding box: {dataarray.rio.bounds()}")
print(f"Coordinate reference system: {dataarray.rio.crs}")
Dimensions: {'band': 1, 'y': 2743, 'x': 3538}
Affine transform: | 80.00, 0.00, 247773.19|
| 0.00,-80.00, 314640.48|
| 0.00, 0.00, 1.00|
Bounding box: (247773.19092160053, 95200.47858355992, 530813.1909216006, 314640.4785835599)
Coordinate reference system: EPSG:32648
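As a cross-check, the canvas x/y ranges can be reproduced by hand from the affine transform printed above, with no geospatial libraries involved. A small sketch using this scene's numbers:

```python
# Affine terms from the printout: pixel sizes (a, e) and top-left corner (c, f)
a, e = 80.0, -80.0
c, f = 247773.19092160053, 314640.4785835599
width, height = 3538, 2743  # grid size in pixels

x_range = (c, c + a * width)   # (xmin, xmax)
y_range = (f + e * height, f)  # (ymin, ymax); e is negative for north-up grids
print(x_range)
print(y_range)
```

These match the Y-range and X-range reported by the canvas inspection above.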
Rasterize vector polygons onto canvas#
Now's the time to paint or rasterize the vector geopandas.GeoDataFrame polygons onto the blank datashader.Canvas! This would enable us to have a direct pixel-wise X -> Y mapping between the Sentinel-1 image (X) and target flood label (Y).
The vector polygons can be rasterized or painted onto the template canvas using zen3geo.datapipes.DatashaderRasterizer (functional name: rasterize_with_datashader).
dp_datashader = dp_canvas.rasterize_with_datashader(vector_datapipe=dp_pyogrio)
dp_datashader
DatashaderRasterizerIterDataPipe
This will turn the vector geopandas.GeoDataFrame into a raster xarray.DataArray grid, with the spatial coordinates and bounds matching exactly with the template Sentinel-1 image.
Note
Since we have just one Sentinel-1 image and one raster flood mask, we have an easy 1:1 mapping. There are two other scenarios supported by zen3geo.datapipes.DatashaderRasterizer:
N:1 - Many datashader.Canvas objects to one vector geopandas.GeoDataFrame. The single vector geodataframe will be broadcast to match the length of the canvas list. This is useful for situations when you have a "global" vector database that you want to pair with multiple satellite images.
N:N - Many datashader.Canvas objects to many vector geopandas.GeoDataFrame objects. In this case, the list of grids must have the same length as the list of vector geodataframes. E.g. if you have 5 grids, there must also be 5 vector files. This is so that a 1:1 pairing can be done, which is useful when each raster tile has its own associated vector annotation.
See also
For more details on how rasterization of polygons works behind the scenes, check out Datashader's documentation on:
The datashader pipeline (especially the section on Aggregation).
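To build some intuition for what rasterization does, here is a pure-NumPy sketch that paints a single polygon onto a blank grid using the even-odd ray-casting rule. This is purely illustrative; datashader's actual implementation differs and is far more optimized:

```python
import numpy as np

def rasterize_polygon(poly: np.ndarray, height: int, width: int) -> np.ndarray:
    """Paint a polygon (N x 2 array of x, y vertices) onto a blank canvas.

    A pixel is marked 1 if its centre lies inside the polygon, using the
    even-odd ray-casting rule: cast a ray leftwards from the pixel centre
    and toggle in/out parity on every polygon edge it crosses.
    """
    ys, xs = np.mgrid[0:height, 0:width] + 0.5  # pixel centres
    inside = np.zeros((height, width), dtype=bool)
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        crosses = (y0 <= ys) != (y1 <= ys)  # does this edge straddle the row?
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at_y = x0 + (ys - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (xs < x_at_y)  # toggle parity on each crossing
    return inside.astype(np.uint8)

# A square from (1, 1) to (4, 4) painted onto a 5x5 pixel canvas
square = np.array([(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)])
mask = rasterize_polygon(square, height=5, width=5)
print(mask.sum())  # 9 pixel centres fall inside the square
```

The aggregation step in datashader generalizes this idea to many polygons at once, reducing each pixel to a single value such as "any polygon covers this pixel".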
2. Combine and conquer#
So far, we've got two datapipes that should be paired up in an X -> Y manner:
The pre-processed Sentinel-1 raster image in dp_decibel_image
The rasterized flood segmentation masks in dp_datashader
One way to get these two pieces into a Machine Learning ready chip format is via a stack, slice and split approach. Think of it like a sandwich: we first stack the bread and lettuce, and then slice through the layers. Ok, that was a bad analogy; let's just stick with tensors.
Stacking the raster layers#
Each of our raster inputs is an xarray.DataArray object with the same spatial resolution and extent, so these can be stacked into an xarray.Dataset with multiple data variables. First, we'll zip the two datapipes together using torchdata.datapipes.iter.Zipper (functional name: zip).
dp_zip = dp_decibel_image.zip(dp_datashader)
dp_zip
ZipperIterDataPipe
This will result in a DataPipe where each item is a tuple of (X, Y) pairs.
Just to illustrate what we've done so far, we can use torchdata.datapipes.utils.to_graph to visualize the data pipeline.
torchdata.datapipes.utils.to_graph(dp=dp_zip)
Next, let's combine the two (X, Y) xarray.DataArray objects in the tuple into an xarray.Dataset using torchdata.datapipes.iter.Collator (functional name: collate). We'll also clip the dataset to a bounding box area where the target water mask has no 0 or NaN values.
def xr_collate_fn(image_and_mask: tuple) -> xr.Dataset:
    """
    Combine a pair of xarray.DataArray (image, mask) inputs into an
    xarray.Dataset with two data variables named 'image' and 'mask'.
    """
    # Turn 2 xr.DataArray objects into 1 xr.Dataset with multiple data vars
    image, mask = image_and_mask
    dataset: xr.Dataset = xr.merge(
        objects=[image.isel(band=0).rename("image"), mask.rename("mask")],
        join="override",
    )
    # Clip dataset to bounding box extent of where labels are
    mask_extent: tuple = mask.where(cond=mask == 1, drop=True).rio.bounds()
    clipped_dataset: xr.Dataset = dataset.rio.clip_box(*mask_extent)
    return clipped_dataset
dp_dataset = dp_zip.collate(collate_fn=xr_collate_fn)
dp_dataset
CollatorIterDataPipe
Double check to see that the resulting xarray.Dataset's image and mask look ok.
it = iter(dp_dataset)
dataset = next(it)
# Create subplot with VV image on the left and Water mask on the right
fig, axs = plt.subplots(ncols=2, figsize=(11.5, 4.5), sharey=True)
dataset.image.plot.imshow(ax=axs[0], cmap="Blues_r")
axs[0].set_title("Sentinel-1 VV channel")
dataset.mask.plot.imshow(ax=axs[1], cmap="Blues")
axs[1].set_title("Water mask")
plt.show()
/home/docs/checkouts/readthedocs.org/user_builds/zen3geo/envs/latest/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/combining.py:297: UserWarning: Some child DataPipes are not exhausted when __iter__ is called. We are resetting the buffer and each child DataPipe will read from the start again.
warnings.warn("Some child DataPipes are not exhausted when __iter__ is called. We are resetting "

Slice into chips and turn into tensors#
To cut the xarray.Dataset into 128x128 sized chips, we'll use zen3geo.datapipes.XbatcherSlicer (functional name: slice_with_xbatcher). Refer to Chipping and batching data if you need a refresher.
dp_xbatcher = dp_dataset.slice_with_xbatcher(input_dims={"y": 128, "x": 128})
dp_xbatcher
XbatcherSlicerIterDataPipe
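As a back-of-the-envelope check on how many chips to expect, we can divide the clipped extent (the bounding box used in the earlier clip_box call) by the 80 m pixel spacing. This is only a rough sketch; xbatcher computes the actual slicing from the dataset's real dimensions:

```python
# Bounding box (in metres) from the earlier clip_box call, plus grid parameters
minx, miny, maxx, maxy = 371483, 190459, 409684, 229474
pixel_size = 80.0  # metres per pixel
chip_size = 128    # pixels per chip side

height = int((maxy - miny) / pixel_size)  # ~487 pixels
width = int((maxx - minx) / pixel_size)   # ~477 pixels
n_chips = (height // chip_size) * (width // chip_size)
print(n_chips)  # 9 full non-overlapping 128x128 chips
```

Nine chips agrees with the nine batches printed by the DataLoader further below.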
Next step is to convert the 128x128 chips into torch.Tensor objects via torchdata.datapipes.iter.Mapper (functional name: map). The Sentinel-1 image and water mask will be split out at this point too.
def dataset_to_tensors(chip: xr.Dataset) -> tuple[torch.Tensor, torch.Tensor]:
    """
    Convert an xarray.Dataset into two torch.Tensor objects, the first one
    being the satellite image, and the second one being the target mask.
    """
    image: torch.Tensor = torch.as_tensor(chip.image.data)
    mask: torch.Tensor = torch.as_tensor(chip.mask.data.astype("uint8"))
    return image, mask
dp_map = dp_xbatcher.map(fn=dataset_to_tensors)
dp_map
MapperIterDataPipe
At this point, we could do some batching and collating, but we'll point you again to Chipping and batching data to figure it out. Let's take a look at a graph of the complete data pipeline.
torchdata.datapipes.utils.to_graph(dp=dp_map)
Sweet, time for the final step.
Into a DataLoader#
Pass the DataPipe into torch.utils.data.DataLoader!
dataloader = torch.utils.data.DataLoader(dataset=dp_map)
for i, batch in enumerate(dataloader):
    image, mask = batch
    print(f"Batch {i} - image: {image.shape}, mask: {mask.shape}")
/home/docs/checkouts/readthedocs.org/user_builds/zen3geo/envs/latest/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/combining.py:297: UserWarning: Some child DataPipes are not exhausted when __iter__ is called. We are resetting the buffer and each child DataPipe will read from the start again.
warnings.warn("Some child DataPipes are not exhausted when __iter__ is called. We are resetting "
Batch 0 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 1 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 2 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 3 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 4 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 5 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 6 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 7 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
Batch 8 - image: torch.Size([1, 128, 128]), mask: torch.Size([1, 128, 128])
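Each batch above holds a single chip because DataLoader defaults to batch_size=1. Grouping several chips per batch works the same way; here is a standalone sketch using random stand-in tensors rather than the actual datapipe:

```python
import torch
from torch.utils.data import DataLoader

# Eight fake (image, mask) chip pairs standing in for dp_map's output
chips = [
    (torch.rand(128, 128), torch.zeros(128, 128, dtype=torch.uint8))
    for _ in range(8)
]
loader = DataLoader(dataset=chips, batch_size=4)
for image, mask in loader:
    print(image.shape, mask.shape)  # torch.Size([4, 128, 128]) each
```

The default collate function stacks the per-chip tensors along a new leading batch dimension, which is exactly what a segmentation model's forward pass expects.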
Now go train some flood water detection models!
See also
To learn more about AI-based flood mapping with SAR, check out these resources: