From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow this link.

Working with SWOT Level 2 Water Mask Raster Image Data Product:

In AWS Cloud Version

Authors: Nicholas Tarpinian, PO.DAAC | Catalina Taglialatela (JPL, PO.DAAC)

Summary & Learning Objectives

Notebook showcasing how to work with multiple files from the SWOT Raster Image data product version C (aka 2.0) in the cloud

  • Utilizing the earthaccess Python package. For more information visit: https://nsidc.github.io/earthaccess/
  • Querying the dataset at a user-selected resolution, choosing between ‘100m’ and ‘250m’.
  • Visualizing multiple raster images on a single map.
  • Stacking multiple raster images and creating a time dimension to analyze over time.
  • Adjusting images based on the quality flag.

Requirements

1. Compute environment

This tutorial is written to run in the following environment: - AWS instance running in us-west-2: NASA Earthdata Cloud data in S3 can be directly accessed via an s3fs session; this access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region. - This workflow as written works on a cloud compute instance with 14.8 GB RAM and up to 3.7 CPUs. Smaller instances tend to crash.

2. Earthdata Login

An Earthdata Login account is required to access data, as well as discover restricted data, from the NASA Earthdata system. Thus, to access NASA data, you need Earthdata Login. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.

Import libraries

import io
import s3fs
import xarray as xr
import numpy as np
from datetime import datetime
from pathlib import Path
import hvplot
import hvplot.xarray 
import earthaccess

Authentication with earthaccess

In this notebook, we authenticate with Earthdata Login via earthaccess in the cell below.

auth = earthaccess.login()

Search for SWOT Raster products using earthaccess

Each dataset has its own unique collection concept ID. For the SWOT_L2_HR_Raster_2.0 dataset, we can find the collection ID here.

For this tutorial, we are looking at the Lake Mead Reservoir in the United States.

We used bbox finder to get the exact coordinates for our area of interest.

raster_results = earthaccess.search_data(
    short_name='SWOT_L2_HR_RASTER_2.0',
    bounding_box=(-115.112686, 35.740939, -114.224167, 36.937819),
    temporal=('2024-02-01 12:00:00', '2024-02-01 23:59:59'),
    granule_name='*_100m_*',  # specify we are interested in the 100m standard raster
    count=200
)
Granules found: 2

Visualizing Multiple Tiles

Let’s now visualize multiple raster tiles that we searched and explore the data.

We use xarray.open_mfdataset, which opens multiple files as a single dataset.

ds = xr.open_mfdataset(earthaccess.open(raster_results), engine='h5netcdf', combine='nested', concat_dim='x')
ds
Opening 2 granules, approx size: 0.07 GB
using endpoint: https://archive.swot.podaac.earthdata.nasa.gov/s3credentials
<xarray.Dataset>
Dimensions:                  (y: 2784, x: 3074)
Coordinates:
  * y                        (y) float64 3.899e+06 3.899e+06 ... 4.177e+06
  * x                        (x) float64 5.391e+05 5.392e+05 ... 7.217e+05
Data variables: (12/39)
    crs                      (x) object b'1' b'1' b'1' b'1' ... b'1' b'1' b'1'
    longitude                (y, x) float64 dask.array<chunksize=(512, 513), meta=np.ndarray>
    latitude                 (y, x) float64 dask.array<chunksize=(512, 513), meta=np.ndarray>
    wse                      (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
    wse_qual                 (y, x) float32 dask.array<chunksize=(2784, 1538), meta=np.ndarray>
    wse_qual_bitwise         (y, x) float64 dask.array<chunksize=(768, 769), meta=np.ndarray>
    ...                       ...
    load_tide_fes            (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
    load_tide_got            (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
    pole_tide                (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
    model_dry_tropo_cor      (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
    model_wet_tropo_cor      (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
    iono_cor_gim_ka          (y, x) float32 dask.array<chunksize=(768, 769), meta=np.ndarray>
Attributes: (12/49)
    Conventions:                   CF-1.7
    title:                         Level 2 KaRIn High Rate Raster Data Product
    source:                        Ka-band radar interferometer
    history:                       2024-02-05T08:37:45Z : Creation
    platform:                      SWOT
    references:                    V1.2.1
    ...                            ...
    x_min:                         539100.0
    x_max:                         692800.0
    y_min:                         4023100.0
    y_max:                         4176900.0
    institution:                   CNES
    product_version:               01
raster_plot = ds.wse.hvplot.quadmesh(x='x', y='y', rasterize=True, title='SWOT Raster 100m: Lake Mead Reservoir')
raster_plot.opts(width=700, height=600, colorbar=True)

Creating a Time Series

The SWOT Raster product does not include a time dimension; each file is a snapshot in time. A time coordinate can be added by extracting the timestamp from the file name.

  1. Expand the temporal range of your earthaccess search to cover the period of interest.
  2. Extract the datetime from the s3 file name, then concatenate the datasets along the new time dimension.
time_results = earthaccess.search_data(
    short_name='SWOT_L2_HR_RASTER_2.0',
    bounding_box=(-114.502048, 36.060175, -114.390983, 36.210182),
    temporal=('2024-01-25 00:00:00', '2024-03-04 23:59:59'),
    granule_name='*_100m_*',
    count=200
)
Granules found: 3
fs_s3 = earthaccess.get_s3fs_session(results=time_results)
# Get links list
raster = []
for g in time_results:
    for link in g.data_links(access='direct'):
        raster.append(link)

print(len(raster))
raster
3
['s3://podaac-swot-ops-cumulus-protected/SWOT_L2_HR_Raster_2.0/SWOT_L2_HR_Raster_100m_UTM11S_N_x_x_x_010_205_109F_20240201T075048_20240201T075109_PIC0_01.nc',
 's3://podaac-swot-ops-cumulus-protected/SWOT_L2_HR_Raster_2.0/SWOT_L2_HR_Raster_100m_UTM11S_N_x_x_x_010_496_046F_20240211T170050_20240211T170111_PIC0_01.nc',
 's3://podaac-swot-ops-cumulus-protected/SWOT_L2_HR_Raster_2.0/SWOT_L2_HR_Raster_100m_UTM11S_N_x_x_x_011_205_109F_20240222T043554_20240222T043615_PIC0_01.nc']
def add_time_dimension(ds, file_path):
    # Extract filename from s3 file path
    file_name = file_path.split('/')[-1]
    # Extract date/time string from filename
    date_str = file_name.split('_')[-4][:15]
    # Convert the date string to a datetime object
    time_value = datetime.strptime(date_str, "%Y%m%dT%H%M%S")
    # Assign the time coordinate to the dataset
    ds.coords['time'] = time_value
    return ds
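As a quick standalone check, the timestamp extraction used by add_time_dimension can be exercised on one of the file names listed above (the fourth-from-last underscore-separated field is the granule start time):

```python
from datetime import datetime

# One of the s3 file names returned by the search above
fname = "SWOT_L2_HR_Raster_100m_UTM11S_N_x_x_x_010_205_109F_20240201T075048_20240201T075109_PIC0_01.nc"
date_str = fname.split('_')[-4][:15]  # granule start time field
time_value = datetime.strptime(date_str, "%Y%m%dT%H%M%S")
print(time_value)  # 2024-02-01 07:50:48
```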
datasets = []
file_names = []

for file_path in raster:
    with fs_s3.open(file_path, 'rb') as file:
        file_bytes = file.read()
    file_obj = io.BytesIO(file_bytes)
    dataset = xr.open_dataset(file_obj, engine='h5netcdf')
    dataset_with_time = add_time_dimension(dataset, file_path)
    datasets.append(dataset_with_time)
    file_names.append(file_path.split('/')[-1])
    dataset.close()
# sorting the time dimension in correct order
datasets.sort(key=lambda ds: ds.time.values)
ds2 = xr.concat(datasets, dim='time')
ds2
<xarray.Dataset>
Dimensions:                  (x: 1549, y: 1549, time: 3)
Coordinates:
  * x                        (x) float64 6.788e+05 6.789e+05 ... 8.336e+05
  * y                        (y) float64 3.9e+06 3.901e+06 ... 4.055e+06
  * time                     (time) datetime64[ns] 2024-02-01T07:50:48 ... 20...
Data variables: (12/39)
    crs                      (time) object b'1' b'1' b'1'
    longitude                (time, y, x) float64 nan nan nan ... nan nan nan
    latitude                 (time, y, x) float64 nan nan nan ... nan nan nan
    wse                      (time, y, x) float32 nan nan nan ... nan nan nan
    wse_qual                 (time, y, x) float32 nan nan nan ... nan nan nan
    wse_qual_bitwise         (time, y, x) float64 nan nan nan ... nan nan nan
    ...                       ...
    load_tide_fes            (time, y, x) float32 nan nan nan ... nan nan nan
    load_tide_got            (time, y, x) float32 nan nan nan ... nan nan nan
    pole_tide                (time, y, x) float32 nan nan nan ... nan nan nan
    model_dry_tropo_cor      (time, y, x) float32 nan nan nan ... nan nan nan
    model_wet_tropo_cor      (time, y, x) float32 nan nan nan ... nan nan nan
    iono_cor_gim_ka          (time, y, x) float32 nan nan nan ... nan nan nan
Attributes: (12/49)
    Conventions:                   CF-1.7
    title:                         Level 2 KaRIn High Rate Raster Data Product
    source:                        Ka-band radar interferometer
    history:                       2024-02-05T12:55:01Z : Creation
    platform:                      SWOT
    references:                    V1.2.1
    ...                            ...
    x_min:                         680100.0
    x_max:                         829300.0
    y_min:                         3903300.0
    y_max:                         4052400.0
    institution:                   CNES
    product_version:               01
timeplot = ds2.wse.hvplot.image(y='y', x='x')
timeplot.opts(width=700, height=500, colorbar=True)

Let’s plot the wse quality flag, wse_qual, which ranges from 0 to 3, where 0=good, 1=suspect, 2=degraded, 3=bad (as described in the variable’s attributes when printed with xarray).

timeplot = ds2.wse_qual.hvplot.image(y='y', x='x')
timeplot.opts(width=700, height=500, colorbar=True)
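Before masking, it can help to see how many pixels fall into each flag category. flag_counts below is a hypothetical helper (not part of the SWOT product or earthaccess API), demonstrated on a tiny synthetic array rather than the real ds2:

```python
import numpy as np
import xarray as xr

def flag_counts(qual: xr.DataArray) -> dict:
    """Count pixels per quality-flag value, ignoring NaNs."""
    valid = qual.values[~np.isnan(qual.values)]
    vals, counts = np.unique(valid, return_counts=True)
    return dict(zip(vals.astype(int).tolist(), counts.tolist()))

# Synthetic flag field: 0=good, 1=suspect, 2=degraded, 3=bad
demo = xr.DataArray(np.array([[0.0, 0.0], [1.0, 3.0]]))
print(flag_counts(demo))  # {0: 2, 1: 1, 3: 1}
```

On the real data this would be called as flag_counts(ds2['wse_qual']).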

Masking a variable with its quality flag

variable_to_mask = ds2['wse']
mask_variable = ds2['wse_qual']
# Define the condition for masking based on the range of the quality flag
mask_condition = mask_variable < 2

masked_variable = variable_to_mask.where(mask_condition)
# Update the masked variable in the dataset
ds2['wse'] = masked_variable

ds2['wse'].hvplot.image(y='y', x='x').opts(width=700, height=500, colorbar=True)

Our end product is a time series of the data showing only the values where the quality flag is either good (0) or suspect (1).
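To reduce the masked cube to a single lake-level curve, one option is a per-timestep spatial mean. mean_wse_series is a hypothetical helper sketch (xarray's mean skips NaNs for float data); a tiny synthetic dataset stands in for ds2 here:

```python
import numpy as np
import xarray as xr

def mean_wse_series(ds: xr.Dataset) -> xr.DataArray:
    """Spatial mean of wse for each time step, ignoring masked (NaN) pixels."""
    return ds['wse'].mean(dim=['x', 'y'], skipna=True)

# Tiny synthetic stand-in: 2 time steps on a 2x2 grid, one masked pixel
demo = xr.Dataset(
    {'wse': (('time', 'y', 'x'),
             np.array([[[1.0, 3.0], [np.nan, 5.0]],
                       [[2.0, 2.0], [2.0, 2.0]]]))},
    coords={'time': [0, 1], 'y': [0, 1], 'x': [0, 1]},
)
series = mean_wse_series(demo)
print(series.values)  # [3. 2.]
```

On the real data, mean_wse_series(ds2) yields a DataArray over time that can be plotted with .hvplot.line(x='time').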

Appendix: Alternate Plot

# # Alternate plotting with matplotlib
# %matplotlib inline

# import matplotlib.pyplot as plt
# from matplotlib import animation
# from matplotlib.animation import FuncAnimation, PillowWriter
# from IPython.display import display, Image, HTML

# variable_name = 'wse'
# data = ds2[variable_name]

# fig, ax = plt.subplots(figsize=(10, 8))
# fig.set_tight_layout({'rect': [0.01, 0.01, 1.0, 1.0]})

# contour = ax.contourf(data.isel(time=0), cmap='viridis')
# cbar = plt.colorbar(contour)
# cbar.set_label('Water Surface Elevation (meters)', fontsize=14) 
# times = ds2.time.values

# # Function to update the plot for each time step
# def update(frame):
#     ax.clear()
#     contour = ax.contourf(data.isel(time=frame), cmap='viridis')
#     formatted_time = str(times[frame])[:-7]
#     ax.set_title(f'Date: {formatted_time}')
#     ax.set_xlabel('Longitude', fontsize=14)
#     ax.set_ylabel('Latitude', fontsize=14)
#     ax.text(0.5, 1.05, 'SWOT Raster 100M Lake Mead Reservoir', transform=ax.transAxes, ha='center', fontsize=14)
#     return contour,

# # Creating a gif animation
# ani = animation.FuncAnimation(fig, update, repeat=True, frames=len(data['time']), blit=True, interval=3000)

# output = ('time_series.gif')
# ani.save(output, writer='pillow', fps=.5)

# with open(output,'rb') as f:
#     display(Image(data=f.read(), format='gif'))

# plt.close(fig)
# ds2.close()