```python
import glob
import os
import requests
import s3fs
import fiona
import netCDF4 as nc
import h5netcdf
import xarray as xr
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import hvplot.xarray
import earthaccess
from earthaccess import Auth, DataCollections, DataGranules, Store
```
From the PO.DAAC Cookbook; a GitHub version of this notebook is also available.
SWOT Simulated North American Continent Hydrology Dataset Exploration in the Cloud
Accessing and Visualizing SWOT Simulated Datasets
Requirement:
This tutorial can only be run in an AWS cloud instance running in the us-west-2 region: NASA Earthdata Cloud data in S3 can be directly accessed via the earthaccess Python library, and this access is limited to requests made from within the US West (Oregon) (us-west-2) AWS region.
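One way to confirm you are running in the required region is to query the EC2 instance metadata service. A minimal sketch, assuming an EC2 environment (the helper name and the 1-second timeout are illustrative, not part of this tutorial's workflow):

```python
import requests

def running_in_us_west_2(timeout=1):
    """Best-effort check that this code runs on an EC2 instance in us-west-2."""
    try:
        # IMDSv2: fetch a session token, then ask for the placement region
        token = requests.put(
            "http://169.254.169.254/latest/api/token",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
            timeout=timeout,
        ).text
        region = requests.get(
            "http://169.254.169.254/latest/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
            timeout=timeout,
        ).text
        return region == "us-west-2"
    except requests.RequestException:
        # not on EC2, or the metadata service is unreachable
        return False
```

Outside EC2 the metadata address is unreachable, so the helper simply returns False.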
Learning Objectives:
- Access all 5 products of SWOT HR sample data (archived in NASA Earthdata Cloud) within the AWS cloud, without downloading to local machine
- Visualize accessed data
SWOT Simulated Level 2 North America Continent KaRIn High Rate Version 1 Datasets:
- River Vector Shapefile - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1
- Lake Vector Shapefile - SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1
- Water Mask Pixel Cloud NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1
- Water Mask Pixel Cloud Vector Attribute NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1
- Raster NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1
Notebook Author: Cassie Nickles, NASA PO.DAAC (Aug 2022)
Libraries Needed
All required libraries are imported at the top of this notebook.
Earthdata Login
An Earthdata Login account is required to access data, as well as to discover restricted data, from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. The account is free to create and only takes a moment to set up. We use earthaccess to authenticate with your login credentials below.
```python
#auth = earthaccess.login(strategy="interactive", persist=True)  # if you do not have a netrc created, this line will do so with your credentials
auth = earthaccess.login(strategy="netrc")
```
We are already authenticated with NASA EDL
Set up an s3fs session for Direct Access
s3fs sessions are used for authenticated access to S3 buckets and allow typical file-system style operations. Below we create a session by passing in the data access information.

```python
fs_s3 = earthaccess.get_s3fs_session(daac='PODAAC', provider='POCLOUD')
```
Single File Access
The s3 access link can be found using an earthaccess data search. Since this collection consists of Reach and Node files, we need to extract only the granule for the Reach file. We do this by filtering for the ‘Reach’ title in the data link.
Alternatively, Earthdata Search (see tutorial) can be used to manually search for a single file.
1. River Vector Shapefiles
```python
# retrieve granules from the day we want
river_results = earthaccess.search_data(short_name='SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1',
                                        temporal=('2022-08-22 19:24:41', '2022-08-22 19:30:37'))
```
Granules found: 2
```python
# find the s3 link of the one granule we want (the collection contains both Reaches and Nodes, but here we want only the Reach)
river_data_urls = []
for g in river_results:
    for l in earthaccess.results.DataGranule.data_links(g, access='direct'):
        if "Reach" in l:
            river_data_urls.append(l)
print(river_data_urls[0])
```
s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1/SWOT_L2_HR_RiverSP_Reach_007_522_NA_20220822T192441_20220822T193037_PGA0_01.zip
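The granule filename itself encodes useful metadata (file type, cycle, pass, continent, start/end times). A small illustrative parser, with the field layout assumed from the example filename above rather than taken from an official naming specification:

```python
import re

# field layout inferred from the example filename (an assumption, not an official spec)
GRANULE_RE = re.compile(
    r"SWOT_L2_HR_RiverSP_(?P<ftype>Reach|Node)_(?P<cycle>\d{3})_(?P<pass_num>\d{3})_"
    r"(?P<continent>[A-Z]+)_(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_"
    r"(?P<crid>[A-Za-z0-9]+)_(?P<counter>\d{2})\.zip"
)

name = "SWOT_L2_HR_RiverSP_Reach_007_522_NA_20220822T192441_20220822T193037_PGA0_01.zip"
meta = GRANULE_RE.match(name).groupdict()
# meta["ftype"] -> "Reach", meta["cycle"] -> "007", meta["pass_num"] -> "522"
```

This kind of parsing can replace the substring filter above when you need more than a yes/no match, e.g. grouping granules by cycle and pass.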
The native format for this data is a .zip file, and we want the .shp file within it, so we will create a Fiona AWS session using the credentials from the s3fs session above to access the shapefiles inside the zip files directly. If we don’t do this, the alternative would be to download the data to the cloud environment and extract the .zip file there.
```python
fiona_session = fiona.session.AWSSession(
    aws_access_key_id=fs_s3.storage_options["key"],
    aws_secret_access_key=fs_s3.storage_options["secret"],
    aws_session_token=fs_s3.storage_options["token"]
)
```
```python
# We use the zip+ prefix so fiona knows that we are operating on a zip file
river_shp_url = f"zip+{river_data_urls[0]}"

with fiona.Env(session=fiona_session):
    SWOT_HR_shp1 = gpd.read_file(river_shp_url)
```
```python
# view the attribute table
SWOT_HR_shp1
```
reach_id | time | time_tai | time_str | p_lat | p_lon | river_name | wse | wse_u | wse_r_u | ... | p_width | p_wid_var | p_n_nodes | p_dist_out | p_length | p_maf | p_dam_id | p_n_ch_max | p_n_ch_mod | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 71224300241 | 7.145115e+08 | 7.145114e+08 | 2022-08-22T19:2441Z | 49.364818 | -94.879318 | no_data | 3.472248e+01 | -1.000000e+12 | 1.511000e-02 | ... | 7294.5 | 3.265803e+06 | 15 | 390935.258 | 3008.959150 | -1.000000e+12 | 0 | 15 | 8 | LINESTRING (-94.86483 49.37485, -94.86515 49.3... |
1 | 71224300253 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:2446Z | 49.049486 | -94.899554 | no_data | 3.439994e+01 | -1.000000e+12 | 7.600000e-03 | ... | 394.5 | 7.876447e+06 | 42 | 444613.943 | 8411.845753 | -1.000000e+12 | 0 | 8 | 1 | LINESTRING (-94.92557 49.08401, -94.92556 49.0... |
2 | 71224300263 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:2448Z | 48.977915 | -94.869598 | no_data | 3.434701e+01 | -1.000000e+12 | 9.620000e-03 | ... | 6365.5 | 2.935181e+06 | 42 | 453020.631 | 8406.687501 | -1.000000e+12 | 0 | 3 | 1 | LINESTRING (-94.88015 49.01512, -94.88006 49.0... |
3 | 71224300273 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:2449Z | 48.902998 | -94.854720 | no_data | 3.416786e+01 | -1.000000e+12 | 1.372000e-02 | ... | 4650.0 | 3.770782e+06 | 43 | 461636.940 | 8616.309267 | -1.000000e+12 | 0 | 1 | 1 | LINESTRING (-94.86229 48.94092, -94.86228 48.9... |
4 | 71224300283 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:2450Z | 48.883377 | -94.783621 | no_data | 3.426341e+01 | -1.000000e+12 | 5.050000e-03 | ... | 10439.0 | 2.952077e+07 | 46 | 470821.047 | 9184.106587 | -1.000000e+12 | 0 | 5 | 1 | LINESTRING (-94.73002 48.90430, -94.72952 48.9... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
666 | 74291700141 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:2729Z | 39.811344 | -92.684233 | no_data | 3.869180e+01 | -1.000000e+12 | 3.840000e-01 | ... | 42.0 | 1.060619e+02 | 48 | 2463828.961 | 9553.659853 | -1.000000e+12 | 0 | 1 | 1 | LINESTRING (-92.68784 39.77391, -92.68804 39.7... |
667 | 74291700151 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:2727Z | 39.888856 | -92.683047 | no_data | 3.687848e+01 | -1.000000e+12 | 1.897500e-01 | ... | 42.0 | 1.208373e+02 | 48 | 2473386.489 | 9557.528221 | -1.000000e+12 | 0 | 1 | 1 | LINESTRING (-92.68497 39.84931, -92.68497 39.8... |
668 | 74291700161 | -1.000000e+12 | -1.000000e+12 | no_data | 39.962507 | -92.671510 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 42.0 | 9.634731e+01 | 48 | 2482916.982 | 9530.492810 | -1.000000e+12 | 0 | 1 | 1 | LINESTRING (-92.66461 39.92629, -92.66425 39.9... |
669 | 74291700171 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:2725Z | 40.045903 | -92.690485 | no_data | 3.647532e+01 | -1.000000e+12 | 1.394000e-01 | ... | 54.0 | 6.557368e+02 | 59 | 2494780.714 | 11863.732260 | -1.000000e+12 | 0 | 2 | 1 | LINESTRING (-92.68041 40.00013, -92.68006 40.0... |
670 | 74291700181 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:2723Z | 40.133798 | -92.687305 | no_data | 3.384407e+01 | -1.000000e+12 | 1.019290e+00 | ... | 42.0 | 1.250056e+02 | 58 | 2506424.880 | 11644.165636 | -1.000000e+12 | 0 | 1 | 1 | LINESTRING (-92.68547 40.09150, -92.68519 40.0... |
671 rows × 111 columns
```python
fig, ax = plt.subplots(figsize=(11, 7))
SWOT_HR_shp1.plot(ax=ax, color='black')
```
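Note the -1.000000e+12 values in the attribute table: they are fill values, not physical measurements, and will distort plots or statistics if left in. A minimal sketch of masking them before analysis, using a toy DataFrame in place of the real attribute table (the FILL constant is assumed from the values shown above):

```python
import numpy as np
import pandas as pd

FILL = -1.0e12  # sentinel seen in the attribute tables above (assumed)

# toy stand-in for a few reach rows
df = pd.DataFrame({"reach_id": [1, 2, 3],
                   "wse": [34.72, FILL, 36.48]})

clean = df.replace(FILL, np.nan)       # sentinels become NaN
valid = clean.dropna(subset=["wse"])   # drop reaches with no valid wse
```

The same replace/dropna pattern applies directly to the GeoDataFrames loaded above before computing means or plotting water surface elevation.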
2. Lake Vector Shapefiles
The lake vector shapefiles can be accessed in the same way as the river shapefiles above.
```python
lake_results = earthaccess.search_data(short_name='SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1',
                                       temporal=('2022-08-22 19:24:18', '2022-08-22 19:30:50'))
```
Granules found: 3
```python
# find the s3 link of the desired granule (this collection has three options: Obs, Unassigned, and Prior - we want Obs)
lake_data_urls = []
for g in lake_results:
    for l in earthaccess.results.DataGranule.data_links(g, access='direct'):
        if "Obs" in l:
            lake_data_urls.append(l)
print(lake_data_urls[0])
```
s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1/SWOT_L2_HR_LakeSP_Obs_007_522_NA_20220822T192415_20220822T193051_Dx0000_01.zip
As with the river shapefiles, the native format is a .zip file, so we again create a Fiona AWS session with the s3fs credentials to read the shapefile directly from the zip archive rather than downloading and extracting it.
```python
fiona_session = fiona.session.AWSSession(
    aws_access_key_id=fs_s3.storage_options["key"],
    aws_secret_access_key=fs_s3.storage_options["secret"],
    aws_session_token=fs_s3.storage_options["token"]
)
```
```python
# We use the zip+ prefix so fiona knows that we are operating on a zip file
lake_shp_url = f"zip+{lake_data_urls[0]}"

with fiona.Env(session=fiona_session):
    SWOT_HR_shp2 = gpd.read_file(lake_shp_url)
```
```python
# view the attribute table
SWOT_HR_shp2
```
obs_id | lake_id | overlap | time | time_tai | time_str | wse | wse_u | wse_r_u | wse_std | ... | iono_c | xovr_cal_c | p_name | p_grand_id | p_max_wse | p_max_area | p_ref_date | p_ref_ds | p_storage | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 742081R000002 | 7420470702 | 93 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:26:51 | 36.934 | 0.051 | 0.051 | 0.159 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 1.35 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-92.75926 42.04142, -92.75977 42.041... |
1 | 742081R000003 | 7420472462 | 75 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:26:51 | 37.037 | 0.080 | 0.080 | 0.143 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 1.62 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-92.91651 42.01167, -92.91681 42.011... |
2 | 742081R000008 | 7420473212 | 58 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:26:51 | 36.578 | 0.181 | 0.181 | 0.058 | ... | 0.0 | 0.0 | HENDRICKSON MARSH LAKE | -99999999 | -1.000000e+12 | 45.94 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-93.24060 41.93319, -93.24066 41.933... |
3 | 742081R000009 | 7420470712 | 73 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:26:51 | 36.910 | 0.110 | 0.110 | 0.136 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 4.50 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-92.72557 42.03424, -92.72560 42.034... |
4 | 742081R000011 | 7420470582 | 76 | 7.145116e+08 | 7.145116e+08 | 2022-08-22T19:26:51 | 36.904 | 0.109 | 0.109 | 0.628 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 1.89 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-93.39929 41.90871, -93.39945 41.908... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17240 | 742070R000729 | 7420857422 | 92 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:25:10 | 33.186 | 0.098 | 0.098 | 0.056 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 14.94 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-95.07581 47.57853, -95.07587 47.578... |
17241 | 742070R000730 | 7420848152 | 96 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:25:10 | 33.071 | 0.038 | 0.038 | 0.174 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 3.06 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-94.88532 47.61669, -94.88577 47.616... |
17242 | 712070R000731 | 7120812272 | 74 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:25:10 | 33.104 | 0.077 | 0.077 | 0.085 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 4.41 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-95.08313 47.57179, -95.08341 47.571... |
17243 | 712070R000732 | 7120816202 | 67 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:25:10 | 32.713 | 0.106 | 0.106 | 0.198 | ... | 0.0 | 0.0 | MUD LAKE | -99999999 | -1.000000e+12 | 6.30 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-95.39968 47.51004, -95.39975 47.510... |
17244 | 742070R000733 | 7420857422 | 95 | 7.145115e+08 | 7.145115e+08 | 2022-08-22T19:25:10 | 32.725 | 0.093 | 0.093 | 0.119 | ... | 0.0 | 0.0 | no_data | -99999999 | -1.000000e+12 | 14.94 | -9999 | -9999.0 | -1.000000e+12 | POLYGON ((-95.07228 47.57473, -95.07257 47.574... |
17245 rows × 43 columns
```python
fig, ax = plt.subplots(figsize=(7, 12))
SWOT_HR_shp2.plot(ax=ax, color='black')
```
3. Water Mask Pixel Cloud NetCDF
Accessing the remaining files differs from the shapefiles above: we do not need to unzip them because they are stored as native netCDF files in the cloud. For the rest of the products, we will open them via xarray.
```python
watermask_results = earthaccess.search_data(short_name='SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1',
                                            temporal=('2022-08-22 19:29:00', '2022-08-22 19:29:11'),
                                            point=('-90', '35'))
```
Granules found: 1
The pixel cloud netCDF files are formatted with three groups titled “pixel_cloud”, “tvp”, and “noise” (see the product documentation for more detail). In order to access the coordinates and variables within the file, a group must be specified when calling xarray open_dataset.
```python
ds_PIXC = xr.open_mfdataset(earthaccess.open([watermask_results[0]]), group='pixel_cloud', engine='h5netcdf')
ds_PIXC
```
Opening 1 granules, approx size: 0.0 GB
<xarray.Dataset>
Dimensions:  (points: 489673, complex_depth: 2)
Coordinates:
    latitude   (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    longitude  (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
Dimensions without coordinates: points, complex_depth
Data variables: (12/49)
    azimuth_index                          (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    range_index                            (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    interferogram                          (points, complex_depth) float32 dask.array<chunksize=(489673, 2), meta=np.ndarray>
    power_plus_y                           (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    power_minus_y                          (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    coherent_power                         (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    ...                                     ...
    solid_earth_tide                       (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    load_tide_fes                          (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    load_tide_got                          (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    pole_tide                              (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    ancillary_surface_classification_flag  (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    pixc_qual                              (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
Attributes:
    description:                 cloud of geolocated interferogram pixels
    interferogram_size_azimuth:  2923
    interferogram_size_range:    4575
    looks_to_efflooks:           1.75