import os
import glob
import s3fs
import requests
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.xarray
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy
import dask
from datetime import datetime
from os.path import isfile, basename, abspath
import earthaccess
from earthaccess import Auth, DataCollections, DataGranules, Store
From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow this link.
Amazon Estuary Exploration:
In AWS Cloud Version
This tutorial is one of two jupyter notebook versions of the same use case exploring multiple satellite data products over the Amazon Estuary. In this version, we use data products directly in the Amazon Web Services (AWS) Cloud.
Learning Objectives
- Compare cloud access methods (in tandem with notebook “Amazon Estuary Exploration: Cloud Direct Download Version”)
- Search for data products using earthaccess Python library
- Access datasets using direct s3 access in cloud, xarray and visualize using hvplot or plot tools
This tutorial explores the relationships between river height, land water equivalent thickness, sea surface salinity, and sea surface temperature in the Amazon River estuary and coastal region from multiple datasets listed below. The contents are useful for the ocean, coastal, and terrestrial hydrosphere communities, showcasing how to use cloud datasets and services. The notebook is designed to be executed in Amazon Web Services (AWS) (in us-west-2 region where the cloud data is located).
Cloud Datasets
The tutorial itself will use four different datasets:
1. TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06.1_V3
The Gravity Recovery And Climate Experiment Follow-On (GRACE-FO) satellite land water equivalent (LWE) thicknesses will be used to observe seasonal changes in water storage around the river. When discharge is high, the change in water storage will increase, thus highlighting a wet season.
2. PRESWOT_HYDRO_GRRATS_L2_DAILY_VIRTUAL_STATION_HEIGHTS_V2
The NASA Pre-SWOT Making Earth System Data Records for Use in Research Environments (MEaSUREs) Program virtual gauges will be used as a proxy for Surface Water and Ocean Topography (SWOT) discharge until SWOT products are available. MEaSUREs contains river height products, not discharge, but river height is directly related to discharge and thus will act as a good substitute.
3. OISSS_L4_multimission_7day_v1
Optimally Interpolated Sea surface salinity (OISSS) is a level 4 product that combines the records from Aquarius (Sept 2011-June 2015), the Soil Moisture Active Passive (SMAP) satellite (April 2015-present), and ESAs Soil Moisture Ocean Salinity (SMOS) data to fill in data gaps.
4. MODIS_AQUA_L3_SST_MID-IR_MONTHLY_9KM_NIGHTTIME_V2019.0
Sea surface temperature is obtained from the Moderate Resolution Imaging Spectrometer (MODIS) instrument on board the Aqua satellite.
More details on available collections are on the PO.DAAC Cloud Earthdata Search Portal. For more information on the PO.DAAC transition to the cloud, please visit: https://podaac.jpl.nasa.gov/cloud-datasets/about
Needed Packages
Earthdata Login
An Earthdata Login account is required to access data, as well as discover restricted data, from the NASA Earthdata system. Thus, to access NASA data, you need Earthdata Login. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up. We use earthaccess
to authenticate your login credentials below.
= earthaccess.login(strategy="interactive", persist=True) auth
Liquid Water Equivalent (LWE) Thickness (GRACE & GRACE-FO)
Search for GRACE LWE Thickness data
Suppose we are interested in LWE data from the dataset (DOI:10.5067/TEMSC-3JC63) described on this PO.DAAC dataset landing page: https://podaac.jpl.nasa.gov/dataset/TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06.1_V3 From the landing page, we see the dataset Short Name under the Information tab. This is what we will be using to search for the collection with earthaccess.
You can also access the short name of the dataset through Earthdata search at: https://search.earthdata.nasa.gov.
#search for the granules using the short name
= earthaccess.search_data(short_name="TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06.1_V3") grace_results
Granules found: 1
Open the .nc file in the s3 bucket and load it into an xarray
dataset
= xr.open_mfdataset(earthaccess.open([grace_results[0]]), engine='h5netcdf')
ds_GRACE ds_GRACE
Opening 1 granules, approx size: 0.0 GB
<xarray.Dataset> Dimensions: (lon: 720, lat: 360, time: 219, bounds: 2) Coordinates: * lon (lon) float64 0.25 0.75 1.25 1.75 ... 358.2 358.8 359.2 359.8 * lat (lat) float64 -89.75 -89.25 -88.75 ... 88.75 89.25 89.75 * time (time) datetime64[ns] 2002-04-17T12:00:00 ... 2023-03-16T1... Dimensions without coordinates: bounds Data variables: lwe_thickness (time, lat, lon) float64 dask.array<chunksize=(219, 360, 720), meta=np.ndarray> uncertainty (time, lat, lon) float64 dask.array<chunksize=(219, 360, 720), meta=np.ndarray> lat_bounds (lat, bounds) float64 dask.array<chunksize=(360, 2), meta=np.ndarray> lon_bounds (lon, bounds) float64 dask.array<chunksize=(720, 2), meta=np.ndarray> time_bounds (time, bounds) datetime64[ns] dask.array<chunksize=(219, 2), meta=np.ndarray> land_mask (lat, lon) float64 dask.array<chunksize=(360, 720), meta=np.ndarray> scale_factor (lat, lon) float64 dask.array<chunksize=(360, 720), meta=np.ndarray> mascon_ID (lat, lon) float64 dask.array<chunksize=(360, 720), meta=np.ndarray> Attributes: (12/53) Conventions: CF-1.6, ACDD-1.3, ISO 8601 Metadata_Conventions: Unidata Dataset Discovery v1.0 standard_name_vocabulary: NetCDF Climate and Forecast (CF) Metadata ... title: JPL GRACE and GRACE-FO MASCON RL06.1Mv03 CRI summary: Monthly gravity solutions from GRACE and G... keywords: Solid Earth, Geodetics/Gravity, Gravity, l... ... ... C_30_substitution: TN-14; Loomis et al., 2019, Geophys. Res. ... user_note_1: The accelerometer on the GRACE-B spacecraf... user_note_2: The accelerometer on the GRACE-D spacecraf... journal_reference: Watkins, M. M., D. N. Wiese, D.-N. Yuan, C... CRI_filter_journal_reference: Wiese, D. N., F. W. Landerer, and M. M. Wa... date_created: 2023-05-22T06:05:03Z
Plot a subset of the data
Use the function xarray.DataSet.sel
to select a subset of the data to plot with hvplot
.
= [-18, 10], [275, 330] #degrees east for longitude
lat_bnds, lon_bnds = ds_GRACE.sel(lat=slice(*lat_bnds), lon=slice(*lon_bnds))
ds_GRACE_subset
ds_GRACE_subset
='lat', x='lon', cmap='bwr_r',).opts(clim=(-80,80)) ds_GRACE_subset.lwe_thickness.hvplot.image(y
River heights (Pre-SWOT MEaSUREs)
The shortname for MEaSUREs is ‘PRESWOT_HYDRO_GRRATS_L2_DAILY_VIRTUAL_STATION_HEIGHTS_V2’ with the concept ID: C2036882359-POCLOUD. The guidebook explains the details of the Pre-SWOT MEaSUREs data.
Our desired variable is height (meters above EGM2008 geoid) for this exercise, which can be subset by distance and time. Distance represents the distance from the river mouth, in this example, the Amazon estuary. Time is between April 8, 1993 and April 20, 2019.
To get the data for the exact area we need, we have set the boundaries of (-74.67188,-4.51279,-51.04688,0.19622) as reflected in our earthaccess data search.
Let’s again access the netCDF file from an s3 bucket and look at the data structure.
#earthaccess search using our established parameters to only find necessary granules
= earthaccess.search_data(short_name="PRESWOT_HYDRO_GRRATS_L2_DAILY_VIRTUAL_STATION_HEIGHTS_V2", temporal = ("1993-04-08", "2019-04-20"), bounding_box=(-74.67188,-4.51279,-51.04688,0.19622)) MEaSUREs_results
Granules found: 1
= xr.open_mfdataset(earthaccess.open([MEaSUREs_results[0]]), engine='h5netcdf')
ds_MEaSUREs ds_MEaSUREs
Opening 1 granules, approx size: 0.0 GB
<xarray.Dataset> Dimensions: (X: 3311, Y: 3311, distance: 3311, time: 9469, charlength: 26) Coordinates: * time (time) datetime64[ns] 1993-04-08T15:20:40.665117184 ... Dimensions without coordinates: X, Y, distance, charlength Data variables: lon (X) float64 dask.array<chunksize=(3311,), meta=np.ndarray> lat (Y) float64 dask.array<chunksize=(3311,), meta=np.ndarray> FD (distance) float64 dask.array<chunksize=(3311,), meta=np.ndarray> height (distance, time) float64 dask.array<chunksize=(3311, 9469), meta=np.ndarray> sat (charlength, time) |S1 dask.array<chunksize=(26, 9469), meta=np.ndarray> storage (distance, time) float64 dask.array<chunksize=(3311, 9469), meta=np.ndarray> IceFlag (time) float64 dask.array<chunksize=(9469,), meta=np.ndarray> LakeFlag (distance) float64 dask.array<chunksize=(3311,), meta=np.ndarray> Storage_uncertainty (distance, time) float64 dask.array<chunksize=(3311, 9469), meta=np.ndarray> Attributes: (12/40) title: GRRATS (Global River Radar Altimetry Time ... Conventions: CF-1.6, ACDD-1.3 institution: Ohio State University, School of Earth Sci... source: MEaSUREs OSU Storage toolbox 2018 keywords: EARTH SCIENCE,TERRESTRIAL HYDROSPHERE,SURF... keywords_vocabulary: Global Change Master Directory (GCMD) ... ... geospatial_lat_max: -0.6550700975069503 geospatial_lat_units: degree_north geospatial_vertical_max: 92.7681246287056 geospatial_vertical_min: -3.5634095181633763 geospatial_vertical_units: m geospatial_vertical_positive: up
Plot a subset of the data
Plotting the river distances and associated heights on the map at time t=9069 (March 16, 2018) using plt
.
= plt.figure(figsize=[11,7])
fig = plt.axes(projection=ccrs.PlateCarree())
ax
ax.coastlines()-85, -30, -20, 20])
ax.set_extent([
ax.add_feature(cartopy.feature.RIVERS)
=1, c=ds_MEaSUREs.height[:,9069])
plt.scatter(ds_MEaSUREs.lon, ds_MEaSUREs.lat, lw='Interpolated River Heights (m)')
plt.colorbar(label-10,100)
plt.clim(
plt.show()
Sea Surface Salinity (Multi-mission: SMAP, Aquarius, SMOS)
The shortname for this dataset is ‘OISSS_L4_multimission_7day_v1’ with the concept ID: C2095055342-POCLOUD. This dataset contains hundreds of granules, so it would be impractical to search and copy all of the URLs in Earthdata Search. Using the shortname, we can access all the files at once in the cloud (1.1 GB data).
= earthaccess.search_data(short_name="OISSS_L4_multimission_7day_v1") sss_results
Granules found: 998
The following cell may take up to around 10 minutes to run due to the size of the dataset
#this line ignores warnings about large chunk sizes
set(**{'array.slicing.split_large_chunks': False})
dask.config.
= xr.open_mfdataset(earthaccess.open(sss_results),
ds_sss ='by_coords',
combine=True,
mask_and_scale=True,
decode_cf='auto',
chunks='h5netcdf')
engine ds_sss
Opening 998 granules, approx size: 0.0 GB
<xarray.Dataset> Dimensions: (longitude: 1440, latitude: 720, time: 998) Coordinates: * longitude (longitude) float32 -179.9 -179.6 ... 179.6 179.9 * latitude (latitude) float32 -89.88 -89.62 ... 89.62 89.88 * time (time) datetime64[ns] 2011-08-28 ... 2022-08-02 Data variables: sss (latitude, longitude, time) float32 dask.array<chunksize=(720, 1440, 1), meta=np.ndarray> sss_empirical_uncertainty (latitude, longitude, time) float32 dask.array<chunksize=(720, 1440, 962), meta=np.ndarray> sss_uncertainty (latitude, longitude, time) float32 dask.array<chunksize=(720, 1440, 1), meta=np.ndarray> Attributes: (12/42) Conventions: CF-1.8, ACDD-1.3 standard_name_vocabulary: CF Standard Name Table v27 Title: Multi-Mission Optimally Interpolated Sea S... Short_Name: OISSS_L4_multimission_7d_v1 Version: V1.0 Processing_Level: Level 4 ... ... geospatial_lat_resolution: 0.25 geospatial_lat_units: degrees_north geospatial_lon_min: -180.0 geospatial_lon_max: 180.0 geospatial_lon_resolution: 0.25 geospatial_lon_units: degrees_east
Plot a subset of the data
Use the function xarray.DataSet.sel
to select a subset of the data at the outlet of the Amazon to plot at time t=0 (August 28, 2011) with hvplot
.
= [-2, 6], [-52, -44]
lat_bnds, lon_bnds = ds_sss.sel(latitude=slice(*lat_bnds), longitude=slice(*lon_bnds))
ds_sss_subset
ds_sss_subset
0].hvplot() ds_sss_subset.sss[:,:,
Sea Surface Temperature (MODIS)
MODIS has SST data that coincides with the period used for SSS, 2011-present (943 MB), and has the short name: “MODIS_AQUA_L3_SST_MID-IR_MONTHLY_9KM_NIGHTTIME_V2019.0”. Let’s access this in the same method as before.
Get a list of files so we can open them all at once, creating an xarray dataset
using the open_mfdataset()
function to “read in” all of the netCDF4 files in one call. MODIS does not have a built-in time variable like SSS, but it is subset by latitude and longitude coordinates. We need to combine the files using the nested format with a created ‘time’ dimension.
= earthaccess.search_data(short_name="MODIS_AQUA_L3_SST_MID-IR_MONTHLY_9KM_NIGHTTIME_V2019.0", temporal = ("2011-01-01", "2023-01-01")) modis_results
Granules found: 143
MODIS did not come with a time variable, so it needs to be extracted from the file names and added in the file preprocessing so files can be successfully concatenated.
#function for time dimension added to each netCDF file
def preprocessing(ds):
= ds.product_name
file_name = basename(file_name).split("_")[2][:6]
file_date = datetime.strptime(file_date, "%Y%m")
file_date_c = [file_date_c]
time_point 'time'] = ('time', time_point) #expand the dimensions to include time
ds.coords[return ds
= xr.open_mfdataset(earthaccess.open(modis_results), combine='by_coords', join='override', mask_and_scale=True, decode_cf=True, chunks='auto',preprocess = preprocessing)
ds_MODIS ds_MODIS
Opening 143 granules, approx size: 0.0 GB
<xarray.Dataset> Dimensions: (time: 143, lat: 2160, lon: 4320, rgb: 3, eightbitcolor: 256) Coordinates: * lat (lat) float32 89.96 89.88 89.79 89.71 ... -89.79 -89.88 -89.96 * lon (lon) float32 -180.0 -179.9 -179.8 -179.7 ... 179.8 179.9 180.0 * time (time) datetime64[ns] 2011-01-01 2011-02-01 ... 2023-01-01 Dimensions without coordinates: rgb, eightbitcolor Data variables: sst4 (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 4320), meta=np.ndarray> qual_sst4 (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 4320), meta=np.ndarray> palette (time, lon, lat, rgb, eightbitcolor) uint8 dask.array<chunksize=(1, 4320, 2160, 3, 256), meta=np.ndarray> Attributes: (12/59) product_name: AQUA_MODIS.20110101_20110131.L3m.MO.SST... instrument: MODIS title: MODISA Level-3 Standard Mapped Image project: Ocean Biology Processing Group (NASA/GS... platform: Aqua temporal_range: month ... ... publisher_url: https://oceandata.sci.gsfc.nasa.gov processing_level: L3 Mapped cdm_data_type: grid data_bins: 4834400 data_minimum: -1.635 data_maximum: 32.06999
Plot a subset of the data
Use the function xarray.DataSet.sel
to select a subset of the data at the outlet of the Amazon to plot with hvplot
.
= [6,-2], [-52, -44]
lat_bnds, lon_bnds = ds_MODIS.sel(lat=slice(*lat_bnds), lon=slice(*lon_bnds))
ds_MODIS_subset
ds_MODIS_subset
='lat', x='lon', cmap='viridis').opts(clim=(22,30)) ds_MODIS_subset.sst4.hvplot.image(y
Time Series Comparison
Plot each dataset for the time period 2011-2019.
First, we need to average all pixels in the subset lat/lon per time for sea surface salinity and sea surface temperature to set up for the graphs.
= []
sss_mean for t in np.arange(len(ds_sss_subset.time)):
sss_mean.append(np.nanmean(ds_sss_subset.sss[:,:,t].values))
#sss_mean
#MODIS
= []
sst_MODIS_mean for t in np.arange(len(ds_MODIS_subset.time)):
sst_MODIS_mean.append(np.nanmean(ds_MODIS_subset.sst4[t,:,:].values))
#sst_MODIS_mean
Combined timeseries plot of river height and LWE thickness
Both datasets are mapped for the outlet of the Amazon River into the estuary.
#plot river height and land water equivalent thickness
= plt.subplots(figsize=[12,7])
fig, ax1
#plot river height
16,6689:9469].plot(color='darkblue')
ds_MEaSUREs.height[
#plot LWE thickness on secondary axis
= ax1.twinx()
ax2 107:179], ds_GRACE_subset.lwe_thickness[107:179,34,69], color = 'darkorange')
ax2.plot(ds_GRACE_subset.time[
'Date')
ax1.set_xlabel('Land Water Equivalent Thickness (cm)', color='darkorange')
ax2.set_ylabel('River Height (m)', color='darkblue')
ax1.set_ylabel('GRACE-FO'], loc='upper right')
ax2.legend(['Pre-SWOT MEaSUREs'], loc='lower right')
ax1.legend([
'Amazon Estuary, 2011-2019 Lat, Lon = (-0.7, -50)')
plt.title(
ax1.grid() plt.show()
LWE thickness captures the seasonality of Pre-SWOT MEaSUREs river heights well, and so LWE thickness can be compared to all other variables as a representative of the seasonality of both measurements for the purpose of this notebook.
Combined timeseries plots of salinity and LWE thickness, followed by temperature
#Combined Subplots
= plt.figure(figsize=(10,10))
fig
= fig.add_subplot(211)
ax1 'Amazon Estuary, 2011-2019')
plt.title(= ax1.twinx()
ax2 = plt.subplot(212)
ax3 = ax3.twinx()
ax4
#lwe thickness
107:179], ds_GRACE_subset.lwe_thickness[107:179,34,69], color = 'darkorange')
ax1.plot(ds_GRACE_subset.time['LWE Thickness (cm)', color='darkorange')
ax1.set_ylabel(
ax1.grid()
#sea surface salinity
0:750], sss_mean[0:750], 'g')
ax2.plot(ds_sss_subset.time['SSS (psu)', color='g')
ax2.set_ylabel(
#sea surface temperature
7:108], sst_MODIS_mean[7:108], 'darkred')
ax3.plot(ds_MODIS_subset.time['SST (deg C)', color='darkred')
ax3.set_ylabel(
ax3.grid()
#river height at outlet
16,6689:9469].plot(color='darkblue')
ds_MEaSUREs.height['River Height (m)', color='darkblue') ax4.set_ylabel(
Text(0, 0.5, 'River Height (m)')
Measurements of LWE thickness and SSS follow expected patterns. When lwe thickness is at its lowest, indicating less water is flowing through during the drought, salinity is at its highest. Without high volume of water pouring into the estuary, salinity increases. We can see that temperature is shifted a bit in time from river height as well at the outlet, a relationship that could be further explored.