SWOT Oceanography with PO.DAAC

A PODAAC Cloud Hackathon for SWOT Oceanography Science Teams

Jinbo Wang
SWOT Scientist/PODAAC Project Scientist

3/16/2022

Introduction

The SWOT data stream is built on AWS services, so it is important for the science teams to become familiar with this new way of accessing and analyzing data. Most importantly, the cloud-based infrastructure enables the “cloud paradigm” of “moving code to data”. The concept itself is not new: the community has been using large high-performance computing clusters for decades. What is new is the integration of cloud computing infrastructure with a data center through a public-facing commercial partner, which enables “open data and open science” on a larger scale.

The cloud-based data ingestion was created to handle the large data volume from SWOT. In addition, the cloud infrastructure will accelerate scientific and application development for SWOT. The 8 sets of synthetic SWOT global L2 SSH products are used to test the cloud-based data flow and to prepare the science teams for the SWOT launch in November 2022.

The Creation of the synthetic SWOT global L2 SSH

SWOT Oceanography data/code survey (2021 Science Team Meeting)

Shortly before the 2021 SWOT Science Team meeting, a survey was created to ask for the SWOT oceanography community’s opinion about data and code sharing. In summary, the community showed strong interest in a set of synthetic SWOT global L2 SSH products hosted by PODAAC/AVISO, as well as in pre-launch training. The survey response can be accessed here. The following are a few takeaways.

  • L2 basic/expert SSH will be the most used data products at the initial stage
  • A common dataset will be very useful for preparing SWOT research
  • It is important for PODAAC/AVISO to host simulated datasets to create post-launch scenarios
  • Most teams need training in data access and analysis
  • A majority are potentially interested in co-developing software and toolboxes

Simulated L2 SSH

The SWOTsimulator was run on two global ocean simulations (LLC4320 and GLORYS), following the error specification described in the Level 2 KaRIn Low Rate Sea Surface Height Product description (PDF file, D-56407). The simulator is based on Lucile/Clement/Fu’s first version but was almost completely rewritten, with the error budget kept largely unchanged. (The error representation is claimed to be better but needs more documentation.) The CNES team created these products. The 8 datasets are listed as follows:

SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_CALVAL_V1          17686 files
SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_SCIENCE_V1         17564 files
SWOT_SIMULATED_L2_NADIR_SSH_GLORYS_CALVAL_V1          17686 files
SWOT_SIMULATED_L2_NADIR_SSH_GLORYS_SCIENCE_V1         17564 files
SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1    10288 files
SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_SCIENCE_V1   10218 files
SWOT_SIMULATED_L2_NADIR_SSH_ECCO_LLC4320_CALVAL_V1    10287 files
SWOT_SIMULATED_L2_NADIR_SSH_ECCO_LLC4320_SCIENCE_V1   10218 files

These datasets can be found on the PODAAC website or through the Earthdata search page.

The ECCO_LLC4320-based products cover one year, with >10k files (granules in Earthdata language) in each collection. The GLORYS-based products cover about 20 months, with >17k files in each collection.
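As a rough consistency check on these durations, the sketch below multiplies the CALVAL KaRIn granule counts by a single pass duration. The duration of 51 min 5 s is an assumption read off the first granule file name listed at the end of this notebook; other passes may differ by a second or so, and the SCIENCE orbit is not covered here.

```python
# Rough coverage estimate for the two CALVAL KaRIn collections.
# Assumption: every granule spans about 51 min 5 s, the start-to-end
# interval in the first granule file name shown later in this notebook.
PASS_SECONDS = 51 * 60 + 5
SECONDS_PER_DAY = 86400
DAYS_PER_MONTH = 30.44  # average month length, for a rough conversion only

granule_counts = {
    "ECCO_LLC4320_CALVAL": 10288,
    "GLORYS_CALVAL": 17686,
}

# Total time covered by each collection, in days.
coverage_days = {
    name: count * PASS_SECONDS / SECONDS_PER_DAY
    for name, count in granule_counts.items()
}

for name, days in coverage_days.items():
    print(f"{name}: about {days:.0f} days ({days / DAYS_PER_MONTH:.1f} months)")
```

Under this assumption the estimates are consistent with the one-year and roughly 20-month coverage quoted above.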

Relevant content in the simulated L2 SSH files

These datasets contain too much information to cover in this short cloud hackathon. For example, there are 92 variables in the KaRIn products. The hackathon participants are strongly encouraged to read the product description to understand the meaning of the different variables. For this exercise, the variables with the “simulated” keyword are most relevant:

  1. ‘simulated_true_ssh_karin’
  2. ‘simulated_error_baseline_dilation’
  3. ‘simulated_error_roll’
  4. ‘simulated_error_phase’
  5. ‘simulated_error_timing’
  6. ‘simulated_error_karin’
  7. ‘simulated_error_orbital’
  8. ‘simulated_error_troposphere’
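One way to pick out these fields programmatically is to filter the variable names on the “simulated” keyword. The sketch below uses a plain dictionary as a stand-in for an opened file; the helper name select_simulated and the two placeholder variable names are hypothetical. Iterating over an xarray Dataset also yields its data variable names, so the same helper should work on one, although that is not exercised here.

```python
def select_simulated(variables, keyword="simulated"):
    """Return the entries of a mapping whose names contain `keyword`."""
    return {name: variables[name] for name in variables if keyword in name}


# Stand-in for an opened granule: three names from the list above plus
# two placeholder names (hypothetical, for illustration only).
example = {
    "simulated_true_ssh_karin": None,
    "simulated_error_roll": None,
    "simulated_error_karin": None,
    "placeholder_variable_a": None,
    "placeholder_variable_b": None,
}

selected = select_simulated(example)
print(sorted(selected))
```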

A visualized example is provided at the end of this notebook.

Note that the LLC4320 SSH includes barotropic (BT) signals, internal tides, and the response to atmospheric pressure (inverted barometer, IB). A proper correction has not yet been provided or validated. Wang et al. (2018) found that a linear detrend within a ~150 km range may be sufficient to remove most of the large-scale BT and IB signals. This can serve as a quick solution, but there is no guarantee about the size of the residual errors.
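The linear detrend mentioned above can be sketched as a least-squares straight-line fit along a one-dimensional segment. This is a minimal illustration with NumPy on synthetic data, not the procedure used in Wang et al. (2018); the function name remove_linear_trend, the 2 km sample spacing, and the synthetic signal amplitudes are all assumptions.

```python
import numpy as np


def remove_linear_trend(distance_km, ssh):
    """Subtract the least-squares straight line from an SSH segment."""
    slope, intercept = np.polyfit(distance_km, ssh, deg=1)
    return ssh - (slope * distance_km + intercept)


# Synthetic 150 km segment sampled every 2 km (assumed spacing).
distance_km = np.arange(0.0, 150.0, 2.0)
# Stand-in for the large-scale BT/IB signal: an offset plus a linear ramp (m).
large_scale = 0.30 + 0.002 * distance_km
# Stand-in for the smaller-scale signal of interest (m).
small_scale = 0.05 * np.sin(2 * np.pi * distance_km / 30.0)
ssh = large_scale + small_scale

residual = remove_linear_trend(distance_km, ssh)
print(f"mean of residual: {residual.mean():.2e} m")
```

By construction the residual has zero mean and no remaining linear trend over the segment; how much of the real BT and IB signal survives such a fit is exactly the open question noted above.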

Other SWOT relevant datasets

Other datasets in PODAAC that are relevant to SWOT include 10 regional subsets of LLC4320, with all the 2D and 3D fields, created to support the Adopt-A-Crossover (AdAC) project. These subsets provide the dynamical fields that correspond to the ECCO_LLC4320-based L2 SSH. Sentinel-6MF along-track data are also hosted in the PODAAC cloud.

  1. All LLC4320-derived datasets in PODAAC
  2. Sentinel-6MF alongtracks

Acknowledgment

  • Project/HQ
    • F. Briol, Gerald Dibarboure, Nicolas Picot, and Shailen Desai produced the data products
    • J. Tom Farrar, Julien Le Sommer, Ryan Abernathey, Sarah Gille, Rosemary Morrow, and Lee-Lueng Fu provided science team support
    • Nadya Vinogradova requested this cloud hackathon. Jessica Hausman facilitated the coordination.
    • Justin Rice facilitates the PODAAC-Openscapes collaboration
  • Openscapes provides the cloud JupyterHub
  • PO.DAAC team (everything else here)

Simulated L2 fields with errors

Run the following code blocks in the AWS us-west-2 region to browse the datasets and their content and to visualize the swaths of L2 SSH and errors. These code blocks are based on Mike Gangl’s “direct S3 access” notebook.

import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
from pprint import pprint

def init_S3FileSystem(daac='podaac'):
    '''
    Return an s3fs.S3FileSystem authenticated with temporary S3 credentials.
    
    Parameter
    =========
    daac: string
          The name of the NASA DAAC where the data are hosted. The options are ['podaac','lpdaac']; others may be added.
    
    Return
    ======
    s3: an s3fs.S3FileSystem handle
    '''
    
    import requests, s3fs
    s3_cred_endpoint = {
        'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
        'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'}

    # Request temporary credentials from the DAAC endpoint.
    # This assumes Earthdata Login credentials are available to requests (e.g. in ~/.netrc).
    temp_creds_url = s3_cred_endpoint[daac]
    response = requests.get(temp_creds_url)
    response.raise_for_status()  # stop here if the request was rejected
    creds = response.json()
    s3 = s3fs.S3FileSystem(anon=False,
                           key=creds['accessKeyId'],
                           secret=creds['secretAccessKey'], 
                           token=creds['sessionToken'])
    return s3

s3sys=init_S3FileSystem()

def open_swot_L2SSH(filename):
    '''
    Open a file in S3 using xarray. 
    
    Parameter
    ========
    filename: S3 link to the data file. 
    
    Return
    ======
    xarray Dataset
    '''
    
    return xr.open_dataset(s3sys.open(filename))
    

Browsing the 8 datasets in s3://podaac-ops-cumulus-protected

All PODAAC data collections are stored in s3://podaac-ops-cumulus-protected and/or s3://podaac-ops-cumulus-public. Once you are on an AWS computing instance, they can be accessed much like the hard drives on your own laptop.

#The 8 datasets have short names starting with "SWOT". 
#You may get more than these 8 collections in the future, after new SWOT data are ingested. 
#The following wildcard will give you the 8 collections for now.
s3path="s3://podaac-ops-cumulus-protected/SWOT*"  

#Search for all collections that match the wildcard.
fns= s3sys.glob(s3path)

#Print the short name of each collection (also used as the 'folder' name) and the number of files (granules) within. 
for aa in fns:
    print('%55s'%aa.split('/')[-1], len(s3sys.glob(aa+'/*nc')), 'files')

#Print the names of the first 10 files in the SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1 collection.
fns=s3sys.glob("s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/*nc")
pprint(fns[:10])
     SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1 10288 files
    SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_SCIENCE_V1 10218 files
           SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_CALVAL_V1 17686 files
          SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_SCIENCE_V1 17564 files
     SWOT_SIMULATED_L2_NADIR_SSH_ECCO_LLC4320_CALVAL_V1 10287 files
    SWOT_SIMULATED_L2_NADIR_SSH_ECCO_LLC4320_SCIENCE_V1 10218 files
           SWOT_SIMULATED_L2_NADIR_SSH_GLORYS_CALVAL_V1 17686 files
          SWOT_SIMULATED_L2_NADIR_SSH_GLORYS_SCIENCE_V1 17564 files
['podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_001_20111113T000000_20111113T005105_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_002_20111113T005105_20111113T014211_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_003_20111113T014211_20111113T023317_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_004_20111113T023317_20111113T032423_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_005_20111113T032423_20111113T041529_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_006_20111113T041529_20111113T050634_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_007_20111113T050634_20111113T055739_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_008_20111113T055739_20111113T064845_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_009_20111113T064845_20111113T073951_DG10_01.nc',
 'podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/SWOT_L2_LR_SSH_Expert_001_010_20111113T073951_20111113T083057_DG10_01.nc']
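
The granule names listed above appear to encode a cycle number, a pass number, and the start and end times of each pass. The sketch below parses those fields with a regular expression. The interpretation of the fields and the helper name parse_granule_name are assumptions based only on the names shown here, not on the product description, and the pattern targets the KaRIn “SWOT_L2_LR_SSH_...” names only.

```python
import re
from datetime import datetime

# Assumed layout: SWOT_L2_LR_SSH_<kind>_<cycle>_<pass>_<start>_<end>_...
GRANULE_PATTERN = re.compile(
    r"SWOT_L2_LR_SSH_(?P<kind>[A-Za-z]+)_(?P<cycle>\d{3})_(?P<pass_number>\d{3})_"
    r"(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_"
)
TIME_FORMAT = "%Y%m%dT%H%M%S"


def parse_granule_name(path):
    """Extract cycle, pass, and time range from a granule path or file name."""
    name = path.split("/")[-1]
    match = GRANULE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized granule name: {name}")
    return {
        "kind": match["kind"],
        "cycle": int(match["cycle"]),
        "pass_number": int(match["pass_number"]),
        "start": datetime.strptime(match["start"], TIME_FORMAT),
        "end": datetime.strptime(match["end"], TIME_FORMAT),
    }


# First granule from the listing above.
first = (
    "podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_ECCO_LLC4320_CALVAL_V1/"
    "SWOT_L2_LR_SSH_Expert_001_001_20111113T000000_20111113T005105_DG10_01.nc"
)
info = parse_granule_name(first)
print(info["cycle"], info["pass_number"], info["end"] - info["start"])
```

A helper like this makes it straightforward to select granules by pass number or time window from the output of s3sys.glob before opening any files.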