import earthaccess
import json
import boto3
import s3fs
import xarray as xr
import matplotlib as mpl
import matplotlib.pyplot as plt
From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow this link.
Scale Scientific Analysis in the Cloud with AWS Lambda
This tutorial demonstrates how to invoke an AWS Lambda function to create a timeseries of global mean sea surface temperature values.
IMPORTANT: This notebook will only run successfully after deploying the AWS Lambda function and supporting services to your own Amazon Web Services account.
See the full documentation to set up the prerequisite services.
** Note that using AWS Compute services will incur costs that will be charged to your AWS account. Running this tutorial as-configured with no modifications should result in less than $1 USD of charges. Expanding the analysis to a longer time period or different dataset will affect the compute costs charged to your AWS account.
PO.DAAC is not responsible for costs charged as a result of running this tutorial
Prerequisite Steps: Set up AWS infrastructure
This tutorial takes advantage of numerous AWS Services including Lambda, Parameter Store, Elastic Compute Cloud (EC2), Elastic Container Registry (ECR), and Simple Storage Service (S3).
After setting up and deploying all of the services as described in the documentation, this notebook must be run in an EC2 instance to invoke the functions and plot the results.
Connect to EC2 instance to run this notebook
This notebook cannot be run on a local computer, as it heavily depends on direct in-cloud access. To run this notebook in AWS, connect to an EC2 instance running in the us-west-2 region, following the instructions in this tutorial. Once you have connected to the EC2 instance, you can clone this repository into that environment, install the required packages, and run this notebook.
Step 1: Log in to Earthdata
We use the earthaccess Python library to handle Earthdata authentication for the initial query to find the granules of interest.
= earthaccess.login() auth
EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 06/16/2023
Using .netrc file for EDL
= earthaccess.search_data(
granules ='MUR25-JPL-L4-GLOB-v04.2',
short_name=True,
cloud_hosted=("2022-01-01", "2023-01-01")
temporal )
Granules found: 365
= [g.data_links(access='direct')[0] for g in granules] granule_paths
for path in granule_paths:
print(path)
break
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20220101090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
Step 2: Invoke the Lambda function
Set up a boto3 session to connect to your AWS instance and invoke the Lambda function
= boto3.Session(profile_name='saml-pub')
session
= session.client('lambda', region_name='us-west-2')
lambda_client
= "podaac-sst"
s3_results_bucket
for granule in granule_paths:
= {"input_granule_s3path": granule, "output_granule_s3bucket": s3_results_bucket, "prefix":"podaac"}
lambda_payload
lambda_client.invoke(="podaac-sst",
FunctionName="Event",
InvocationType=json.dumps(lambda_payload)
Payload )
Step 3: Plot results as timeseries
Open the resulting global mean files in xarray:
# set up the connection to the S3 bucket holding the results
= s3fs.S3FileSystem(
s3_results =False,
anon='saml-pub'
profile
)
= s3_results.glob("s3://" + s3_results_bucket + "/MUR25/*")
s3_files
# iterate through s3 files to create a fileset
= [s3_results.open(file) for file in s3_files]
fileset
# open all files as an xarray dataset
= xr.open_mfdataset(fileset, combine='by_coords', engine='scipy') data
data
<xarray.Dataset> Dimensions: (time: 365) Coordinates: * time (time) datetime64[ns] 2022-01-01T09:00:00 ... 2023-01-01T09... Data variables: analysed_sst (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
Plot the data using matplotlib:
'font.size': 22})
mpl.rcParams.update({
# set up the figure
= plt.Figure(figsize=(20,5))
fig
# plot the data
='3')
plt.plot(data.time, data.analysed_sst, linewidth'Global Mean Sea Surface Temperature' + '\n' + '2022, MUR25')
plt.title(
'degrees C')
plt.ylabel('Date') plt.xlabel(
Text(0.5, 0, 'Date')