Advanced Cloud
When you want to dive into optimizing cloud workflows
AWS Lambda
- User Guide to Scale Scientific Analysis in the Cloud - user guide for deploying AWS services in end-user Amazon accounts
- Notebook Demonstration - plots a time series of global mean sea surface temperature from the MUR 25km dataset, using AWS Lambda to perform the global mean computation.
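As a rough illustration of the kind of computation the notebook hands to AWS Lambda, the sketch below computes an area-weighted global mean for one time step inside a handler function. It is not the notebook's code. The `lambda_handler(event, context)` signature follows the usual Python handler convention for Lambda, while the event layout (`time`, `lats`, `sst`), the `global_mean` helper, and the cos(latitude) weighting are assumptions made for this example, and the grid is synthetic rather than real MUR data.

```python
import json
import math


def global_mean(lats_deg, grid):
    """Area-weighted mean of a lat x lon grid, skipping missing (None) cells.

    Weights are cos(latitude), an assumption suited to a regular lat/lon grid.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, row in zip(lats_deg, grid):
        w = math.cos(math.radians(lat))
        for value in row:
            if value is not None:  # e.g. land cells with no SST value
                weighted_sum += w * value
                weight_total += w
    return weighted_sum / weight_total


def lambda_handler(event, context):
    """Hypothetical handler: the event carries one time step of gridded data."""
    mean = global_mean(event["lats"], event["sst"])
    body = {"time": event["time"], "global_mean": mean}
    return {"statusCode": 200, "body": json.dumps(body)}


# Local invocation with a tiny synthetic grid (not real MUR data).
event = {
    "time": "2020-01-01",
    "lats": [-60.0, 0.0, 60.0],
    "sst": [[2.0, 2.0], [28.0, None], [6.0, 6.0]],
}
response = lambda_handler(event, None)
print(response["body"])
```

Invoking one such function per time step and collecting the returned means would give the points of a time series. A real deployment would read the data from cloud storage rather than receive it in the event payload.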
Zarr
- Tutorial for NetCDF4 Files - introduces the Zarr cloud-optimized format
Kerchunk and Virtualizarr
- Kerchunk recipes - Intended for readers who already have a high-level understanding of the package, this notebook walks through several Kerchunk features that we found relevant to Earthdata users. The workflows combine Kerchunk with the earthaccess package.
- Kerchunk JSON Generation - An additional tutorial on generating a Kerchunk JSON file, demonstrated with one of the SWOT data sets hosted on PO.DAAC. Its output serves as input to the following tutorial.
- Integrating Dask, Kerchunk, Zarr and Xarray - Efficiently visualize a whole collection of data in an interactive dashboard via cloud-optimized formats.
- Virtualizarr (coming soon)
Dask and Coiled
- Introduction to Dask Tutorial - covers the basics of using Dask for parallel computing with NASA Earth Data completely in the cloud
- Dask Function Replication Example - demonstrates a more complex example of replicating a function over many files in parallel using dask.delayed(). The example analysis generates spatial correlation maps of sea surface temperature vs sea surface height, using data sets available on PO.DAAC.
- Dask Dataset Chunking Example - demonstrates a more complex example of applying computations to a large dataset via chunking and parallel computing. The example analysis generates seasonal cycles of sea surface temperature off the west coast of the U.S.A for a decade of ultra-high resolution data. Parallel computations are performed on a single VM with a local Dask cluster.
- Coiled Function Replication Example - demonstrates a more complex example of replicating a function over many files in parallel using coiled.function(). The example analysis generates spatial correlation maps of sea surface temperature vs sea surface height, using data sets available on PO.DAAC. This replicates the analysis from the Dask Function Replication Example, but changes the method of parallel computation. Instead of using a local cluster on a single VM (Dask), many VMs are combined into a distributed cluster (Coiled).
- Coiled Dataset Chunking Example - demonstrates a more complex example of applying computations to a large dataset via chunking and parallel computing. The example analysis generates seasonal cycles of sea surface temperature off the west coast of the U.S.A for a decade of ultra-high resolution data. Parallel computations are distributed over many VMs using Coiled's coiled.Cluster().
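The function-replication pattern in the examples above (apply one analysis function to many files, in parallel) can be sketched with the Python standard library alone. This sketch swaps in `concurrent.futures.ThreadPoolExecutor` as a stand-in for Dask or Coiled so that it runs without extra packages; it is not the tutorials' code. The `analyze_granule` function, the granule names, and the data values are hypothetical, and a single Pearson correlation per granule stands in for a full spatial correlation map.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def analyze_granule(granule):
    """Hypothetical per-file analysis: correlate SST with SSH for one granule."""
    return granule["name"], pearson(granule["sst"], granule["ssh"])


# Synthetic stand-ins for files; a real workflow would open each file instead.
granules = [
    {"name": "granule_a", "sst": [1.0, 2.0, 3.0, 4.0], "ssh": [2.0, 4.0, 6.0, 8.0]},
    {"name": "granule_b", "sst": [1.0, 2.0, 3.0, 4.0], "ssh": [8.0, 6.0, 4.0, 2.0]},
    {"name": "granule_c", "sst": [1.0, 2.0, 3.0, 4.0], "ssh": [1.0, 3.0, 2.0, 4.0]},
]

# Run the same function over every granule in parallel and collect the results.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(analyze_granule, granules))

for name, r in results.items():
    print(f"{name}: r = {r:.2f}")
```

The Dask and Coiled versions keep the same shape: the per-file function stays unchanged, and only the mechanism that schedules the calls (a local cluster on one VM versus many VMs) differs.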
Harmony-py
- Subsetting tutorial - a tutorial for a Python library that integrates with NASA’s Harmony Services.