Advanced Cloud

When you want to dive into optimizing cloud workflows

AWS Lambda

Zarr

Kerchunk and Virtualizarr

  • Kerchunk recipes - Intended for readers who already have a high-level understanding of the package, this notebook walks through several Kerchunk features that we found relevant to Earthdata users. The workflows combine Kerchunk with the earthaccess package.
  • Kerchunk JSON Generation - An additional tutorial on generating a Kerchunk JSON file, demonstrated with one of the SWOT data sets hosted on PO.DAAC. Its output serves as input to the following tutorial.
  • Integrating Dask, Kerchunk, Zarr and Xarray - Efficiently visualize a whole collection of data in an interactive dashboard via cloud-optimized formats.
  • Virtualizarr (coming soon)

Dask and Coiled

  • Introduction to Dask Tutorial - covers the basics of using Dask for parallel computing with NASA Earthdata, entirely in the cloud.
  • Dask Function Replication Example - demonstrates a more complex example of replicating a function over many files in parallel using dask.delayed(). The example analysis generates spatial correlation maps of sea surface temperature vs sea surface height, using data sets available on PO.DAAC.
  • Dask Dataset Chunking Example - demonstrates a more complex example of applying computations to a large dataset via chunking and parallel computing. The example analysis generates seasonal cycles of sea surface temperature off the west coast of the U.S.A. for a decade of ultra-high resolution data. Parallel computations are performed on a single VM with a local Dask cluster.
  • Coiled Function Replication Example - demonstrates a more complex example of replicating a function over many files in parallel using coiled.function(). The example analysis generates spatial correlation maps of sea surface temperature vs sea surface height, using data sets available on PO.DAAC. This replicates the analysis from the Dask Function Replication Example, but changes the method of parallel computation. Instead of using a local cluster on a single VM (Dask), many VMs are combined into a distributed cluster (Coiled).
  • Coiled Dataset Chunking Example - demonstrates a more complex example of applying computations to a large dataset via chunking and parallel computing. The example analysis generates seasonal cycles of sea surface temperature off the west coast of the U.S.A. for a decade of ultra-high resolution data. Parallel computations are distributed over many VMs using Coiled’s coiled.Cluster().
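
For orientation, the function replication pattern named in the examples above can be sketched in a few lines. This is a minimal illustration, not code from the notebooks: the function mean_of_squares and the fake_files list are hypothetical stand-ins for a per-file analysis and for the contents of real data files, and the threaded scheduler is chosen only so the sketch runs on a single machine.

```python
import dask
from dask import delayed


def mean_of_squares(values):
    """Hypothetical stand-in for a per-file analysis step."""
    return sum(v * v for v in values) / len(values)


# Hypothetical stand-ins for the contents of several data files.
fake_files = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Wrap each call with dask.delayed; this only builds lazy tasks, nothing runs yet.
tasks = [delayed(mean_of_squares)(f) for f in fake_files]

# Run all tasks in parallel; dask.compute returns one result per task as a tuple.
results = dask.compute(*tasks, scheduler="threads")
```

The same function could in principle be moved to cloud VMs with Coiled instead of a local scheduler; the linked notebooks cover the details of that setup.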

Harmony-py