Cheatsheets & Guides

Contents:

Here are some cheatsheets and guides helping visualize what working with NASA Earthdata Cloud data looks like, and how to get started!

  1. NASA Earthdata Cloud Overview
  2. Cloud Access Pathways
  3. Getting Started Roadmaps: Cloud & Local Workflows
  4. Tools & Services Roadmap
  5. Cloud Terminology 101
  6. Workflow Cheatsheet
  7. Cheatsheet Terminology

What is the NASA Earthdata Cloud?

NASA Earthdata Cloud is the NASA archive of Earth observations and is hosted in Amazon Web Services (AWS) cloud with DAAC tools and services built for use “next to the data.” The NASA DAACs (data centers) are currently transitioning to this cloud-based environment. All PO.DAAC data will be housed in the cloud, and can be accessed through AWS. The cloud offers a scalable and effective way to address storage, network, and data movement concerns while offering a tremendous amount of flexibility to the user. Particularly if working with large data volumes, data access and processing would be more efficient if workflows are taking place in the cloud, avoiding having to download large data volumes. Data download will continue to be freely available to users, from the Earthdata Cloud archive.

Published Google Slide

Cloud Access Pathways

Three pathway examples, to interact and access data (and services) from and within the NASA Earthdata Cloud, are illustrated in the diagram. Green arrows and icons indicate working locally, after downloading data to your local machine, servers, or compute/storage space. Orange arrows and icons highlight a workflow within the cloud, setting up your own AWS EC2 cloud instance, or virtual machine, in the cloud next to the data. Blue arrows and icons also indicate a within the cloud workflow, through shareable cloud environments such as Binder or JupyterHub set up in an AWS cloud region. Note that each of these may have a range of cost models. EOSDIS data are being stored in the us-west-2 region of AWS cloud; we recommend setting up your cloud computing environment in the same region as the data for free and easy in-cloud access.

Published Google Slide

A note on costing: What is free and what do I have to budget for, now that data is archived in the cloud?

  • Downloading data from the Earthdata Cloud archive in AWS, to your local computer environment or local storage (e.g. servers) is and will continue to be free for the user.
  • Accessing the data directly in the cloud (from us-west-2 S3 region) is free. Users will need a NASA Earthdata Login account and AWS credentials to access, but there is no cost associated with these authentication steps, which are in place for security reasons.
  • Accessing data in the cloud via EOSDIS or DAAC cloud-based tools and services such as the CMR API, Harmony API, OPenDAP API (from us-west-2 S3 region) is free to the user. Having the tools and services “next to the data” in the cloud enables DAACs to support data reduction and transformation, more efficiently, on behalf of the user, so users only access the data they need.
  • Cloud computing environments (i.e. virtual machines in the cloud) for working with data in the cloud (beyond direct or via services provided access) such as data analysis or running models with the data, is user responsibility, and should be considered in budgeting. I.e. User would need to set up a cloud compute environment (such as an EC2 instance or JupyterLab) and are responsible for any storage and computing costs.
    • This means that even though direct data access in the cloud is free to the user, they would first need to have a cloud computing environment/machine to execute the data access step from, and then continue their analysis.
    • Depending on whether that cloud environment is provided by the user themselves, user’s institution, community hubs like Pangeo or NASA Openscapes JupyterLab sandbox, this element of the workflow may require user accountability, budgeting and user financial maintenance.

Getting Started Roadmap

Cloud Workflow

The following is a conceptual roadmap for users getting started with NASA Earth Observations cloud-archived data using an in-cloud workflow (i.e. bringing user code into the cloud, avoiding data download and performing data workflows “next to the data”).

Published Google Slide

Local Workflow

The following is a conceptual roadmap for users getting started with NASA Earth Observations cloud-archived data using a local machine (e.g. laptop) workflow, as data storage and computational work.

[Published Google Slide]((https://docs.google.com/presentation/d/e/2PACX-1vSpmQBM-BfRMdtut9qT2oc4uxxTDFpulXMCN6h_9TviDlujn-MZJ4gwX8ilHK1qg1hgvwrt_JXwlaVF/pub?start=false&loop=false&delayms=3000){target=“_blank”}

Tools & Services Roadmap

Below is a practical guide for learning about and selecting helpful tools or services for a given use case, focusing on how to find and access NASA Earthdata Cloud-archived data from local compute environment (e.g. laptop) or from a cloud computing workspace, with accompanying example tutorials. Once you follow your desired pathway, click on the respective blue notebook icon to get to the example tutorial. Note: these pathways are not exhaustive, there are many ways to accomplish these common steps, but these are some of our recommendations.

Published Google Slide

Cloud Terminology 101

Cloud Terminology 101 for those new to commonly used cloud computing terms and phrases.

Published Google Slide

Workflow Cheatsheet

The following is a practical reference guide with links to tutorials and informational websites for users who are starting to take the conceptual pieces and explore and implement in their own workflows.

Published Google Slide

Workflow Cheatsheet Terminology

Terminology cheatsheet to explain terms commonly used in cloud computing and those located on the NASA Earthdata Cloud Cheatsheet. See also NASA Earthdata Glossary.

Published Google Slide