How do I find data using code?
Introduction
Here are our recommended approaches for finding data with code, from the command line or a notebook.
In Python we can use the earthaccess library (renamed, previously earthdata).
To install the package we’ll run this code from the command line. Note: you can run shell code directly from a Jupyter Notebook cell by adding a ! in front of it, so it would be !conda install.
## In the command line
## Install earthaccess
conda install -c conda-forge earthaccess
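Before moving on, it can help to confirm that the install worked in the environment your notebook is actually using. The following is a minimal sketch using only the Python standard library; the helper name `is_installed` is made up for illustration and is not part of earthaccess.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("earthaccess"):
    print("earthaccess is available")
else:
    # Same install command as above; run it in a terminal,
    # or prefix it with ! inside a Jupyter Notebook cell.
    print("earthaccess not found; run: conda install -c conda-forge earthaccess")
```

If the check reports that the package is missing even after installing, the notebook kernel is probably running in a different conda environment than the one you installed into.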
This example searches for data from the Land Processes DAAC with a spatial bounding box and temporal range.
## In Python
## Import packages
from earthaccess import DataGranules, DataCollections
from pprint import pprint
## We'll get 4 collections that match with our keyword of interest
collections = DataCollections().keyword("REFLECTANCE").cloud_hosted(True).get(4)
## Let's print 2 collections
for collection in collections[0:2]:
    pprint(collection.summary())  # pprint prints directly and returns None, so don't wrap it in print()
    print(collection.abstract(), "\n")
## Search for files from the second dataset result over a small plot in Nebraska, USA for two weeks in September 2022
## .get() runs the query and returns the matching granules
granules = DataGranules().concept_id("C2021957657-LPCLOUD").temporal("2022-09-10","2022-09-24").bounding_box(-101.67271,41.04754,-101.65344,41.06213).get()
print(len(granules))
granules
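The bounding box in the search above is given as lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude, and the temporal range as a start and end date. Swapping two of these values is an easy mistake, so here is a small sketch that checks them before searching. The helper names (`make_bounding_box`, `make_temporal`, `search_granules`) are hypothetical and written for this example. The final step assumes that `earthaccess.search_data` accepts `concept_id`, `bounding_box` and `temporal` keyword arguments, and that call is left commented out because it needs the package and network access.

```python
from datetime import date


def make_bounding_box(west, south, east, north):
    """Validate and return a (west, south, east, north) tuple in decimal degrees."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return (west, south, east, north)


def make_temporal(start, end):
    """Validate ISO dates (YYYY-MM-DD) and return them as a (start, end) tuple."""
    if date.fromisoformat(start) > date.fromisoformat(end):
        raise ValueError("start date must not be after end date")
    return (start, end)


def search_granules(concept_id, bbox, temporal):
    """Run the search (assumed keyword arguments; needs earthaccess and network access)."""
    import earthaccess  # imported lazily so the helpers above work without it

    return earthaccess.search_data(
        concept_id=concept_id, bounding_box=bbox, temporal=temporal
    )


# Same values as the Nebraska example above
bbox = make_bounding_box(-101.67271, 41.04754, -101.65344, 41.06213)
temporal = make_temporal("2022-09-10", "2022-09-24")
print(bbox, temporal)

# Uncomment to run the actual search:
# granules = search_granules("C2021957657-LPCLOUD", bbox, temporal)
```

Keeping the validation separate from the network call means a mistyped coordinate fails immediately with a readable message instead of silently returning zero granules.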
To find data in R, we’ll also use the earthaccess Python package, which we can call from R using the reticulate package (cheatsheet). Note below that we import the Python library as an R object we name earthaccess, and that we use the earthaccess$ syntax to access functions from the earthaccess library. The granules object is a list of JSON dictionaries with some extra dictionaries.
## In R
## load R libraries
library(tidyverse) # install.packages("tidyverse")
library(reticulate) # install.packages("reticulate")
## load python library
earthaccess <- reticulate::import("earthaccess")
## use earthaccess to access data # https://nsidc.github.io/earthaccess/tutorials/search-granules/
granules <- earthaccess$search_data(
  doi = "10.5067/SLREF-CDRV3",
  temporal = reticulate::tuple("2017-01", "2017-02") # with an earthaccess update, this can be simply c() or list()
)
## Granules found: 72
## exploring
granules # this is the result of the get request.
class(granules) # "list"
## granules <- reticulate::py_to_r(granules) # Object to convert is not a Python object
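Since the search results are described above as a list of JSON dictionaries, one way to explore them is to pull a few fields out of each entry. The sketch below does this in Python on made-up placeholder records, so it runs without a search. The key names (`meta`, `umm`, `concept-id`, `GranuleUR`, `TemporalExtent`) are an assumption about the layout of the returned metadata and may not match what your search returns; print one real granule first to check.

```python
# Made-up placeholder records that only mimic the assumed layout;
# they are NOT real search results.
sample_granules = [
    {
        "meta": {"concept-id": "G0000000001-EXAMPLE"},
        "umm": {
            "GranuleUR": "example_granule_001",
            "TemporalExtent": {
                "RangeDateTime": {"BeginningDateTime": "2017-01-01T00:00:00Z"}
            },
        },
    },
    {
        "meta": {"concept-id": "G0000000002-EXAMPLE"},
        "umm": {
            "GranuleUR": "example_granule_002",
            "TemporalExtent": {
                "RangeDateTime": {"BeginningDateTime": "2017-01-02T00:00:00Z"}
            },
        },
    },
]


def summarize(granule):
    """Pick a few fields out of one granule dictionary, tolerating missing keys."""
    umm = granule.get("umm", {})
    time_range = umm.get("TemporalExtent", {}).get("RangeDateTime", {})
    return {
        "id": granule.get("meta", {}).get("concept-id"),
        "name": umm.get("GranuleUR"),
        "start": time_range.get("BeginningDateTime"),
    }


summaries = [summarize(g) for g in sample_granules]
for row in summaries:
    print(row)
```

Using `.get()` with a default at each level means a granule that lacks one of the assumed keys yields `None` for that field rather than raising a `KeyError`.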
Matlab code coming soon!
## In Matlab
## Coming soon!
With wget and curl:
## In the command line
## Coming soon!