# Built-in packages
import os
import sys
# Filesystem management
import fsspec
import earthaccess
# Data handling
import xarray as xr
from virtualizarr import open_virtual_dataset
# Parallel computing
import multiprocessing
from dask import delayed
import dask.array as da
from dask.distributed import Client
# Other
import matplotlib.pyplot as plt
VirtualiZarr Useful Recipes with NASA Earthdata
Summary
This notebook goes through several functionalities of the VirtualiZarr package for creating virtual reference files, specifically using it with NASA Earthdata via the earthaccess package. It is meant to be a quick-start reference that introduces some key capabilities / characteristics of the package once a user has a high-level understanding of virtual data sets and the cloud-computing challenges they address (see references in the Prerequisite knowledge section below). In short, VirtualiZarr is a Python package to create “reference files”, which can be thought of as road maps that let the computer efficiently navigate through large arrays in a single data file, or across many files. Once a reference file for a data set is created, using it to open the data can speed up several processes, including lazy loading, accessing subsets, and, in some cases, performing computations. Importantly, one can create a combined reference for all the files in a dataset and use it to lazy load / access the entire record at once.
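To make the “road map” idea concrete, a Kerchunk-style reference is essentially a mapping from Zarr chunk keys to byte ranges inside the original files. Schematically (the variable name and byte values below are purely illustrative, not taken from the CCMP data set):
reference_schematic = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',                        # store-level Zarr metadata, inlined as JSON strings
        "uwnd/.zarray": '{"shape": [4, 720, 1440], "...": 0}',  # per-variable array metadata
        "uwnd/0.0.0": ["s3://bucket/file.nc", 20480, 4147200],  # chunk key -> [file URL, byte offset, byte length]
    },
}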
The functionalities of VirtualiZarr (with earthaccess) covered in this notebook are:
- Getting data file endpoints in Earthdata Cloud, which are needed for VirtualiZarr to create reference files.
- Generating reference files for 1 day, 1 year, and the entire record of a ~750 GB data set. The data set used is the Level 4 global gridded 6-hourly wind product from the Cross-Calibrated Multi-Platform project (https://doi.org/10.5067/CCMP-6HW10M-L4V31), available on PO.DAAC. This section also covers speeding up the reference creation using parallel computing. Reference files are saved in both JSON and PARQUET formats. The latter is an important format as it reduces the reference file size by ~30x in our tests. Saving in the Icechunk format will be tested / covered in the coming months.
- Combining reference files (in progress). The ability to combine reference files together is valuable, for example to update reference files for forward-streaming datasets when new data are available, without re-creating the entire record from scratch. However, with the current workflows and version of VirtualiZarr, this is not possible due to our use of a specific kwarg when creating the reference files. The workflow is still included here (with errors) because it is anticipated that this will be fixed in upcoming versions. Alternately, the use of Icechunk will also likely solve this issue (Icechunk functionality to be tested soon).
Requirements, prerequisite knowledge, learning outcomes
Requirements to run this notebook
Earthdata login account: An Earthdata Login account is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account.
Compute environment: This notebook is meant to be run in the cloud (AWS instance running in us-west-2). We used an m6i.4xlarge EC2 instance (16 CPUs, 64 GiB memory) for the parallel computing sections. At minimum we recommend a VM with 10 CPUs to make the parallel computations in Section 2.2.1 faster.
Optional Coiled account: To run the section on distributed clusters, create a Coiled account (free to sign up) and connect it to an AWS account. For more information on Coiled, setting up an account, and connecting it to an AWS account, see their website https://www.coiled.io.
Prerequisite knowledge
This notebook covers virtualizarr functionality but does not present the high-level ideas behind it. For an understanding of reference files and how they are meant to enhance in-cloud access to file formats that are not cloud optimized (such as netCDF and HDF), please see e.g. this kerchunk page, or this page on virtualizarr.
Familiarity with the earthaccess and Xarray packages. Familiarity with directly accessing NASA Earthdata in the cloud. The Cookbook notebook on Dask basics is handy for those new to parallel computing.
Learning Outcomes
This notebook serves both as a pedagogical resource for learning several key workflows as well as a quick reference guide. Readers will gain the understanding to combine the virtualizarr and earthaccess packages to create virtual dataset reference files for NASA Earthdata.
Import Packages
Note Zarr Version
Zarr version 2 is needed for the current implementation of this notebook because (as of February 2025) Zarr version 3 does not accept FSMap objects.
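If you want to guard against this at runtime, a minimal check (our own sketch, not part of the original workflow) could be:
import zarr
# Fail early if Zarr v3+ is installed, since this notebook relies on Zarr v2 accepting FSMap objects:
if int(zarr.__version__.split(".")[0]) >= 3:
    raise RuntimeError(f"Zarr v2 is required for this notebook; found zarr=={zarr.__version__}")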
We ran this notebook in a Python 3.12 environment. The minimal working environment we used was:
zarr==2.18.4
fastparquet==2024.5.0
xarray==2025.1.2
earthaccess==0.11.0
fsspec==2024.10.0
dask==2024.5.2 ("dask[complete]"==2024.5.2 if using pip)
h5netcdf==1.3.0
matplotlib==3.9.2
jupyterlab
jupyter-server-proxy
virtualizarr==1.3.0
kerchunk==0.2.7
And optionally:
coiled==1.58.0
# Optional
import coiled
Other Setup
# display options for xarray objects
xr.set_options(
    display_expand_attrs=False,
    display_expand_coords=True,
    display_expand_data=True
)
<xarray.core.options.set_options at 0x7f78f44a7aa0>
1. Get Data File S3 endpoints in Earthdata Cloud
The first step is to find the S3 endpoints to the files. Handling access credentials to Earthdata and then finding the endpoints can be done in a number of ways (e.g. using the requests or s3fs packages), but we use the earthaccess package for its ease of use. We get the endpoints for all files in the CCMP record.
# Get Earthdata creds
earthaccess.login()
<earthaccess.auth.Auth at 0x7f790c5b4770>
# Get AWS creds. Note that if you spend more than 1 hour in the notebook, you may have to re-run this line!!!
fs = earthaccess.get_s3_filesystem(daac="PODAAC")
# Locate CCMP file information / metadata:
granule_info = earthaccess.search_data(
    short_name="CCMP_WINDS_10M6HR_L4_V3.1",
)

# Get S3 endpoints for all files:
data_s3links = [g.data_links(access="direct")[0] for g in granule_info]
data_s3links[0:3]
['s3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/CCMP_Wind_Analysis_19930102_V03.1_L4.nc',
's3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/CCMP_Wind_Analysis_19930103_V03.1_L4.nc',
's3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/CCMP_Wind_Analysis_19930105_V03.1_L4.nc']
2. Generate reference files for 1 day, 1 year, and entire record
2.1 First day
The virtualizarr function to generate reference information is compact. We use it on one file for demonstration.
Important
The kwarg loadable_variables
is not mandatory to create a viable reference file, but will become important for rapid lazy loading when working with large combined reference files. Assign to this at minimum the list of 1D coordinate variable names for the data set (additional 1D or scalar vars can also be added). This functionality will be the default in future releases of virtualizarr.
# This will be assigned to 'loadable_variables' and needs to be modified per the specific
# coord names of the data set:
= ["latitude","longitude","time"] coord_vars
%%time
= {"storage_options": fs.storage_options} # S3 filesystem creds from previous section.
reader_opts
# Create reference for the first data file:
= open_virtual_dataset(
virtual_ds_example 0], indexes={},
data_s3links[=reader_opts, loadable_variables=coord_vars
reader_options
)print(virtual_ds_example)
<xarray.Dataset> Size: 66MB
Dimensions: (time: 4, latitude: 720, longitude: 1440)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 32B 1993-01-02 ... 1993-01-02T18:00:00
Data variables:
uwnd (time, latitude, longitude) float32 17MB ManifestArray<shape=(...
vwnd (time, latitude, longitude) float32 17MB ManifestArray<shape=(...
ws (time, latitude, longitude) float32 17MB ManifestArray<shape=(...
nobs (time, latitude, longitude) float32 17MB ManifestArray<shape=(...
Attributes: (54)
CPU times: user 301 ms, sys: 123 ms, total: 424 ms
Wall time: 1.75 s
The reference can be saved to file and used to open the corresponding CCMP data file with Xarray:
virtual_ds_example.virtualize.to_kerchunk('virtual_ds_example.json', format='json')
# Open data using the reference file, using a small wrapper function around xarray's open_dataset.
# This will shorten code blocks in other sections.
def opends_withref(ref, fs_data):
    """
    "ref" is a reference file or object. "fs_data" is a filesystem with credentials to
    access the actual data files.
    """
    storage_opts = {"fo": ref, "remote_protocol": "s3", "remote_options": fs_data.storage_options}
    fs_ref = fsspec.filesystem('reference', **storage_opts)
    m = fs_ref.get_mapper('')
    data = xr.open_dataset(
        m, engine="zarr", chunks={},
        backend_kwargs={"consolidated": False}
    )
    return data

data_example = opends_withref('virtual_ds_example.json', fs)
print(data_example)
<xarray.Dataset> Size: 66MB
Dimensions: (latitude: 720, longitude: 1440, time: 4)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 32B 1993-01-02 ... 1993-01-02T18:00:00
Data variables:
nobs (time, latitude, longitude) float32 17MB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
uwnd (time, latitude, longitude) float32 17MB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
vwnd (time, latitude, longitude) float32 17MB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
ws (time, latitude, longitude) float32 17MB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (54)
# Also useful to note, these reference objects don't take much memory:
print(sys.getsizeof(virtual_ds_example), "bytes")
120 bytes
2.2 First year
Reference information for each data file in the year is created individually, and then the combined reference file for the year can be created.
For us, reference file creation for a single file takes about 0.7 seconds, so processing a year of files would take about 4.25 minutes. One can easily accomplish this with a for-loop:
virtual_ds_list = [
open_virtual_dataset(
p, indexes={},
reader_options={"storage_options": fs.storage_options},
loadable_variables=coord_vars
)
for p in data_s3links
]
However, we speed things up using basic parallel computing.
2.2.1 Method 1: parallelize using Dask local cluster
If using an m6i.4xlarge AWS EC2 instance, there are 16 CPUs available and enough memory to utilize all of them at once. If working on a different VM type, change the n_workers in the call to Client() below as needed.
# Check how many cpu's are on this VM:
print("CPU count =", multiprocessing.cpu_count())
CPU count = 16
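If working on a different VM type, one simple option (a sketch on our part, not used in the next cell, which hard-codes 15 workers) is to derive the worker count from the CPU count printed above:
# Leave one CPU free for the notebook process itself, then pass this to Client() instead of 15:
n_workers = max(multiprocessing.cpu_count() - 1, 1)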
# Start up cluster and print some information about it:
client = Client(n_workers=15, threads_per_worker=1)
print(client.cluster)
print("View any work being done on the cluster here", client.dashboard_link)
LocalCluster(cbeb9b3b, 'tcp://127.0.0.1:33393', workers=15, threads=15, memory=60.81 GiB)
View any work being done on the cluster here https://cluster-ykalm.dask.host/jupyter/proxy/8787/status
%%time
# Create individual references:
open_vds_par = delayed(open_virtual_dataset)
tasks = [
    open_vds_par(p, indexes={}, reader_options=reader_opts, loadable_variables=coord_vars)
    for p in data_s3links[:365]  # First year only!
]
virtual_ds_list = list(da.compute(*tasks))  # The xr.combine_nested() function below needs a list rather than a tuple.
CPU times: user 5.5 s, sys: 1.14 s, total: 6.64 s
Wall time: 47.6 s
Using the individual references to create the combined reference is fast and does not require parallel computing.
%%time
# Create the combined reference
virtual_ds_combined = xr.combine_nested(virtual_ds_list, concat_dim='time', coords='minimal', compat='override', combine_attrs='drop_conflicts')
CPU times: user 181 ms, sys: 18.1 ms, total: 199 ms
Wall time: 195 ms
# Save in JSON or PARQUET format:
fname_combined_json = 'ref_combined_1year.json'
fname_combined_parq = 'ref_combined_1year.parq'
virtual_ds_combined.virtualize.to_kerchunk(fname_combined_json, format='json')
virtual_ds_combined.virtualize.to_kerchunk(fname_combined_parq, format='parquet')
%%time
# Test lazy loading of the combined reference file JSON:
data_json = opends_withref(fname_combined_json, fs)
print(data_json)
<xarray.Dataset> Size: 24GB
Dimensions: (latitude: 720, longitude: 1440, time: 1460)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 12kB 1993-01-02 ... 1994-01-04T18:00:00
Data variables:
nobs (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
uwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
vwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
ws (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (47)
CPU times: user 33.6 ms, sys: 0 ns, total: 33.6 ms
Wall time: 32.5 ms
%%time
# Test lazy loading of the combined reference file PARQUET:
data_parq = opends_withref(fname_combined_parq, fs)
print(data_parq)
<xarray.Dataset> Size: 24GB
Dimensions: (latitude: 720, longitude: 1440, time: 1460)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 12kB 1993-01-02 ... 1994-01-04T18:00:00
Data variables:
nobs (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
uwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
vwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
ws (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (47)
CPU times: user 27.2 ms, sys: 0 ns, total: 27.2 ms
Wall time: 24.7 ms
2.2.2 Optional method 2: parallelize using distributed cluster with Coiled
At PO.DAAC we have been testing the third-party software/package Coiled, which makes it easy to spin up distributed computing clusters in the cloud. Since we suspect that Coiled may become a key part of the cloud ecosystem for Earth science researchers, this optional section is included; it can be used as an alternative to Section 2.2.1 for generating the individual reference files in parallel.
%%time
## --------------------------------------------
## Create single reference files with parallel computing using Coiled
## --------------------------------------------
# Wrap `open_virtual_dataset()` into a coiled function and copy it to multiple VMs:
open_vds_par = coiled.function(
    region="us-west-2", spot_policy="on-demand",
    vm_type="m6i.large", n_workers=15
)(open_virtual_dataset)

# Begin computations for first year only:
results = open_vds_par.map(
    data_s3links[:365], indexes={},
    reader_options=reader_opts, loadable_variables=coord_vars
)

virtual_ds_list = []
for r in results:
    virtual_ds_list.append(r)
CPU times: user 2.6 s, sys: 135 ms, total: 2.73 s
Wall time: 2min 15s
open_vds_par.cluster.shutdown()
Using the individual references to create the combined reference is fast and does not require parallel computing.
%%time
# Combining the individual references works the same as in Section 2.2.1:
virtual_ds_combined = xr.combine_nested(virtual_ds_list, concat_dim='time', coords='minimal', compat='override', combine_attrs='drop_conflicts')
CPU times: user 176 ms, sys: 0 ns, total: 176 ms
Wall time: 176 ms
# Save in JSON or PARQUET format:
fname_combined_json = 'ref_combined_1year.json'
fname_combined_parq = 'ref_combined_1year.parq'
virtual_ds_combined.virtualize.to_kerchunk(fname_combined_json, format='json')
virtual_ds_combined.virtualize.to_kerchunk(fname_combined_parq, format='parquet')
%%time
# Test lazy loading of the combined reference file JSON:
data_json = opends_withref(fname_combined_json, fs)
print(data_json)
<xarray.Dataset> Size: 24GB
Dimensions: (latitude: 720, longitude: 1440, time: 1460)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 12kB 1993-01-02 ... 1994-01-04T18:00:00
Data variables:
nobs (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
uwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
vwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
ws (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (47)
CPU times: user 20.2 ms, sys: 4.06 ms, total: 24.2 ms
Wall time: 23.8 ms
%%time
# Test lazy loading of the combined reference file PARQUET:
data_parq = opends_withref(fname_combined_parq, fs)
print(data_parq)
<xarray.Dataset> Size: 24GB
Dimensions: (latitude: 720, longitude: 1440, time: 1460)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 12kB 1993-01-02 ... 1994-01-04T18:00:00
Data variables:
nobs (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
uwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
vwnd (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
ws (time, latitude, longitude) float32 6GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (47)
CPU times: user 303 ms, sys: 12 ms, total: 315 ms
Wall time: 314 ms
2.3 Entire record
Processing the entire record follows the exact same workflow as processing the first year in Section 2.2 (either parallelization method). The only modification required is to replace the one instance of data_s3links[:365] with data_s3links[:] when setting up the parallel computations (it occurs once in each of Sections 2.2.1 and 2.2.2). Optionally, also change the saved file names, e.g. from ref_combined_1year.json to ref_combined_record.json.
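For concreteness, here is a sketch of the full-record version of the Section 2.2.1 workflow (same variable names as above; the output file name is only a suggestion):
%%time
# Same pattern as Section 2.2.1, but over every file endpoint in the record:
tasks = [
    open_vds_par(p, indexes={}, reader_options=reader_opts, loadable_variables=coord_vars)
    for p in data_s3links[:]  # entire record
]
virtual_ds_list = list(da.compute(*tasks))

# Combining and saving works exactly as before:
virtual_ds_combined = xr.combine_nested(virtual_ds_list, concat_dim='time', coords='minimal', compat='override', combine_attrs='drop_conflicts')
virtual_ds_combined.virtualize.to_kerchunk('ref_combined_record.json', format='json')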
For us, processing the entire record using a local cluster on an m6i.4xlarge EC2 instance with 15 workers took about 13 minutes. Using 20 m6i.large VMs on a distributed cluster with Coiled also took ~15 minutes and cost ~$0.40.
Because the virtualizarr package is so efficient at combining many individual reference files together, and because the individual references have such small in-memory requirements, the workflows in Section 2.2 are assumed to scale to tens of thousands of files and TB’s of data. However, this assumption will be tested as the techniques in the notebook are applied to progressively larger data sets.
For us, lazy loading the entire record took ~3 seconds. Compare that to an attempt at opening these same files with Xarray the “traditional” way, with a call to xr.open_mfdataset(). On a smaller machine, the following lines of code will either fail or take a long (possibly very long) time:
## You can try un-commenting and running this but your notebook will probably stall or crash:
# fobjs = earthaccess.open(granule_info)
# data = xr.open_mfdataset(fobjs[:])
3. Appending additional reference files
! Currently this is not viable, since the loadable_variables kwarg was used when creating the individual reference files !
Using the loadable_variables kwarg is important for faster lazy loading of large data sets with combined reference files, but it does have this current limitation. The issue is that, in order to append an additional reference file to the already saved year-long reference file from the previous section, we need to be able to re-load that reference file back into memory as manifest arrays. This isn’t supported yet for files created with the loadable_variables kwarg.
For example, this is how we would attempt to append an extra day to the year-long reference file from Section 2:
# In case this notebook has been running over an hour, refresh the file system and credentials:
fs = earthaccess.get_s3_filesystem(daac="PODAAC")
reader_opts = {"storage_options": fs.storage_options}
%%time
# Create reference file for 366th CCMP file:
vds_extraday = open_virtual_dataset(
    data_s3links[366], indexes={},
    reader_options=reader_opts, loadable_variables=coord_vars
)
CPU times: user 319 ms, sys: 98 ms, total: 417 ms
Wall time: 1.72 s
%%time
# Try to add it to the year-long reference:
vds_year1 = open_virtual_dataset('ref_combined_1year.json', filetype='kerchunk')
vds_appended = xr.combine_nested([vds_year1, vds_extraday], concat_dim='time', coords='minimal', compat='override', combine_attrs='drop_conflicts')
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
File <timed exec>:2

File /opt/coiled/env/lib/python3.12/site-packages/virtualizarr/backend.py:199, in open_virtual_dataset(filepath, filetype, group, drop_variables, loadable_variables, decode_times, cftime_variables, indexes, virtual_array_class, virtual_backend_kwargs, reader_options, backend)
    196 if backend_cls is None:
    197     raise NotImplementedError(f"Unsupported file type: {filetype.name}")
--> 199 vds = backend_cls.open_virtual_dataset(
    200     filepath,
    201     group=group,
    202     drop_variables=drop_variables,
    203     loadable_variables=loadable_variables,
    204     decode_times=decode_times,
    205     indexes=indexes,
    206     virtual_backend_kwargs=virtual_backend_kwargs,
    207     reader_options=reader_options,
    208 )
    210 return vds

File /opt/coiled/env/lib/python3.12/site-packages/virtualizarr/readers/kerchunk.py:75, in KerchunkVirtualBackend.open_virtual_dataset(filepath, group, drop_variables, loadable_variables, decode_times, indexes, virtual_backend_kwargs, reader_options)
     72     with fs.open_file() as of:
     73         refs = ujson.load(of)
---> 75     vds = dataset_from_kerchunk_refs(KerchunkStoreRefs(refs), fs_root=fs_root)
     77 else:
     78     raise ValueError(
     79         "The input Kerchunk reference did not seem to be in Kerchunk's JSON or Parquet spec: https://fsspec.github.io/kerchunk/spec.html. If your Kerchunk generated references are saved in parquet format, make sure the file extension is `.parquet`. The Kerchunk format autodetection is quite flaky, so if your reference matches the Kerchunk spec feel free to open an issue: https://github.com/zarr-developers/VirtualiZarr/issues"
     80     )

File /opt/coiled/env/lib/python3.12/site-packages/virtualizarr/translators/kerchunk.py:136, in dataset_from_kerchunk_refs(refs, drop_variables, virtual_array_class, indexes, fs_root)
    119 def dataset_from_kerchunk_refs(
    120     refs: KerchunkStoreRefs,
    121     drop_variables: list[str] = [],
   (...)
    124     fs_root: str | None = None,
    125 ) -> Dataset:
    126     """
    127     Translate a store-level kerchunk reference dict into an xarray Dataset containing virtualized arrays.
    128
   (...)
    133     Currently can only be ManifestArray, but once VirtualZarrArray is implemented the default should be changed to that.
    134     """
--> 136     vars = virtual_vars_from_kerchunk_refs(
    137         refs, drop_variables, virtual_array_class, fs_root=fs_root
    138     )
    139     ds_attrs = fully_decode_arr_refs(refs["refs"]).get(".zattrs", {})
    140     coord_names = ds_attrs.pop("coordinates", [])

File /opt/coiled/env/lib/python3.12/site-packages/virtualizarr/translators/kerchunk.py:111, in virtual_vars_from_kerchunk_refs(refs, drop_variables, virtual_array_class, fs_root)
    105     drop_variables = []
    106 var_names_to_keep = [
    107     var_name for var_name in var_names if var_name not in drop_variables
    108 ]
    110 vars = {
--> 111     var_name: variable_from_kerchunk_refs(
    112         refs, var_name, virtual_array_class, fs_root=fs_root
    113     )
    114     for var_name in var_names_to_keep
    115 }
    116 return vars

File /opt/coiled/env/lib/python3.12/site-packages/virtualizarr/translators/kerchunk.py:169, in variable_from_kerchunk_refs(refs, var_name, virtual_array_class, fs_root)
    167 dims = zattrs.pop("_ARRAY_DIMENSIONS")
    168 if chunk_dict:
--> 169     manifest = manifest_from_kerchunk_chunk_dict(chunk_dict, fs_root=fs_root)
    170     varr = virtual_array_class(zarray=zarray, chunkmanifest=manifest)
    171 elif len(zarray.shape) != 0:
    172     # empty variables don't have physical chunks, but zarray shows that the variable
    173     # is at least 1D

File /opt/coiled/env/lib/python3.12/site-packages/virtualizarr/translators/kerchunk.py:195, in manifest_from_kerchunk_chunk_dict(kerchunk_chunk_dict, fs_root)
    193 for k, v in kerchunk_chunk_dict.items():
    194     if isinstance(v, (str, bytes)):
--> 195         raise NotImplementedError(
    196             "Reading inlined reference data is currently not supported. [ToDo]"
    197         )
    198     elif not isinstance(v, (tuple, list)):
    199         raise TypeError(f"Unexpected type {type(v)} for chunk value: {v}")

NotImplementedError: Reading inlined reference data is currently not supported. [ToDo]