Subset Module

subset.py

Functions related to subsetting a NetCDF file.

podaac.subsetter.subset.subset(file_to_subset: str, bbox: ndarray, output_file: str, variables: List[str] | str | None = (), cut: bool = True, shapefile: str = None, min_time: str = None, max_time: str = None, origin_source: str = None, lat_var_names: List[str] = (), lon_var_names: List[str] = (), time_var_names: List[str] = (), pixel_subset: bool = False, stage_file_name_subsetted_true: str = None, stage_file_name_subsetted_false: str = None) ndarray | None[source]

Subset a given NetCDF file given a bounding box

Parameters:
  • file_to_subset (string) – The location of the file which will be subset

  • bbox (np.ndarray) – The chosen bounding box. This is a tuple of tuples formatted as such: ((west, east), (south, north)). The assumption is that the valid range is ((-180, 180), (-90, 90)). This will be transformed as appropriate if the actual longitude range is 0-360.

  • output_file (string) – The file path for the output of the subsetting operation.

  • variables (list, str, optional) – List of variables to include in the resulting data file. NOTE: This will remove ALL variables which are not included in this list, including coordinate variables!

  • cut (boolean) – True if the scanline should be cut, False if the scanline should not be cut. Defaults to True.

  • shapefile (str) – Name of local shapefile used to subset given file.

  • min_time (str) – ISO timestamp representing the lower bound of the temporal subset to be performed. If this value is not provided, the granule will not be subset temporally on the lower bound.

  • max_time (str) – ISO timestamp representing the upper bound of the temporal subset to be performed. If this value is not provided, the granule will not be subset temporally on the upper bound.

  • origin_source (str) – Original location or filename of data to be used in “derived from” history element.

  • lat_var_names (list) – List of variables that represent the latitude coordinate variables for this granule. This list will only contain more than one value in the case where there are multiple groups and different coordinate variables for each group.

  • lon_var_names (list) – List of variables that represent the longitude coordinate variables for this granule. This list will only contain more than one value in the case where there are multiple groups and different coordinate variables for each group.

  • time_var_names (list) – List of variables that represent the time coordinate variables for this granule. This list will only contain more than one value in the case where there are multiple groups and different coordinate variables for each group.

  • pixel_subset (boolean) – Cut the lon lat based on the rows and columns within the bounding box, but could result with lon lats that are outside the bounding box

  • stage_file_name_subsetted_true (str) – stage file name if subsetting is true name depends on result of subset

  • stage_file_name_subsetted_false (str) – stage file name if subsetting is false name depends on result of subset

  • decode_times (# clean up time variable in SNDR before)

  • blank (# SNDR.AQUA files have ascending node time)

  • list(nc_dataset.variables)) (if any('__asc_node_tai93' in i for i in) –

    asc_time_var = nc_dataset.variables[‘__asc_node_tai93’] if not asc_time_var[:] > 0:

    del nc_dataset.variables[‘__asc_node_tai93’]

podaac.subsetter.subset.subset_with_bbox(dataset: Dataset, lat_var_names: list, lon_var_names: list, time_var_names: list, bbox: ndarray = None, cut: bool = True, min_time: str = None, max_time: str = None, pixel_subset: bool = False) ndarray[source]

Subset an xarray Dataset using a spatial bounding box.

Parameters:
  • dataset (xr.Dataset) – Dataset to subset

  • lat_var_names (list) – Name of the latitude variables in the given dataset

  • lon_var_names (list) – Name of the longitude variables in the given dataset

  • time_var_names (list) – Name of the time variables in the given dataset

  • variables (list[str]) – List of variables to include in the result

  • bbox (np.array) – Spatial bounding box to subset Dataset with.

  • cut (bool) – True if scanline should be cut.

  • min_time (str) – ISO timestamp of min temporal bound

  • max_time (str) – ISO timestamp of max temporal bound

  • pixel_subset (boolean) – Cut the lon lat based on the rows and columns within the bounding box, but could result with lon lats that are outside the bounding box

  • TODO (add docstring and type hint for variables parameter.)

Returns:

  • np.array – Spatial bounds of Dataset after subset operation

  • TODO - fix this docstring type and the type hint to match code (currently returning a list[xr.Dataset])

podaac.subsetter.subset.subset_with_shapefile_multi(dataset: Dataset, lat_var_names: List[str], lon_var_names: List[str], shapefile: str, cut: bool, pixel_subset: bool) Dataset[source]

Subset an xarray Dataset using a shapefile for multiple latitude and longitude variable pairs

Returns:

The subsetted dataset

Return type:

xr.Dataset