import geopandas as gpd
import glob
from pathlib import Path
import pandas as pd
import os
import zipfile
import earthaccess
From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow this link.
SWOT Shapefile Data Conversion to CSV
Notebook showcasing how to merge/concatenate multiple shapefiles into a single file.
- Utilizing the merged shapefile and converting it to a csv file.
- Option to query the new dataset based on the user's choice: either ‘reach_id’ or water surface elevation (‘wse’), etc.
- Using the queried variable to export it as a csv or shapefile.
Import libraries
Before you start
Before you begin this tutorial, make sure you have an Earthdata Login account, which is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register for an Earthdata Login account. It is free to create and only takes a moment to set up.
auth = earthaccess.login()
Search for SWOT data
Let’s start our search for River Vector Shapefiles in North America. SWOT files come in “reach” and “node” versions in the same collection; here we want the 10 km reaches rather than the nodes. We will also only get files for North America (‘NA’), and we can call out a specific pass number that we want. Each dataset has its own short name associated with it; for the SWOT River shapefiles, it is SWOT_L2_HR_RiverSP_2.0.
results = earthaccess.search_data(short_name = 'SWOT_L2_HR_RIVERSP_2.0',
                                  #temporal = ('2024-02-01 00:00:00', '2024-02-29 23:59:59'), # can also specify by time
                                  granule_name = '*Reach*_009_NA*') # here we filter by Reach files (not node), pass=009, continent code=NA
Granules found: 5
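To see what the `granule_name` wildcard is selecting, the sketch below applies the same pattern locally with Python's `fnmatch` module. The first name comes from the download log in this notebook; the second (a Node file) is a made-up name for contrast. It assumes the server-side wildcard behaves like shell-style globbing, which is a simplification.

```python
from fnmatch import fnmatchcase

# Same wildcard pattern as passed to granule_name in the search
pattern = "*Reach*_009_NA*"

names = [
    # Granule name as printed in this notebook's download log (without .zip)
    "SWOT_L2_HR_RiverSP_Reach_008_009_NA_20231214T141139_20231214T141150_PIC0_01",
    # Hypothetical Node granule, included only to show what gets filtered out
    "SWOT_L2_HR_RiverSP_Node_008_009_NA_20231214T141139_20231214T141150_PIC0_01",
]

# Keep only the names that the pattern accepts (case-sensitive match)
matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)
```

Only the Reach name survives the filter, because the pattern requires the literal text `Reach` followed later by `_009_NA`.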
During the science orbit, a pass is repeated once every 21 days. Within those 21 days, however, a particular location may be observed by several different passes. See the SWOT swath visualizer for your location!
Download the Data into a folder
earthaccess.download(results, "../datasets/data_downloads/SWOT_files/")
folder = Path("../datasets/data_downloads/SWOT_files")
Getting 5 granules, approx download size: 0.03 GB
File SWOT_L2_HR_RiverSP_Reach_008_009_NA_20231214T141139_20231214T141150_PIC0_01.zip already downloaded
Unzip shapefiles in existing folder
for item in os.listdir(folder): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        zip_ref = zipfile.ZipFile(f"{folder}/{item}") # create zipfile object
        zip_ref.extractall(folder) # extract file to dir
        zip_ref.close() # close file
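As an alternative sketch of the same unzip step, the version below uses `pathlib` globbing and a `with` block so each archive is closed even if extraction fails. The helper name `unzip_all` and the demo archive are illustrative only; the demo runs in a temporary directory rather than the download folder.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_all(folder: Path) -> list[Path]:
    """Extract every .zip archive found in `folder` into `folder` itself."""
    archives = sorted(folder.glob("*.zip"))
    for archive in archives:
        # The context manager closes the archive when the block exits
        with zipfile.ZipFile(archive) as zip_ref:
            zip_ref.extractall(folder)
    return archives


# Demo with a throwaway archive (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    demo_folder = Path(tmp)
    with zipfile.ZipFile(demo_folder / "demo.zip", "w") as zf:
        zf.writestr("demo.shp", "placeholder contents")
    unzip_all(demo_folder)
    extracted = sorted(p.name for p in demo_folder.iterdir())

print(extracted)
```

In the notebook itself, the call would simply be `unzip_all(folder)` with the `folder` path defined earlier.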
Opening multiple shapefiles from within a folder
Let’s open all the shapefiles we’ve downloaded together into one dataframe. This approach is ideal for a small number of granules, but if you’re looking to create large time series, consider using the PO.DAAC Hydrocron tool.
# Initialize list of shapefiles containing all dates
SWOT_HR_shps = []

# Loop through queried granules to stack all acquisition dates
for j in range(len(results)):
    # Take the file name from the granule's download link
    filename = earthaccess.results.DataGranule.data_links(results[j], access='external')
    filename = filename[0].split("/")[-1]
    # The unzipped shapefile shares the archive's base name
    filename_shp = filename.replace('.zip','.shp')
    # Join with pathlib so the path works on any operating system
    filename_shp_path = folder / filename_shp
    SWOT_HR_shps.append(gpd.read_file(filename_shp_path))

# Combine granules from all acquisition dates into one dataframe
SWOT_HR_df = gpd.GeoDataFrame(pd.concat(SWOT_HR_shps, ignore_index=True))

# Sort dataframe by reach_id and time
SWOT_HR_df = SWOT_HR_df.sort_values(['reach_id', 'time'])

SWOT_HR_df
 | reach_id | time | time_tai | time_str | p_lat | p_lon | river_name | wse | wse_u | wse_r_u | ... | p_wid_var | p_n_nodes | p_dist_out | p_length | p_maf | p_dam_id | p_n_ch_max | p_n_ch_mod | p_low_slp | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 71224500951 | -1.000000e+12 | -1.000000e+12 | no_data | 48.517717 | -93.692086 | Rainy River | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 1480.031 | 53 | 244919.492 | 10586.381484 | -1.000000e+12 | 0 | 1 | 1 | 0 | LINESTRING (-93.76076 48.51651, -93.76035 48.5... |
931 | 71224500951 | -1.000000e+12 | -1.000000e+12 | no_data | 48.517717 | -93.692086 | Rainy River | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 1480.031 | 53 | 244919.492 | 10586.381484 | -1.000000e+12 | 0 | 1 | 1 | 0 | LINESTRING (-93.76076 48.51651, -93.76035 48.5... |
1854 | 71224500951 | -1.000000e+12 | -1.000000e+12 | no_data | 48.517717 | -93.692086 | Rainy River | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 1480.031 | 53 | 244919.492 | 10586.381484 | -1.000000e+12 | 0 | 1 | 1 | 0 | LINESTRING (-93.76076 48.51651, -93.76035 48.5... |
2789 | 71224500951 | -1.000000e+12 | -1.000000e+12 | no_data | 48.517717 | -93.692086 | Rainy River | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 1480.031 | 53 | 244919.492 | 10586.381484 | -1.000000e+12 | 0 | 1 | 1 | 0 | LINESTRING (-93.76076 48.51651, -93.76035 48.5... |
3728 | 71224500951 | -1.000000e+12 | -1.000000e+12 | no_data | 48.517717 | -93.692086 | Rainy River | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 1480.031 | 53 | 244919.492 | 10586.381484 | -1.000000e+12 | 0 | 1 | 1 | 0 | LINESTRING (-93.76076 48.51651, -93.76035 48.5... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
930 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
1853 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
2788 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
3727 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
4666 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
4667 rows × 127 columns
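In the table above, many numeric fields read -1.000000e+12 and `time_str` reads `no_data`. These look like fill values for reaches without a valid observation rather than real measurements. The sketch below shows one way to mask them before analysis, using plain pandas on a toy table. It assumes the numeric fill value is -999999999999 and the string fill value is `no_data`; check both against the product documentation. The valid `wse` and `time_str` entries are invented for illustration.

```python
import pandas as pd

FILL_NUMERIC = -999999999999  # assumed numeric fill value (displays as -1.000000e+12)
FILL_STRING = "no_data"       # string fill value as seen in the table above

# Toy stand-in for the merged dataframe; the valid row is invented
toy = pd.DataFrame({
    "reach_id": ["71224500951", "71224500951", "77125000273"],
    "time_str": ["no_data", "2024-01-25T10:00:00Z", "no_data"],
    "wse": [-999999999999.0, 330.2, -999999999999.0],
})

cleaned = toy.copy()
# Keep values that differ from the fill value; everything else becomes missing
cleaned["wse"] = toy["wse"].where(toy["wse"] != FILL_NUMERIC)
cleaned["time_str"] = toy["time_str"].where(toy["time_str"] != FILL_STRING)

# Drop rows that have no valid water surface elevation
valid = cleaned.dropna(subset=["wse"])
print(valid)
```

The same masking could be applied to `SWOT_HR_df` column by column; masking rather than deleting keeps the row count intact until you decide which variables matter.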
Querying a Shapefile
Let’s get the attributes for a particular reach from the merged shapefile. GeoPandas lets you filter on any attribute, such as a specific reach ID or a range of reach lengths. Here, we’ll look at the reach with ID 77125000273, which appears in the merged table above. Reach IDs can be identified in the SWORD Database.
reach = SWOT_HR_df.query("reach_id == '77125000273'")
reach
 | reach_id | time | time_tai | time_str | p_lat | p_lon | river_name | wse | wse_u | wse_r_u | ... | p_wid_var | p_n_nodes | p_dist_out | p_length | p_maf | p_dam_id | p_n_ch_max | p_n_ch_mod | p_low_slp | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
930 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
1853 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
2788 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
3727 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
4666 | 77125000273 | -1.000000e+12 | -1.000000e+12 | no_data | 17.952683 | -99.906755 | no_data | -1.000000e+12 | -1.000000e+12 | -1.000000e+12 | ... | 283915.163 | 49 | 484179.822 | 9729.640027 | -1.000000e+12 | 0 | 2 | 1 | 0 | LINESTRING (-99.93256 17.94746, -99.93273 17.9... |
5 rows × 127 columns
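The same `query` syntax also works on numeric variables such as water surface elevation. The sketch below runs on a toy table with invented `wse` values; the threshold `min_wse` is a hypothetical choice, and the `-1e11` cutoff assumes fill values sit near -1e12, as the output above suggests.

```python
import pandas as pd

# Toy stand-in for the merged dataframe; wse values are invented
toy = pd.DataFrame({
    "reach_id": ["71224500951", "71224500951", "77125000273"],
    "wse": [-999999999999.0, 330.2, 512.7],
})

# Query by a numeric threshold; @ refers to a Python variable
min_wse = 400.0  # hypothetical threshold
high = toy.query("wse > @min_wse")

# Combine an ID match with a filter that drops fill values
one_reach = toy.query("reach_id == '71224500951' and wse > -1e11")

print(high)
print(one_reach)
```

Note that `reach_id` is compared as a quoted string, matching how the notebook queries it.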
Converting to CSV
We can convert the merged timeseries geodataframe for this reach into a csv file.
reach.to_csv(folder / 'csv_77125000273.csv')
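One thing to watch when reading the CSV back: `reach_id` is stored as text in the dataframe (the query above compares it to a quoted string), but `pandas.read_csv` will infer an integer type unless told otherwise. A minimal sketch with a toy table and a temporary file, where the `wse` value is invented:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the queried reach; the wse value is invented
toy = pd.DataFrame({"reach_id": ["77125000273"], "wse": [512.7]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "csv_77125000273.csv"
    toy.to_csv(csv_path, index=False)

    # Default parsing infers an integer column for reach_id
    default_read = pd.read_csv(csv_path)
    # Passing dtype keeps reach_id as text, matching the original dataframe
    string_read = pd.read_csv(csv_path, dtype={"reach_id": str})

print(default_read.dtypes)
print(string_read["reach_id"].iloc[0])
```

If a shapefile is preferred over CSV, GeoPandas' `GeoDataFrame.to_file` is the usual route for a geodataframe such as `reach`.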