Downloading datasets for the thermal IR tutorial

This is an optional notebook to demonstrate a method for accessing SnowEx data through the NASA EarthData API.

Learning Objectives

At the conclusion of this tutorial, you will be able to:

  • download datasets through the NASA Earthdata API

  • use data conversion tools to create geotiff files

# Import the packages we'll need
import pandas as pd
from earthdata_api import earthdata_granule_search # for interfacing with the NASA Earthdata API

We are going to use a custom function (earthdata_api.earthdata_granule_search()) which can return a list of download URLs for files we search for. We can then use a utility like wget (with our EarthData credentials) to download the files from that list of URLs.

Alternatively, we could search on the EarthData or NSIDC website itself and download through the web interface.

First, set up some of the search parameters we’ll use for all the datasets we want to find:

  • a bounding box for the Grand Mesa area,

  • and a date/time that we want to look for.

When defining a bounding box for the study area we want to look at, Earthdata will return anything that overlaps this spot.

What we are specifying below is a a tiny area, basicallcy a point search, but we can expand to a larger area if needed.

Format the bounding box as a list of coordinates like bounding_box = [ lower left longitude, lower left latitude, upper right longitude, upper right latitude ]

bounding_box = [-108.19, 39.03, -108.18, 39.04] # a little box on top of Grand Mesa

We are going to look at one specific day, but we could search multiple dates/times. Those dates/times don’t need to be a continuous time period, we could instead provide a list of dates/times. (That was my original motivation to write this function). Use either pandas.Timestamp, datetime.datetime, or numpy.datetime64 objects.

timestampsUTC = pd.Timestamp('2020-2-08') # February 8, 2020

Now we can find the first dataset we want, ASTER satellite images. Specificallfy we are going to download the AST_L1T product. Looking its product documentation we see that it is available either as an HDF file with all bands, or as tif files with select visible or thermal bands.

We want to download the hdf file, which we’ll later convert to separate geotiffs for each band.

# Information for the ASTER product we want
product="AST_L1T" # product name
version="003" # product version
ext="hdf" # geotiff files only

To make sure we find the images from this day, we need to specify a window in time to search, which starts at our timestampsUTC time we already defined.

To define this time window we specify two things, the size of the time window from our start time, and units of that size value.

search_time_window=1 # specify that we want to have a time window of 1
search_time_window_units="d" # specify time window units of Days, which together gives us 1 day

We also want to save the list of data access URLs so we can use wget to download them

# make sure this directory exists
import os
if not os.path.exists("/tmp/thermal-ir"):

Finally, make the API call to get the list of files that match our search

url_list = earthdata_granule_search(product, version, 
                                    bounding_box, timestampsUTC, 
                                    search_time_window, search_time_window_units, 
                                    list_filepath, ext)
# we can print the list of files, but these are also saved to a file
Only including hdf files in list
Writing list of URLs to /tmp/thermal-ir/aster_download_list.txt

Now we can open a terminal and use wget to download the files that are listed.

Open a terminal and navigate to the data directory for this tutorial

cd /tmp/thermal-ir/

Use wget to download the files listed in aster_download_list.txt. This requires that you put your Earthdata username wherer it says YOUR_USERNAME. This will also prompt you for your Earthdata password.

wget --http-user=YOUR_USERNAME --ask-password --keep-session-cookies --auth-no-challenge=on -c -i aster_download_list.txt

Convert hdf to geotiffs with script. When we run this utility, the output data will be at-sensor radiance stored as Digital Numbers (DN) as uint16 data types.

The is from: NASA LP DAAC. (2020). Reformat and Georeference ASTER L1T HDF Files Data Prep Script. Ver. 1. NASA EOSDIS Land Processes Distributed Active Archive Center (LP DAAC), USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, USA. Accessed July 27, 2020.

In the terminal, navigate to the directory containing the script

cd ~/website/book/tutorials/thermal-ir/ast-l1t/

Then run the script as follows to generate geotiff files for each ASTER band

python /tmp/thermal-ir/

You should now see the AST_L1T geotiff files in the data directory!

Download the second dataset we want, ground-based temperature measurements

product = "SNEX20_VPTS_Raw"
version = "1"
url_list = earthdata_granule_search(product, version, 
                                    bounding_box, timestampsUTC, 
                                    search_time_window, search_time_window_units, 
Writing list of URLs to /tmp/thermal-ir/snow_temperature_download_list.txt

Use wget to download these data

wget --http-user=YOUR_USERNAME --ask-password --keep-session-cookies --auth-no-challenge=on -c -i snow_temperature_download_list.txt

Then unzip the folder we just downloaded (-q for “quiet mode” so it doesn’t list every file as it unzips them, -d for the directory we want to unzip the files into)

unzip -q -d /tmp/thermal-ir/SNEX20_VPTS_Raw

And you should see the folder “Level-0” with the raw ground-based snow temperature data inside now in your data directory

The file we want is /tmp/thermal-ir/SNEX20_VPTS_Raw/Level-0/snow-temperature-timeseries/CR10X_GM1_final_storage_1.dat, this is a comma-delimited text file from the datalogger at snow pit 2S10


Note: The airborne TIR imagery from SnowEx 2020 is not yet publicly available through NSIDC, we will work with a sample image provided in this tutorial.


# Retrieve a copy of data files used in this tutorial from
# Re-running this cell will not re-download things if they already exist

mkdir -p /tmp/tutorial-data
cd /tmp/tutorial-data
wget -q -nc -O
unzip -q -n
import xarray as xr
import rioxarray
import datetime

Open an airborne TIR mosaic NetCDF file

# Open airborne TIR mosaic NetCDF file
ds = xr.open_dataset('/tmp/tutorial-data/thermal-ir/',
                     decode_times=False) # set decode_times to False

Inspect the dataset and its dimensions

# Preview the dataset
Dimensions:    (pass: 17, easting, x: 4171, northing, y: 3581, na: 1)
Dimensions without coordinates: pass, easting, x, northing, y, na
Data variables:
    STBmosaic  (pass, easting, x, northing, y) float32 ...
    DEM        (easting, x, northing, y) float32 ...
    E_UTM      (easting, x, na) float64 2.182e+05 2.182e+05 ... 2.39e+05
    N_UTM      (na, northing, y) float64 4.314e+06 4.314e+06 ... 4.332e+06
    time       (pass, na) float64 1.581e+09 1.581e+09 ... 1.581e+09 1.581e+09
    processed date:  19-Aug-2020 18:54:53
# Take a look at the dimensions in the dataset
Frozen({'pass': 17, 'easting, x': 4171, 'northing, y': 3581, 'na': 1})

There is an extra dimension (“na”) that we will want to drop, we will want to rename some of the dims, assign coordinates to those dims, and add a coordinate reference system for plotting.

# Drop the extra "na" dimension from E_UTM, N_UTM, and time
ds['E_UTM'] = ds['E_UTM'].isel(na=0, drop=True)
ds['N_UTM'] = ds['N_UTM'].isel(na=0, drop=True)
ds['time'] = ds['time'].isel(na=0, drop=True)

Rename the dimensions to some easier to use names

# Rename dims
ds = ds.rename({"pass" : "time", 
                "easting, x" : "easting", 
                "northing, y" : "northing"})

This NetCDF file was generated in MATLAB, and the dates/times are in an epoch format. Use utcfromtimestamp() and isoformat() to convert and reformat into a more convenient format.

# Decode matlab (epoch) format times
utctime = [datetime.datetime.utcfromtimestamp(this_time).isoformat() for this_time in ds.time.values]

Assign and then transpose coordinates in our dataset

# Assign coordinates to the "northing",  "easting", and "time" dimensions
ds = ds.assign_coords({"time": utctime, "northing": ds.N_UTM, "easting": ds.E_UTM})
# Transpose coords
ds = ds.transpose("time", "northing", "easting")

Set spatial dimensions then define which coordinate reference system the spatial dimensions are in

# Write the coordinate reference system for the spatial dims with rioxarray
#'epsg:32612', inplace=True,
                ).rio.set_spatial_dims('easting', 'northing', inplace=True,
Dimensions:      (time: 17, northing: 3581, easting: 4171)
  * time         (time) <U26 '2020-02-08T15:07:17.946189' ... '2020-02-08T19:...
  * northing     (northing) float64 4.314e+06 4.314e+06 ... 4.332e+06 4.332e+06
  * easting      (easting) float64 2.182e+05 2.182e+05 ... 2.39e+05 2.39e+05
    spatial_ref  int64 0
Data variables:
    STBmosaic    (time, northing, easting) float32 ...
    DEM          (northing, easting) float32 ...
    E_UTM        (easting) float64 2.182e+05 2.182e+05 ... 2.39e+05 2.39e+05
    N_UTM        (northing) float64 4.314e+06 4.314e+06 ... 4.332e+06 4.332e+06
    processed date:  19-Aug-2020 18:54:53