Dynamic Query of SNOTEL data¶
SnowEx Hackweek
July 13, 2021
Author: David Shean
Introduction¶
This tutorial will demonstrate a subset of basic concepts of geospatial data processing and analysis using data from the SNOTEL sensor network. We will demonstrate dynamic queries of a public API to fetch point location data, review coordinate system transformations with GeoPandas and Geometry objects, and explore Pandas time series analysis/visualization.
Read a bit about SNOTEL data for the Western U.S.¶
https://www.wcc.nrcs.usda.gov/snow/
This is actually a nice web interface, with some advanced querying and interactive visualization. You can also download formatted ASCII files (csv) for analysis. This is great for one-time projects, but it’s nice to have reproducible code that can be updated as new data appear, without manual steps. That’s what we’re going to do here.
About SNOTEL sites and data:¶
Sample plots for SNOTEL site at Paradise, WA (south side of Mt. Rainier)¶
We will reproduce some of these plots/metrics in this tutorial
Interactive dashboard¶
Snow today¶
CUAHSI WOF server and automated Python data queries¶
We are going to use a server set up by CUAHSI to serve the SNOTEL data, using a standardized database storage format and query structure. You don't need to worry about the details, but you can quickly review the following:
#This is the latest CUAHSI API endpoint
wsdlurl = 'https://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'
Acronym soup¶
SNOTEL = Snow Telemetry
CUAHSI = Consortium of Universities for the Advancement of Hydrologic Science, Inc
WOF = WaterOneFlow
WSDL = Web Services Description Language
USDA = United States Department of Agriculture
NRCS = Natural Resources Conservation Service
AWDB = Air and Water Database
Python options¶
There are a few packages out there that offer convenience functions to query the online SNOTEL databases and unpack the results.
climata (https://pypi.org/project/climata/) - last commit Sept 2017 (not a good sign)
ulmo (https://github.com/ulmo-dev/ulmo) - maintained by Emilio Mayorga over at UW APL
You can also write your own queries using the Python requests module and some built-in XML parsing libraries (a minimal sketch of this approach follows below).
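If you're curious what the raw service looks like, here is a minimal sketch (not required for the tutorial) that fetches the WSDL with requests and lists the SOAP operations it advertises; the exact operation names printed will depend on the live service.
import requests
import xml.etree.ElementTree as ET

#Same CUAHSI WaterOneFlow endpoint defined above
wsdlurl = 'https://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'
r = requests.get(wsdlurl)
r.raise_for_status()

#Parse the WSDL XML and list the advertised operation names (e.g., GetSites, GetValues)
root = ET.fromstring(r.content)
ns = {'wsdl': 'http://schemas.xmlsoap.org/wsdl/'}
ops = sorted({op.attrib['name'] for op in root.findall('.//wsdl:portType/wsdl:operation', ns)})
print(ops)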
Hopefully this isn't an overwhelming amount of information - let's just go with ulmo for now. I've done most of the work to prepare functions for querying and processing the data. Once you wrap your head around all of the acronyms, it's pretty simple - basically running a few of the functions documented here: https://ulmo.readthedocs.io/en/latest/api.html#module-ulmo.cuahsi.wof
We will use ulmo with daily data for this exercise, but please feel free to experiment with hourly data, other variables or other approaches to fetch SNOTEL data.
import os
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import contextily as ctx
import ulmo
Part 1: Spatial Query SNOTEL sites¶
- Use the ulmo cuahsi interface and the get_sites function to fetch available site metadata from the server
- This will return a Python dictionary
sites = ulmo.cuahsi.wof.get_sites(wsdlurl)
#Preview first item in dictionary
next(iter(sites.items()))
('SNOTEL:301_CA_SNTL',
{'code': '301_CA_SNTL',
'name': 'Adin Mtn',
'network': 'SNOTEL',
'location': {'latitude': '41.2358283996582',
'longitude': '-120.79192352294922'},
'elevation_m': '1886.7120361328125',
'site_property': {'county': 'Modoc',
'state': 'California',
'site_comments': 'beginDate=10/1/1983 12:00:00 AM|endDate=1/1/2100 12:00:00 AM|HUC=180200021403|HUD=18020002|TimeZone=-8.0|actonId=20H13S|shefId=ADMC1|stationTriplet=301:CA:SNTL|isActive=True',
'pos_accuracy_m': '0'}})
Store the dictionary as a Pandas DataFrame called sites_df¶
- See the Pandas from_dict function
- Use the orient option so the sites comprise the DataFrame index, with columns for 'name', 'elevation_m', etc.
- Use the dropna method to remove any empty records
sites_df = pd.DataFrame.from_dict(sites, orient='index').dropna()
sites_df.head()
|  | code | name | network | location | elevation_m | site_property |
| --- | --- | --- | --- | --- | --- | --- |
| SNOTEL:301_CA_SNTL | 301_CA_SNTL | Adin Mtn | SNOTEL | {'latitude': '41.2358283996582', 'longitude': ... | 1886.7120361328125 | {'county': 'Modoc', 'state': 'California', 'si... |
| SNOTEL:907_UT_SNTL | 907_UT_SNTL | Agua Canyon | SNOTEL | {'latitude': '37.522171020507813', 'longitude'... | 2712.719970703125 | {'county': 'Kane', 'state': 'Utah', 'site_comm... |
| SNOTEL:916_MT_SNTL | 916_MT_SNTL | Albro Lake | SNOTEL | {'latitude': '45.59722900390625', 'longitude':... | 2529.840087890625 | {'county': 'Madison', 'state': 'Montana', 'sit... |
| SNOTEL:1267_AK_SNTL | 1267_AK_SNTL | Alexander Lake | SNOTEL | {'latitude': '61.749668121337891', 'longitude'... | 48.768001556396484 | {'county': 'Matanuska-Susitna', 'state': 'Alas... |
| SNOTEL:908_WA_SNTL | 908_WA_SNTL | Alpine Meadows | SNOTEL | {'latitude': '47.779571533203125', 'longitude'... | 1066.800048828125 | {'county': 'King', 'state': 'Washington', 'sit... |
Clean up the DataFrame and prepare Point geometry objects¶
- Convert the 'location' column (contains a dictionary with 'latitude' and 'longitude' values) to Shapely Point objects
- Store as a new 'geometry' column (needed by GeoPandas)
- Drop the 'location' column, as this is no longer needed
- Update the dtype of the 'elevation_m' column to float
sites_df['geometry'] = [Point(float(loc['longitude']), float(loc['latitude'])) for loc in sites_df['location']]
sites_df = sites_df.drop(columns='location')
sites_df = sites_df.astype({"elevation_m":float})
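As an aside, here is a sketch of an equivalent approach (run it instead of the cell above, while the 'location' column still exists) that uses the geopandas points_from_xy helper rather than a list comprehension:
#Alternative sketch: build Point geometries with geopandas points_from_xy
#Assumes the 'location' column has not been dropped yet
lons = sites_df['location'].apply(lambda loc: float(loc['longitude']))
lats = sites_df['location'].apply(lambda loc: float(loc['latitude']))
sites_df['geometry'] = gpd.points_from_xy(lons, lats)
sites_df = sites_df.drop(columns='location').astype({'elevation_m': float})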
Review output¶
Take a moment to familiarize yourself with the DataFrame structure and different columns.
Note that the index is a set of strings with format ‘SNOTEL:1000_OR_SNTL’
- Extract the first record with loc
- Review the 'site_property' dictionary - you could parse this and store the values as separate fields in the DataFrame if desired (a sketch follows below)
sites_df.head()
|  | code | name | network | elevation_m | site_property | geometry |
| --- | --- | --- | --- | --- | --- | --- |
| SNOTEL:301_CA_SNTL | 301_CA_SNTL | Adin Mtn | SNOTEL | 1886.712036 | {'county': 'Modoc', 'state': 'California', 'si... | POINT (-120.7919235229492 41.2358283996582) |
| SNOTEL:907_UT_SNTL | 907_UT_SNTL | Agua Canyon | SNOTEL | 2712.719971 | {'county': 'Kane', 'state': 'Utah', 'site_comm... | POINT (-112.2711791992188 37.52217102050781) |
| SNOTEL:916_MT_SNTL | 916_MT_SNTL | Albro Lake | SNOTEL | 2529.840088 | {'county': 'Madison', 'state': 'Montana', 'sit... | POINT (-111.9590225219727 45.59722900390625) |
| SNOTEL:1267_AK_SNTL | 1267_AK_SNTL | Alexander Lake | SNOTEL | 48.768002 | {'county': 'Matanuska-Susitna', 'state': 'Alas... | POINT (-150.8896636962891 61.74966812133789) |
| SNOTEL:908_WA_SNTL | 908_WA_SNTL | Alpine Meadows | SNOTEL | 1066.800049 | {'county': 'King', 'state': 'Washington', 'sit... | POINT (-121.6984710693359 47.77957153320312) |
sites_df.loc['SNOTEL:301_CA_SNTL']
code 301_CA_SNTL
name Adin Mtn
network SNOTEL
elevation_m 1886.712036
site_property {'county': 'Modoc', 'state': 'California', 'si...
geometry POINT (-120.7919235229492 41.2358283996582)
Name: SNOTEL:301_CA_SNTL, dtype: object
sites_df.loc['SNOTEL:301_CA_SNTL']['site_property']
{'county': 'Modoc',
'state': 'California',
'site_comments': 'beginDate=10/1/1983 12:00:00 AM|endDate=1/1/2100 12:00:00 AM|HUC=180200021403|HUD=18020002|TimeZone=-8.0|actonId=20H13S|shefId=ADMC1|stationTriplet=301:CA:SNTL|isActive=True',
'pos_accuracy_m': '0'}
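As mentioned above, the 'site_property' dictionary could be parsed into separate DataFrame columns. A rough sketch (assuming every record has a 'site_comments' string of '|'-separated key=value pairs, as in the example above) might look like:
#Sketch: expand the 'site_property' dicts into separate columns
props_df = sites_df['site_property'].apply(pd.Series)
#The 'site_comments' field packs key=value pairs separated by '|'
comments_df = props_df['site_comments'].str.split('|').apply(
    lambda items: dict(i.split('=', 1) for i in items) if isinstance(items, list) else {}).apply(pd.Series)
sites_expanded = sites_df.join(props_df.drop(columns='site_comments')).join(comments_df)
sites_expanded[['county', 'state', 'beginDate', 'isActive']].head()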
Convert to a Geopandas GeoDataFrame¶
- We already have a 'geometry' column, but still need to define the crs of the point coordinates
- Note the number of records
sites_gdf_all = gpd.GeoDataFrame(sites_df, crs='EPSG:4326')
sites_gdf_all.head()
|  | code | name | network | elevation_m | site_property | geometry |
| --- | --- | --- | --- | --- | --- | --- |
| SNOTEL:301_CA_SNTL | 301_CA_SNTL | Adin Mtn | SNOTEL | 1886.712036 | {'county': 'Modoc', 'state': 'California', 'si... | POINT (-120.79192 41.23583) |
| SNOTEL:907_UT_SNTL | 907_UT_SNTL | Agua Canyon | SNOTEL | 2712.719971 | {'county': 'Kane', 'state': 'Utah', 'site_comm... | POINT (-112.27118 37.52217) |
| SNOTEL:916_MT_SNTL | 916_MT_SNTL | Albro Lake | SNOTEL | 2529.840088 | {'county': 'Madison', 'state': 'Montana', 'sit... | POINT (-111.95902 45.59723) |
| SNOTEL:1267_AK_SNTL | 1267_AK_SNTL | Alexander Lake | SNOTEL | 48.768002 | {'county': 'Matanuska-Susitna', 'state': 'Alas... | POINT (-150.88966 61.74967) |
| SNOTEL:908_WA_SNTL | 908_WA_SNTL | Alpine Meadows | SNOTEL | 1066.800049 | {'county': 'King', 'state': 'Washington', 'sit... | POINT (-121.69847 47.77957) |
sites_gdf_all.shape
(930, 6)
Create a scatterplot showing elevation values for all sites¶
#geojson of state polygons
states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_5m.json'
states_gdf = gpd.read_file(states_url)
f, ax = plt.subplots(figsize=(10,6))
sites_gdf_all.plot(ax=ax, column='elevation_m', markersize=3, cmap='inferno', legend=True, legend_kwds={'label': "Elevation (m)"})
#This prevents matplotlib from updating the axes extent (states polygons cover larger area than SNOTEL points)
ax.autoscale(False)
states_gdf.plot(ax=ax, facecolor='none', edgecolor='k', alpha=0.3);
[Figure: scatterplot of all SNOTEL sites colored by elevation (m), with state outlines]
Exclude the Alaska (AK) points to isolate points over Western U.S.¶
- A simple approach is to remove points where the site index contains 'AK' using an attribute filter
- Note the number of records
sites_gdf_conus = sites_gdf_all[~(sites_gdf_all.index.str.contains('AK'))]
Alternatively, you can use a spatial filter (see the GeoPandas cx indexer functionality for a bounding box):
#xmin, xmax, ymin, ymax = [-126, -102, 30, 50]
#sites_gdf_conus = sites_gdf_all.cx[xmin:xmax, ymin:ymax]
sites_gdf_conus.shape
(865, 6)
Update your scatterplot as a sanity check¶
It should look something like the Western U.S.
f, ax = plt.subplots(figsize=(10,6))
sites_gdf_conus.plot(ax=ax, column='elevation_m', markersize=3, cmap='inferno', legend=True, legend_kwds={'label': "Elevation (m)"})
ax.autoscale(False)
states_gdf.plot(ax=ax, facecolor='none', edgecolor='k', alpha=0.3);
[Figure: CONUS SNOTEL sites colored by elevation (m), with state outlines]
Export SNOTEL site GeoDataFrame as a geojson¶
This may be useful for other purposes - you can avoid all of the above processing and just load the file directly with the GeoPandas read_file function.
sites_gdf_conus.to_file?
sites_fn = 'snotel_conus_sites.json'
if not os.path.exists(sites_fn):
    sites_gdf_conus.to_file(sites_fn, driver='GeoJSON')
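As a quick check that the export worked (assuming the file now exists on disk), you can read it right back in:
#Read the exported GeoJSON back in and inspect it
sites_gdf_check = gpd.read_file(sites_fn)
print(sites_gdf_check.shape, sites_gdf_check.crs)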
Part 2: Spatial filter points by polygon¶
Load Grand Mesa Polygon¶
gm_poly_fn = 'grand_mesa_poly.geojson'
gm_poly_gdf = gpd.read_file(gm_poly_fn)
gm_poly_gdf.plot()
<AxesSubplot:>
[Figure: Grand Mesa polygon outline]
A quick aside on geometry objects¶
Vector data contain geometry objects¶
- Classes (Point, Line, Polygon) with unique attributes and methods
- Polygon vs. MultiPolygon
- https://automating-gis-processes.github.io/site/notebooks/L1/geometric-objects.html
- https://shapely.readthedocs.io/en/stable/manual.html#geometric-objects
Image Source: National Ecological Observatory Network (NEON), from https://datacarpentry.org/organization-geospatial
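Here is a minimal sketch (using shapely directly, with made-up coordinates) illustrating these geometry classes:
from shapely.geometry import Point, LineString, Polygon, MultiPolygon

pt = Point(-108.0, 39.0)  #single coordinate pair
line = LineString([(-108.0, 39.0), (-107.9, 39.1)])  #ordered sequence of points ("Line")
poly = Polygon([(-108.1, 39.0), (-108.0, 39.1), (-107.9, 39.0)])  #closed exterior ring
multi = MultiPolygon([poly])  #collection of Polygon objects
print(pt.geom_type, line.geom_type, poly.geom_type, multi.geom_type)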
Isolate Polygon geometry within GeoDataFrame¶
type(gm_poly_gdf)
geopandas.geodataframe.GeoDataFrame
gm_poly_gdf
|  | geometry |
| --- | --- |
| 0 | POLYGON ((-108.31168 39.13758, -108.34116 39.0... |
gm_poly_gdf.total_bounds
array([-108.34115668, 38.82320553, -107.72839859, 39.19563035])
gm_poly_gdf.iloc[0] #Now a GeoSeries
geometry POLYGON ((-108.31168 39.13758, -108.34116 39.0...
Name: 0, dtype: geometry
gm_poly_geom = gm_poly_gdf.iloc[0].geometry
gm_poly_geom
print(gm_poly_geom)
POLYGON ((-108.3116825655377 39.13757646212944, -108.3411566832522 39.03758987613325, -108.2878686387796 38.89051431295789, -108.2077296878005 38.8232055291981, -108.0746016431103 38.8475137825863, -107.9856051049498 38.9439912011017, -107.7283985875575 39.01510930230633, -107.7872414249099 39.19563034965999, -108.0493948009875 39.13950466335424, -108.1728700097086 39.15920066396116, -108.3116825655377 39.13757646212944))
list(gm_poly_geom.exterior.coords)
[(-108.31168256553767, 39.13757646212944),
(-108.34115668325224, 39.03758987613325),
(-108.2878686387796, 38.89051431295789),
(-108.20772968780051, 38.8232055291981),
(-108.07460164311031, 38.8475137825863),
(-107.98560510494981, 38.9439912011017),
(-107.72839858755752, 39.01510930230633),
(-107.78724142490994, 39.195630349659986),
(-108.04939480098754, 39.139504663354245),
(-108.17287000970857, 39.15920066396116),
(-108.31168256553767, 39.13757646212944)]
gm_poly_geom.bounds
(-108.34115668325224,
38.8232055291981,
-107.72839858755752,
39.195630349659986)
Generate boolean index for points that intersect the polygon¶
This will return a new pandas Series with True/False values for each record
idx = sites_gdf_all.intersects(gm_poly_geom)
idx.head()
SNOTEL:301_CA_SNTL False
SNOTEL:907_UT_SNTL False
SNOTEL:916_MT_SNTL False
SNOTEL:1267_AK_SNTL False
SNOTEL:908_WA_SNTL False
dtype: bool
idx.value_counts()
False 928
True 2
dtype: int64
Use fancy indexing to isolate points and return a new GeoDataFrame¶
gm_snotel_sites = sites_gdf_all.loc[idx]
gm_snotel_sites
|  | code | name | network | elevation_m | site_property | geometry |
| --- | --- | --- | --- | --- | --- | --- |
| SNOTEL:622_CO_SNTL | 622_CO_SNTL | Mesa Lakes | SNOTEL | 3048.000000 | {'county': 'Mesa', 'state': 'Colorado', 'site_... | POINT (-108.05835 39.05831) |
| SNOTEL:682_CO_SNTL | 682_CO_SNTL | Park Reservoir | SNOTEL | 3035.808105 | {'county': 'Delta', 'state': 'Colorado', 'site... | POINT (-107.87414 39.04644) |
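As an aside, a one-step alternative sketch (using the geopandas clip function, not part of the original exercise) should return the same two stations:
#Sketch: clip the full site GeoDataFrame by the Grand Mesa polygon
gm_snotel_sites_alt = gpd.clip(sites_gdf_all, gm_poly_gdf)
print(gm_snotel_sites_alt.shape)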
Quick plot¶
f, ax = plt.subplots(figsize=(10,6))
gm_snotel_sites.plot(ax=ax, column='elevation_m', markersize=20, edgecolor='k', cmap='inferno', \
legend=True, legend_kwds={'label':'Elevation (m)'})
#ctx.add_basemap(ax=ax, crs=gm_snotel_sites.crs, source=ctx.providers.Stamen.Terrain)
ax.set_title('Grand Mesa SNOTEL Stations');
[Figure: Grand Mesa SNOTEL Stations, colored by elevation (m)]
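The commented-out line above hints at adding a web-tile basemap with contextily. A hedged sketch (swapping in the OpenStreetMap.Mapnik provider instead of the Stamen terrain tiles in the comment) might look like:
#Sketch: re-draw the quick plot with an OpenStreetMap basemap under the points
f, ax = plt.subplots(figsize=(10,6))
gm_snotel_sites.plot(ax=ax, column='elevation_m', markersize=20, edgecolor='k', cmap='inferno',
                     legend=True, legend_kwds={'label':'Elevation (m)'})
ctx.add_basemap(ax=ax, crs=gm_snotel_sites.crs, source=ctx.providers.OpenStreetMap.Mapnik)
ax.set_title('Grand Mesa SNOTEL Stations');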
import hvplot.pandas
from geoviews import tile_sources as gvts