Vector data¶
Authors: David Shean & Scott Henderson
SnowEx Hackweek July 13, 2021
Learning Objectives
A 30 minute guide to vector data for SnowEX Hackweek
find, visualize, interpret vector data formats
use Python raster libraries geopandas
GeoPandas¶
pandas is great, but what if we want to do some geospatial operations - like reproject our vector data or compute the intersection between Point and Polygon features?
Enter Geopandas - all the great things about pandas, plus geo! (http://geopandas.org/).
“GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and descartes and matplotlib for plotting.”
“GeoPandas enables you to easily do operations in python that would otherwise require a spatial database such as PostGIS.”
Under the hood, GeoPandas is pandas
plus some other core geospatial packages:
shapely
for geometry operations (https://shapely.readthedocs.io/en/stable/manual.html)fiona
for reading/writing GIS file formats (https://fiona.readthedocs.io/en/latest/manual.html)pyproj
for projections and coordinate system transformations (http://pyproj4.github.io/pyproj/stable/)
Under those hoods are lower-level geospatial libraries (GEOS, GDAL/OGR, PROJ), written in C/C++/Java and compiled (fast!), that provide a foundation for most GIS software (open-source and commercial). I encourage you to explore these - I guarantee you will learn something valuable.
GDAL/OGR
https://gdal.org/PROJ
https://proj.org/
For now, let’s explore some basic geopandas functionality.
import geopandas as gpd
import hvplot.pandas
import holoviews as hv
Coordinate Reference Systems (CRS)¶
These are essential for geospatial analysis, but can be confusing (even for seasoned GIS professionals). Fortunately, most applications use common CRS with a shared database of EPSG codes. https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset
Here are some common CRS you will encounter:
EPSG:4326 - WGS84, geodetic latitude and longitude
EPSG:3857 - “web mercator”, used by most tiled basemaps and online/mobile mapping platforms, https://en.wikipedia.org/wiki/Web_Mercator_projection
Usually OK for display purposes, maybe some analysis at lower latitudes, but considerable distortion at high latitudes
UTM zones - convenient compromise projections for smaller areas (<500 km)
EPSG:326XX - where XX is zone number (e.g., 12), WGS84 ellipsoid
EPSG:269XX - where XX is zone number, NAD83(2011) ellipsoid
Key takeaways for now:¶
Make sure your dataset has a CRS defined (consult metadata if unspecified)
While ArcGIS/QGIS can reproject datasets to a common CRS “on the fly”
GeoPandas/Shapely/GEOS performs calculations using 2D Cartesian coordinates (e.g., computing distance with Pythagorean theorem, not true 3D geodetic distance on surface of the Earth). Choice of CRS for these calculations is important (e.g., use equal-area projection for area calculations)
Most of the time, you will use one of the above CRS definitions. For larger regional analysis, consider a custom projection defined by your domain and planned analysis: https://projectionwizard.org/
Let’s use some Polygons for US States¶
Hmmm, let’s see. Two choices:
We could go to ESRI or the U.S. Census website, identify and download a shapefile, unzip 4+ files, copy/paste the appropriate *.shp filename into the notebook. Wait, how can I download on a remote server? OK, maybe run something like
wget http://...
, unzip, provide absolute path
- OR -Give geopandas a url string that points to a GeoJSON file somewhere on the web, and read dynamically
Yeah, let’s go with #2
Let’s use the US States 5M GeoJSON here: http://eric.clst.org/tech/usgeojson/
JSON (JavaScript Object Notation) is a structured text-based data format. A Jupyter notebook is a json text file.
GeoJSON extends this format to include geospatial information. It’s pretty great. If you are unfamiliar, take a moment to read about GeoJSON: https://en.wikipedia.org/wiki/GeoJSON
Take a look at the 5M file contents in your browser or download and open with a text editor. Note organization structure. How does this compare to, say, a Python dictionary object?
This is a GeoJSON file!
Read the file using GeoPandas by passing the url to gpd.read_file()
.
states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_5m.json'
#states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json'
states_gdf = gpd.read_file(states_url)
states_gdf.head()
GEO_ID | STATE | NAME | LSAD | CENSUSAREA | geometry | |
---|---|---|---|---|---|---|
0 | 0400000US01 | 01 | Alabama | 50645.326 | MULTIPOLYGON (((-88.12466 30.28364, -88.08681 ... | |
1 | 0400000US02 | 02 | Alaska | 570640.950 | MULTIPOLYGON (((-166.10574 53.98861, -166.0752... | |
2 | 0400000US04 | 04 | Arizona | 113594.084 | POLYGON ((-112.53859 37.00067, -112.53454 37.0... | |
3 | 0400000US05 | 05 | Arkansas | 52035.477 | POLYGON ((-94.04296 33.01922, -94.04304 33.079... | |
4 | 0400000US06 | 06 | California | 155779.220 | MULTIPOLYGON (((-122.42144 37.86997, -122.4213... |
states_gdf.plot()
<AxesSubplot:>

Check the CRS¶
Note that this was defined when we opened the file with GeoPandas - by default, a GeoJSON is assumed to use EPSG:4326 coordinate system for geodetic latitude and longitude
states_gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
states_gdf.hvplot()
states_gdf.hvplot(geo=True)