Accessing the U.S. Wind Turbine Database API for Location Data Visualization


Contact: Chris Garrity | U.S. Geological Survey | [email protected]
Database: U.S. Wind Turbine Database | API Access: USWTDB API | Data Acquisition: About

The United States Wind Turbine Database (USWTDB) provides the locations of land-based and offshore wind turbines in the United States, corresponding wind project information, and turbine technical specifications. Wind turbine records are collected and compiled from various public and private sources, digitized and position-verified from aerial imagery, and quality checked. The USWTDB is available for download in a variety of tabular and geospatial file formats, to meet a range of user and software needs. The following examples access the wind turbine data through the USWTDB API. Accessing raw data through an API lets users stay in sync with the database without the need to download static versions of the data. Learn more about the USWTDB and USWTDB API at https://energy.usgs.gov/uswtdb/.

The following Jupyter Notebook examples are targeted for users who are new to Jupyter and notebook environments in general. A notebook integrates code and code output into a single document that combines visualizations, narrative text, mathematical equations, and other media types. This type of workflow promotes iterative and efficient development, making notebooks an increasingly popular choice for contemporary data science and analysis. This notebook contains exhaustive narrative text for each step, tailored to those just starting development in the Jupyter Notebook environment. Learn more about Project Jupyter.

Dependencies Used in These Examples

The examples in this notebook require the installation of two additional Python packages. These packages can easily be installed using pip, a well-known standard package manager for Python*. pip allows you to install and manage additional packages that are not part of the Python standard library. The Python installer installs pip by default, so it should be ready for you to use.

pip install mapboxgl

Note the `mapboxgl` package does not have a `conda` distribution and must be installed via `pip`.

mapboxgl allows you to build Mapbox GL JS data-driven visualizations natively in Jupyter Notebooks. mapboxgl is a high-performance, interactive, WebGL-based data visualization tool that leverages Mapbox Vector Tiles. Learn more about the Mapbox platform at https://www.mapbox.com/maps/.

pip install pandas

pandas provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. In the following examples we will be leveraging pandas.DataFrame, generally the most commonly used pandas object. A dataframe is a 2-dimensional labeled data structure with columns of potentially different types. You can think of a dataframe like a traditional spreadsheet or SQL table, composed of rows and columns supporting a variety of tabular data types. pandas provides several methods for reading data in different formats. In these examples, we'll request our source data through the USWTDB API, which returns raw data in JSON format over standard HTTP. Learn more about pandas at https://pandas.pydata.org/pandas-docs/stable/index.html.

* `Conda` is another widely used packaging tool/installer that, unlike `pip`, handles library dependencies outside of strictly Python packages. The `conda` package and environment manager is included in all versions of Anaconda and Miniconda. Those using `conda` can swap `pip` with `conda` for the `pandas` install above (see the note about `mapboxgl`). Learn more about `conda` at https://docs.conda.io/en/latest/.
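To illustrate the pandas workflow described above before we touch the live API, here is a minimal sketch that parses a tiny hand-written JSON payload shaped like a USWTDB response (the values below are illustrative, not real turbine records):

```python
import pandas as pd
from io import StringIO

# A tiny JSON payload shaped like the USWTDB API return (illustrative values)
payload = ('[{"case_id": 1, "t_cap": 95, "xlong": -118.4, "ylat": 35.1},'
           ' {"case_id": 2, "t_cap": 1500, "xlong": -118.3, "ylat": 35.0}]')

# pd.read_json parses the JSON records into a 2-dimensional dataframe
df = pd.read_json(StringIO(payload))

print(df.shape)   # (2, 4)
print(df.head())
```

The same `pd.read_json` call accepts a URL, which is exactly how the examples below pull live data from the USWTDB API.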

Handle the Notebook Imports

Python provides a flexible framework for importing modules and specific members of a module. In the examples in this notebook, we will import pandas and give it the alias pd. We will also import the viz and utils submodules from the mapboxgl package, as well as operating system dependent functionality via the os import.

In [11]:
import pandas as pd
import os

from mapboxgl.viz import *
from mapboxgl.utils import *

Example 1 - Create a Clustered Turbine Location Map of the Conterminous U.S.

Map clustering algorithms typically find map markers (points) that are near each other and denote them with a cluster symbol representing the overall density of the aggregated map markers. By default, the new symbols are labeled with the number of map markers they contain. We can apply symbol scaling and custom color ramps to the rendered cluster symbols to better visualize the density of the dataset in our map window. As we zoom in, the algorithm recalibrates clustering on the fly based on the number of markers in our map view. Map clustering can be a powerful visualization tool when mapping large numbers of markers and helps users visualize patterns of points without the traditional issues of marker overlap. In this example, we will build a simple cluster map to visualize the locations of turbines at a national scale throughout the United States. Due to the location proximity inherent in the dataset (i.e., turbines typically occur in groups, or 'wind farms', throughout the country), a cluster map becomes a useful tool to help us visualize the overall density of turbines when zoomed out to a national level.

Step 1. Add a National Geologic Map Database Vector Tile Basemap Service

The mapboxgl package leverages a public Mapbox token to access Mapbox-hosted basemap styles. To avoid requiring users of this notebook to have a Mapbox account, we will add a USGS-hosted vector tile basemap from the National Geologic Map Database (NGMDB) to our notebook. We can call custom vector tile styles using the style parameter when we generate our map visualization and omit the token parameter. Below, we will pass two styles from the NGMDB as variables to use in our notebook exercises. For those with an existing Mapbox account, swap the style parameter below with your own Mapbox style and supply your Mapbox token. Learn more about the NGMDB at https://ngmdb.usgs.gov.

In [12]:
# NGMDB monochrome style designed to provide a basemap that highlights the data overlay.
ngmdbLight = 'https://ngmdb-tiles.usgs.gov/styles/ngmdb-light/style.json'

# NGMDB full-color style that contains standard cartographic basemap layers, contour lines, and hillshading.
ngmdbBasemap = 'https://ngmdb-tiles.usgs.gov/styles/ngmdb-tv/style.json'

Step 2. Connect to the U.S. Wind Turbine Database API and Preview the Response

As noted previously, the USWTDB API allows for programmatic access to the U.S. Wind Turbine Database maintained by the USGS and partner agencies. The USWTDB API was created to extend the visibility of the USWTDB, expand its user base, and support more productive internal workflows. The API supports filtering table rows by appending specific attributes, the filter operator, and the filter value to the request. Filters can exclude table rows using simple operators that compare against specified key values. Applying filters to the request allows for more efficient, faster API responses because unneeded data is withheld by the server prior to the API return. This is particularly useful when users are interested in a subset of data from the USWTDB. See additional USWTDB API filter operations at https://energy.usgs.gov/uswtdb/api-doc/#operators.

In the first example, we will make a customized HTTP request to the API and return turbines that (1) have a capacity greater than 0 kW to exclude any zero or null capacity values and (2) limit the turbine attributes in the response to case_id (unique ID), t_manu (manufacturer), t_cap (capacity), xlong (longitude), and ylat (latitude). This is done by appending URL parameters ?&t_cap=gt.0&select=case_id,t_manu,t_cap,xlong,ylat to the root-level API https://eersc.usgs.gov/api/uswtdb/v1/turbines/. Once a successful request is made, we will parse the JSON response and preview the first 5 records of the pandas.DataFrame.

Note: There are many more attributes related to the USWTDB that can be leveraged in the API request. Feel free to experiment with the URL parameters to build your own custom maps using the USWTDB.
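The notebook below simply hard-codes the request URL as a string, which works just as well; as an alternative sketch, the filter parameters can also be assembled programmatically with the standard library, which makes it easier to experiment with different attribute/operator/value combinations:

```python
from urllib.parse import urlencode

base = 'https://eersc.usgs.gov/api/uswtdb/v1/turbines'

# PostgREST-style filters: attribute -> operator.value
params = {
    't_cap': 'gt.0',                                  # capacity greater than 0 kW
    'select': 'case_id,t_manu,t_cap,xlong,ylat',      # limit returned attributes
}

# safe=',.' keeps commas and dots readable instead of percent-encoding them
data_url = base + '?' + urlencode(params, safe=',.')
print(data_url)
```

The resulting string can be passed straight to pd.read_json, exactly as in the cell below.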

In [13]:
# Call the USWTDB API and apply custom URL parameters to the request. Parameters allow us to filter the data return.
data_url = 'https://eersc.usgs.gov/api/uswtdb/v1/turbines?&t_cap=gt.0&select=case_id,t_manu,t_cap,xlong,ylat'

# Parse the JSON response from the API return and populate the dataframe
dfClusterMap = pd.read_json(data_url)

# Preview the first five records of our dataframe based on the custom URL parameters in the API request
dfClusterMap.head(5)
Out[13]:
case_id t_manu t_cap xlong ylat
0 3072677 Vestas North America 95 -118.36575 35.07787
1 3073412 Vestas North America 95 -118.35526 35.08480
2 3073335 Vestas North America 95 -118.35754 35.08832
3 3072695 Vestas North America 95 -118.36441 35.07744
4 3073327 Vestas North America 95 -118.35787 35.08450

Step 3. Create a GeoJSON Object from the Dataframe

The mapboxgl package supports both vector tile sources and the GeoJSON format for rendering map visualizations. GeoJSON is a common, open standard, geospatial data interchange format based on JSON. It's designed for representing geographical features, along with their non-spatial attributes and spatial extents. Learn more about the GeoJSON format.

The conversion from our pandas.DataFrame to GeoJSON is handled by df_to_geojson*. There are a variety of parameters we can pass to the function, but for this example we will (1) define the dataframe columns (attributes) to be passed to our GeoJSON object, (2) define the precision of the turbine latitude/longitude values, and (3) map the names of the dataframe columns to the required latitude and longitude parameters of the function.

* There are a variety of other geospatial extensions, like `GeoPandas`, that make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Learn more about the `GeoPandas` project at https://geopandas.org/index.html.
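To make the conversion concrete, here is a simplified stand-in for df_to_geojson (the helper name df_to_geojson_sketch is hypothetical; it is not part of mapboxgl) showing the FeatureCollection structure the real function produces:

```python
import pandas as pd

def df_to_geojson_sketch(df, properties, lat='ylat', lon='xlong', precision=3):
    """Simplified, illustrative stand-in for mapboxgl.utils.df_to_geojson."""
    features = []
    for _, row in df.iterrows():
        features.append({
            'type': 'Feature',
            'geometry': {
                'type': 'Point',
                # Round coordinates to the requested decimal precision
                'coordinates': [round(float(row[lon]), precision),
                                round(float(row[lat]), precision)],
            },
            # Carry selected dataframe columns along as feature properties
            'properties': {p: row[p] for p in properties},
        })
    return {'type': 'FeatureCollection', 'features': features}

sample = pd.DataFrame({'case_id': [1], 't_cap': [95],
                       'xlong': [-118.3658], 'ylat': [35.0779]})
gj = df_to_geojson_sketch(sample, properties=['case_id', 't_cap'])
print(gj['features'][0]['geometry']['coordinates'])  # [-118.366, 35.078]
```

In the notebook itself we rely on the real df_to_geojson from mapboxgl.utils, as shown in the next cell.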

In [14]:
# Create GeoJSON object with selected attributes. Define coordinates to three decimal places. Map required lat lon.
turbineClusterGeoJson = df_to_geojson(dfClusterMap,
                          properties=['case_id','t_cap','t_manu'], 
                          precision=3,lat='ylat', lon='xlong')

Step 4. Build the Cluster Map Visualization of Turbine Locations in the Conterminous U.S.

To create the turbine cluster map, we start by creating color 'stops' (or cutoffs) based on the density of the turbine locations (proximity to one another). For our cluster map, we will apply a 6-step diverging color ramp from ColorBrewer (diverging color schemes highlight the largest and smallest ranges) and create stops for proximity bins with counts of 10, 50, 100, 500, 1000, 5000. Next, we will define the sizes of our cluster markers. Finally, we will call ClusteredCircleViz and apply our color ramp along with some custom parameters for our visualization. In the code cell below, we provide a brief explanation for each custom parameter used for rendering our cluster map. An exhaustive list of parameters can be found in the `mapboxgl-jupyter` documentation.

In [15]:
# Define our color stops based on a 6-step divergent color ramp
turbine_color_stops = create_color_stops([10, 50, 100, 500, 1000, 5000], colors='Spectral')

# Define the radius (sizes) of the cluster markers
turbine_radius_stops = [[1, 5], [10, 10], [1000, 15], [5000, 20]]

# Define the parameters for our cluster map 
# Call our NGMDB style as basemap, set the max zoom level for clusters to show 
# Set cluster label size and cluster symbol opacity
# Handle initial zoom/center of visualization
turbineClustersMap = ClusteredCircleViz(turbineClusterGeoJson,
                          color_stops=turbine_color_stops,
                          radius_stops=turbine_radius_stops,
                          style=ngmdbLight,
                          cluster_maxzoom=10,
                          label_size=10,
                          opacity=0.6,
                          center=(-95, 40),
                          zoom=3.25)

# Render the cluster map visualization
turbineClustersMap.show()

Example 2 - Create a Graduated Symbol Map of Wind Turbines - San Francisco, California

Graduated symbols show a quantitative difference between mapped elements by varying the size of the map markers. Attribute values are classified into ranges, and each range is assigned a symbol size that represents it. Symbol size is an effective way to represent differences in magnitude of a selected attribute, because larger markers are naturally associated with a greater amount of something. Graduated symbols give you control over the size of each symbol within its bin range and, unlike proportional symbols, are not scaled directly to the absolute minimum and maximum of the attribute values.
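The classification described above can be sketched as a simple step function: each value falls into the bin of the last stop at or below it (the helper radius_for is illustrative; note that Mapbox GL's rendering can also interpolate between stops rather than stepping):

```python
def radius_for(value, stops, default=1):
    """Step classification: return the radius of the last stop whose
    threshold is less than or equal to value."""
    radius = default
    for threshold, r in stops:
        if value >= threshold:
            radius = r
    return radius

# Illustrative capacity (kW) -> marker radius bins
stops = [[0, 0], [1000, 3], [2000, 6], [3000, 9], [4000, 12]]

print(radius_for(1500, stops))  # 3  (falls in the 1000-kW bin)
print(radius_for(3000, stops))  # 9  (exactly on the 3000-kW stop)
```

This is the same value-to-size logic we express later by passing radius_stops to GraduatedCircleViz.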

In this example, we call the USWTDB API with some advanced parameters to fine-tune the data returned to the dataframe. We will then render a graduated symbol map of a wind farm north of San Francisco with symbol sizes based on turbine capacity values (in kW). Finally, we apply a color scheme to our graduated symbol map based on turbine height (in m), effectively visualizing the relationship between turbine capacity and turbine height.

Before we build the map visualization, we will preview USWTDB data for California in some simple plots using the pandas.plotting module. The plots will help us visualize relationships between a variety of turbine attributes like installation year, capacity, and total height.

Step 1. Call the USWTDB API with New Parameters and Preview the Data in Plots

In this example, let's make another customized HTTP request to the API and return turbines that (1) are located in California, (2) have t_cap (capacity) values that are not null, and (3) limit the turbine attributes in the response to t_cap (which we will cast as "Capacity"). This is done by appending URL parameters ?&t_state=eq.CA&t_cap=not.is.null&select=Capacity:t_cap to the root-level API https://eersc.usgs.gov/api/uswtdb/v1/turbines/. Once a successful request is made, we can parse the JSON response and generate a histogram showing the frequency of turbine capacities for wind turbines in California.

Note: There are other operators related to the USWTDB that can be leveraged in the API request. Feel free to experiment with other URL operators to build your own custom plots using the USWTDB.

In [16]:
# Call the USWTDB API and apply custom URL parameters to the request
caCapHist_url = 'https://eersc.usgs.gov/api/uswtdb/v1/turbines?&t_state=eq.CA&t_cap=not.is.null&select=Capacity:t_cap'

# Parse the JSON response from the API return and populate the dataframe
capHist = pd.read_json(caCapHist_url)

#Display the number of turbines in our API return
display(capHist.count())

#Preview the first 5 records of the return. Data should only include the single attribute "Capacity" as defined by our API request
display(capHist.head(5))

# Generate a histogram showing frequencies of wind turbine capacities. Include number of bins, and size of the plot
capHist.plot.hist(bins=10,
                  figsize=(20,5))
Capacity    5528
dtype: int64
Capacity
0 95
1 95
2 95
3 95
4 95
Out[16]:
<AxesSubplot:ylabel='Frequency'>

From our output, we see that the number of turbines returned from this API request is 5,528 (first output line). Based on our histogram with a bin level of bins=10, we see that the majority of turbines in California have capacities of less than 500 kW. The next largest frequency appears to be turbines with a capacity between 1500-1800 kW. We can increase the number of bins and rerun the cell to further refine capacity ranges in the histogram return. Try running the cell with a bin level of bins=40.
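The binning that plot.hist performs can also be done explicitly with pd.cut, which is handy when you want the counts per capacity range as data rather than as a picture (the capacities below are illustrative, not the full API return):

```python
import pandas as pd

# Illustrative capacities (kW); the real data comes from the API call above
caps = pd.Series([95, 95, 300, 660, 1500, 1650, 1800, 2300])

# Explicit, right-inclusive capacity bins
binned = pd.cut(caps, bins=[0, 500, 1000, 1500, 2000, 2500])

# Count of turbines per capacity range, in bin order
print(binned.value_counts().sort_index())
```

Adjusting the bins list here plays the same role as changing the bins= argument to plot.hist.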

Next, let's generate a scatter plot to help us visualize relationships between turbine installation year, turbine capacity, and turbine total height. We will make another customized request to the USWTDB API for turbines that (1) are located in California, (2) have t_cap (capacity) and t_ttlh values that are not null, and (3) limit the turbine attributes in the response to p_year (year the turbine project was completed), t_manu (turbine manufacturer), p_name (name of the turbine project), t_ttlh (which we will cast as "Height"), and t_cap (which we will cast as "Capacity"). We will also add xlong (longitude) and ylat (latitude) so we can use the request to generate our graduated symbol map later. All this is done by appending URL parameters ?&t_state=eq.CA&t_cap=not.is.null&t_ttlh=not.is.null&select=p_year,t_manu,p_name,Capacity:t_cap,Height:t_ttlh,xlong,ylat to the root-level API https://eersc.usgs.gov/api/uswtdb/v1/turbines/.

In [17]:
# Call the USWTDB API and apply custom URL parameters to the request.
caTurbines_url = 'https://eersc.usgs.gov/api/uswtdb/v1/turbines?&t_state=eq.CA&t_cap=not.is.null&t_ttlh=not.is.null&select=p_year,t_manu,p_name,Capacity:t_cap,Height:t_ttlh,xlong,ylat'

# Parse the JSON response from the API return and populate the dataframe
caTurbines = pd.read_json(caTurbines_url)

# Display the number of turbines in our API return
display(caTurbines.count())

# Preview the first 5 records of the return. Data should only include the attributes defined by our API request
display(caTurbines.head(5))

# Generate a scatter plot with x-axis=year, y-axis=capacity, colorized (c) by height using 'viridis' matplotlib colormap
caTurbines.plot.scatter(x='p_year',
                         y='Capacity',
                         c='Height',
                         colormap='viridis',
                         figsize=(20,5),
                         sharex=False)
p_year      4099
t_manu      4099
p_name      4099
Capacity    4099
Height      4099
xlong       4099
ylat        4099
dtype: int64
p_year t_manu p_name Capacity Height xlong ylat
0 2010 GE Wind Alta I 1500 118.6 -118.36929 35.03569
1 2008 Vestas Alite Wind Farm 3000 125.0 -118.34369 35.03149
2 2008 Vestas Alite Wind Farm 3000 125.0 -118.34409 35.03590
3 2008 Vestas Alite Wind Farm 3000 125.0 -118.34489 35.03090
4 2008 Vestas Alite Wind Farm 3000 125.0 -118.34529 35.03519
Out[17]:
<AxesSubplot:xlabel='p_year', ylabel='Capacity'>

From our output, we see that the number of turbines returned based on our new API request is 4,099. Based on our scatter plot, we see there is a general trend of increasing capacity and height over time for wind turbines in California. If we wanted to see if this general trend was the same at a national level, we would remove the parameter &t_state=eq.CA from our API request and re-run the cell. This is a simple example highlighting the efficiency of data delivery through an API. We only request the data we need for our analysis by modifying and/or appending simple URL parameters to the root level API endpoint. We also stay in sync with the latest version of the data because we're pulling from the source, not a static flat-file that may be out of date.
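The year-over-year trend visible in the scatter plot can also be summarized numerically with a groupby aggregation (the records below are a tiny illustrative stand-in for the caTurbines dataframe, which has the same columns):

```python
import pandas as pd

# Illustrative subset shaped like the caTurbines dataframe
df = pd.DataFrame({'p_year':   [2008, 2008, 2010, 2010, 2012],
                   'Capacity': [3000, 3000, 1500, 1500, 2300],
                   'Height':   [125.0, 125.0, 118.6, 118.6, 130.0]})

# Mean capacity and height per installation year
summary = df.groupby('p_year')[['Capacity', 'Height']].mean()
print(summary)
```

Running the same aggregation on the full national dataframe is one way to quantify the trend the scatter plot only suggests.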

Step 2. Create a GeoJSON Object from the Dataframe

Just like before, let's create a GeoJSON object from the dataframe we just defined. We start by (1) passing the dataframe columns (attributes) to be added to our GeoJSON object, (2) defining the precision of the turbine latitude/longitude values, and (3) mapping the names of the dataframe columns to the required latitude and longitude parameters of the function*.

*As seen in examples above, we can cast lat:ylat and lon:xlong in the API request to avoid having to map lat='ylat', lon='xlong' in df_to_geojson.

In [18]:
# Create GeoJSON object from our 'caTurbines' dataframe
turbineGradSymGeoJson = df_to_geojson(caTurbines,
                          properties=['p_name','Capacity','t_manu', 'Height'], 
                          precision=3,lat='ylat', lon='xlong')

Step 3. Build the Graduated Symbol Map Using Multiple Turbine Attributes

Let's say we wanted to show a graduated symbol map with marker sizes based on turbine capacity, and a color ramp representing turbine height (where hotter colors depict increased turbine height). To create the map, first create color 'stops' (or cutoffs) based on the turbine height ranges in the database. In this example, we will hard-code RGB values (cool to hot) at each of our five defined stops. Next, we will define the sizes of our markers based on turbine capacity. Again, let's define five stops representing a range of turbine capacities. Finally, we can call GraduatedCircleViz and apply our color and radius bins, along with some custom parameters for our visualization. In the code cell below, we provide a brief explanation for each custom parameter used for rendering our graduated symbol map. An exhaustive list of parameters can be found in the `mapboxgl-jupyter` documentation.

In [19]:
# Assign color breaks based on turbine height ranges (in m)
turbine_height_color_bins = [[25, 'rgb(43,131,186)'],
            [50, 'rgb(171,221,164)'],
            [100, 'rgb(255,255,191)'],
            [120, 'rgb(253,174,97)'],
            [180, 'rgb(215,25,28)']]

# Assign marker radius size based on turbine capacity ranges (in kW)
turbine_radius_bins = [[0, 0],
                       [1000, 3],
                       [2000, 6],
                       [3000, 9],
                       [4000, 12]]

# Define the parameters for our graduated symbol map 
# Call our NGMDB style as basemap, apply our color and radius stops 
# Set marker symbol opacity and stroke, add scalebar and scalebar styles
# Handle initial zoom/center of visualization
turbineGradSymbolMap = GraduatedCircleViz(turbineGradSymGeoJson, 
                style=ngmdbBasemap,
                color_property='Height',
                color_function_type='interpolate',
                color_stops=turbine_height_color_bins,
                radius_property='Capacity',
                radius_stops=turbine_radius_bins, 
                radius_function_type='interpolate', 
                radius_default=1,
                opacity=0.75,                          
                stroke_color='black',
                stroke_width=0.15,
                scale=True,
                scale_unit_system='imperial',
                scale_background_color='#0000ff00',
                center=(-121.8, 38.13),
                zoom=10.75)

# Generate map labels of turbine project names and adjust label properties
turbineGradSymbolMap.label_property = "p_name"
turbineGradSymbolMap.label_size = 5

#Render the map
turbineGradSymbolMap.show() 

Step 4. Export the Notebook Maps as Standalone Web Maps

You may want to view or share the maps generated in your notebook as standalone web maps. Standalone web maps can be displayed on web and mobile devices without the need for notebook dependencies. They look exactly like the inline maps in your notebook and carry with them all the interactivity and control parameters defined in your code. The web map will include your data packaged in the HTML file. You can generate a standalone web map from mapboxgl by calling create_html() and writing the result to a file with standard Python file handling. The standalone web maps will be written to your Jupyter notebook home directory.

In [20]:
# Generate a standalone web map of the USWTDB cluster map 
with open('uswtdbClusterMap.html', 'w') as f:
    f.write(turbineClustersMap.create_html())

# Generate a standalone web map of the California USWTDB graduated symbol map    
with open('uswtdbGradSymbolMap.html', 'w') as f:
    f.write(turbineGradSymbolMap.create_html())

USWTDB Attribution and Disclaimer

The creation of the USWTDB was jointly funded by the U.S. Department of Energy (DOE) Wind Energy Technologies Office (WETO) via the Lawrence Berkeley National Laboratory (LBNL) Electricity Markets and Policy Group, the U.S. Geological Survey (USGS) Energy Resources Program, and the American Wind Energy Association (AWEA). The database is being continuously updated through collaboration among LBNL, USGS, and AWEA. Wind turbine records are collected and compiled from various public and private sources, digitized or position-verified from aerial imagery, and quality checked. Technical specifications for turbines are obtained directly from project developers and turbine manufacturers, or they are based on data obtained from public sources.

Map services and data downloaded from the U.S. Wind Turbine Database are free and in the public domain. There are no restrictions; however, we request that the following acknowledgment statement be included in products and data derived from our map services when citing, copying, or reprinting: "Map services and data are available from U.S. Wind Turbine Database, provided by the U.S. Geological Survey, American Wind Energy Association, and Lawrence Berkeley National Laboratory via https://energy.usgs.gov/uswtdb."

Although this digital spatial database has been subjected to rigorous review and is substantially complete, it is released on the condition that neither the USGS, LBNL, AWEA nor the United States Government nor any agency thereof, nor any employees thereof, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information contained within the database.