The National Park Service

Modeling National Park Visitor Density

This presentation details a data science approach to developing a recommendation algorithm for the National Park Service (NPS), based on visitor density across US National Parks.



The Problem

“A Surge in Visitors is Overwhelming America's National Parks. Growing crowds at U.S. National Parks have become unmanageable, jeopardizing the natural experience the parks were created to provide. With seasonal attendance continuing to shatter records, officials are considering limiting use of the parks in order to save them.”



Leverage visitation and geographic data for more than 400 U.S. National Parks to identify parks that are over-visited relative to their available area, so that crowded parks can be allocated additional management resources and/or guests can be directed toward less densely visited parks.
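As a rough illustration of what "over-visited relative to available area" could mean in practice, the sketch below computes visitors per acre and ranks parks by it. The `Park` class, the field names, and every figure are hypothetical placeholders rather than real NPS data, and visitors per acre is only one possible definition of density.

```python
from dataclasses import dataclass


@dataclass
class Park:
    name: str
    annual_visitors: int   # recreation visits per year (illustrative value)
    area_acres: float      # park area in acres (illustrative value)


def visitor_density(park: Park) -> float:
    """Annual visitors per acre, used here as a simple proxy for crowding."""
    if park.area_acres <= 0:
        raise ValueError(f"{park.name}: area must be positive")
    return park.annual_visitors / park.area_acres


def rank_by_density(parks: list[Park]) -> list[tuple[str, float]]:
    """Return (name, density) pairs, most crowded first."""
    return sorted(
        ((p.name, visitor_density(p)) for p in parks),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Made-up figures for illustration only, not real NPS data.
parks = [
    Park("Park A", 4_000_000, 500_000.0),
    Park("Park B", 1_000_000, 10_000.0),
    Park("Park C", 250_000, 1_000_000.0),
]
ranking = rank_by_density(parks)
for name, density in ranking:
    print(f"{name}: {density:.2f} visitors per acre")
```

Under this definition a small park with moderate visitation (Park B) ranks as more crowded than a large park with far more visitors (Park A), which is the kind of imbalance the project sets out to find.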



The Solution

Rebalance park visitor density by cleaning, analyzing, and modeling the data with linear, logistic, and decision tree regression. In addition, I analyzed the data with K-Means clustering.
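The presentation does not say which variables the regressions used. Purely as an assumption, the sketch below fits an ordinary least-squares line to one park's annual visits over time and extrapolates one year ahead. The function names and the visit counts are hypothetical, and the numbers are made up.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Made-up series: years since the first observation, visits in thousands.
years = [0.0, 1.0, 2.0, 3.0, 4.0]
visits_thousands = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(years, visits_thousands)
next_year_visits = predict(slope, intercept, 5.0)
print(f"trend: {slope:.1f}k visits per year, next year: {next_year_visits:.1f}k")
```

A straight-line trend is the simplest of the three model families named above. Logistic and decision tree models would need a defined target, such as a crowded or not-crowded label, which the presentation does not specify.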


Next Steps

Three main steps would improve the accuracy and core results of the models:

1. Supplement the data with more accurate measurements and additional park characteristics

2. Develop more expertise with model parameters (e.g., refining the grouping of similar parks by tuning the number of clusters)

3. Obtain a larger time series data set on which to test the models
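Tuning the number of clusters, as mentioned in step 2, is often approached by running K-Means for several values of k and comparing the within-cluster sum of squares (inertia). The sketch below is a minimal pure-Python version of that idea on one-dimensional density values. The `kmeans_1d` helper, its deterministic starting centroids, and the density figures are all assumptions made for illustration, not the method or data used in the project.

```python
def kmeans_1d(values: list[float], k: int, max_iter: int = 100) -> tuple[list[float], float]:
    """Lloyd's algorithm on 1-D data; returns (centroids, inertia)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    data = sorted(values)
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters: list[list[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Inertia: squared distance from each value to its nearest centroid.
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in data)
    return centroids, inertia


# Made-up visitor densities (visitors per acre) forming three loose groups.
densities = [0.2, 0.3, 0.25, 8.0, 9.0, 8.5, 100.0, 98.0, 102.0]

inertia_by_k = {k: kmeans_1d(densities, k)[1] for k in range(1, 6)}
for k, inertia in inertia_by_k.items():
    print(f"k={k}: inertia={inertia:.3f}")
```

In this toy data the inertia falls steeply up to k = 3 and barely changes afterward, which is the "elbow" that suggests three groups. Real park data is unlikely to separate this cleanly, so the choice of k would still need judgment.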