A Data Science Project on the Relationship Between Flight Delays and Weather Conditions
Overview
This project investigates how weather conditions influence flight delays across major U.S. airports. Conducted as a group project for COMPSCI 526 (Data Science) at Duke University, the study integrates large-scale aviation and meteorological datasets to analyze delay patterns and build predictive models.
Data Sources
- Flight Delay Data: U.S. Bureau of Transportation Statistics (BTS) flight records
- Weather Data: NOAA National Centers for Environmental Information (NCEI) hourly meteorological observations
The dataset spans 36 months (January 2022 – December 2024). Each flight was matched to the weather observation nearest to it in both location and time, so that the matched record represents atmospheric conditions at departure.
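The temporal half of this matching can be sketched with `pandas.merge_asof`. This is a minimal illustration, not the project's actual pipeline: it assumes each airport has already been mapped to its geographically nearest weather station, and the column names (`station`, `dep_time`, `obs_time`, `visibility_mi`, `dep_delay_min`), the one-hour tolerance, and the sample rows are all hypothetical.

```python
import pandas as pd


def match_weather(flights: pd.DataFrame, weather: pd.DataFrame,
                  tolerance: str = "1h") -> pd.DataFrame:
    """Attach the temporally nearest weather observation to each flight.

    Assumes (hypothetically) that both frames share a `station` key, i.e.
    the geographic matching of airport to station was done beforehand.
    """
    # merge_asof requires both frames to be sorted by their time keys.
    flights = flights.sort_values("dep_time")
    weather = weather.sort_values("obs_time")
    return pd.merge_asof(
        flights,
        weather,
        left_on="dep_time",
        right_on="obs_time",
        by="station",            # only match observations from the same station
        direction="nearest",     # look both backward and forward in time
        tolerance=pd.Timedelta(tolerance),  # no match if the gap is larger
    )


# Made-up sample data, for illustration only.
flights = pd.DataFrame({
    "flight_id": ["A1", "A2", "B1"],
    "station": ["RDU", "RDU", "ATL"],
    "dep_time": pd.to_datetime(
        ["2022-01-01 08:10", "2022-01-01 09:40", "2022-01-01 08:55"]),
    "dep_delay_min": [5, 42, 0],
})
weather = pd.DataFrame({
    "station": ["RDU", "RDU", "ATL", "ATL"],
    "obs_time": pd.to_datetime(
        ["2022-01-01 08:00", "2022-01-01 10:00",
         "2022-01-01 08:00", "2022-01-01 09:00"]),
    "visibility_mi": [10.0, 2.5, 9.0, 6.0],
})

matched = match_weather(flights, weather)
print(matched[["flight_id", "dep_time", "obs_time", "visibility_mi"]])
```

With `direction="nearest"`, a flight departing at 09:40 is paired with the 10:00 observation rather than the 08:00 one. Flights with no observation inside the tolerance window keep their row but receive missing weather values, which makes gaps easy to audit.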
Methods
- Exploratory Data Analysis (EDA) on delay distributions and weather variables
- Feature engineering on weather attributes (visibility, wind, precipitation, etc.)
- Predictive modeling to estimate delay likelihood and severity
- Evaluation of weather impact across airports and seasons
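As a rough illustration of the feature engineering and weather-impact steps listed above, the sketch below derives a few binary weather flags and compares delay rates across them. The thresholds (visibility under 3 miles, wind of 20 knots or more, a delay counted at 15 minutes or more), the column names, and the sample rows are assumptions chosen for the example, not the values used in the project.

```python
import pandas as pd


def add_weather_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add simple binary weather flags and a delay label (thresholds are assumed)."""
    out = df.copy()
    out["low_visibility"] = out["visibility_mi"] < 3.0
    out["has_precip"] = out["precip_in"] > 0.0
    out["strong_wind"] = out["wind_speed_kt"] >= 20.0
    # Label a flight as delayed at 15+ minutes (assumed cutoff for this sketch).
    out["delayed"] = out["dep_delay_min"] >= 15
    return out


def delay_rate_by(df: pd.DataFrame, flag: str) -> pd.Series:
    """Share of delayed flights for each value of a binary weather flag."""
    return df.groupby(flag)["delayed"].mean()


# Made-up sample data, for illustration only.
sample = pd.DataFrame({
    "visibility_mi": [10.0, 2.0, 1.0, 8.0],
    "precip_in": [0.0, 0.3, 0.0, 0.0],
    "wind_speed_kt": [5.0, 25.0, 10.0, 12.0],
    "dep_delay_min": [0, 45, 20, 5],
})

features = add_weather_features(sample)
rates = delay_rate_by(features, "low_visibility")
print(rates)
```

The same `delay_rate_by` helper could be grouped by airport or season columns instead of a weather flag, and the derived flags could serve as inputs to a classifier for delay likelihood.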
My Role
I contributed to data preprocessing, weather–flight data integration, exploratory analysis, and predictive modeling. I also helped interpret results and communicate insights through visualizations and reports.