A Data Science Project on the Relationship Between Flight Delays and Weather Conditions

Overview

This project investigates how weather conditions influence flight delays across major U.S. airports. Conducted as a group project for COMPSCI 526 (Data Science) at Duke University, the study integrates large-scale aviation and meteorological datasets to analyze delay patterns and build predictive models.

Data Sources

  • Flight Delay Data:
    U.S. Bureau of Transportation Statistics (BTS) flight records
  • Weather Data:
    NOAA National Centers for Environmental Information (NCEI) hourly meteorological observations

The dataset spans 36 months (January 2022 – December 2024). Each flight was matched to the weather observation closest to its departure in both location and time, and that observation stands in for the atmospheric conditions at departure.
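
As an illustration of this kind of matching, here is a minimal sketch that picks the closest station by great-circle distance and then the closest observation in time. It is not the project's actual pipeline: the function names, station ids, coordinates, and the `visibility_mi` field are all hypothetical, and a real implementation would likely also cap the allowed time gap.

```python
from bisect import bisect_left
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_station(airport, stations):
    """Return the id of the station closest to the airport coordinates."""
    return min(stations, key=lambda sid: haversine_km(*airport, *stations[sid]))


def nearest_observation(dep_time, observations):
    """Return the (time, values) pair closest to dep_time.

    `observations` must be non-empty and sorted by time.
    """
    times = [t for t, _ in observations]
    i = bisect_left(times, dep_time)
    # Only the observations just before and just after dep_time can be nearest.
    candidates = observations[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda obs: abs(obs[0] - dep_time))


# Hypothetical, hand-made data for illustration only.
stations = {"STN_A": (35.88, -78.79), "STN_B": (33.64, -84.43)}
observations = {
    "STN_A": [
        (datetime(2022, 1, 1, 10, 0), {"visibility_mi": 10.0}),
        (datetime(2022, 1, 1, 11, 0), {"visibility_mi": 4.0}),
    ],
    "STN_B": [
        (datetime(2022, 1, 1, 11, 0), {"visibility_mi": 8.0}),
    ],
}

airport = (35.87, -78.78)                   # hypothetical airport coordinates
departure = datetime(2022, 1, 1, 10, 50)    # hypothetical scheduled departure

station_id = nearest_station(airport, stations)
obs_time, obs = nearest_observation(departure, observations[station_id])
print(station_id, obs_time, obs)
```

Resolving the station first and the timestamp second keeps each step simple. The binary search matters at this scale, since hourly observations over 36 months give tens of thousands of rows per station.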

Methods

  • Exploratory Data Analysis (EDA) on delay distributions and weather variables
  • Feature engineering on weather attributes (visibility, wind, precipitation, etc.)
  • Predictive modeling to estimate delay likelihood and severity
  • Evaluation of weather impact across airports and seasons
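
To make the feature engineering and modeling steps more concrete, the sketch below shows one way a matched flight record could be turned into model inputs and targets. Everything in it is an assumption for illustration: the field names, the feature choices, and the use of a 15-minute cutoff (the convention BTS uses for on-time reporting) are not confirmed details of this project.

```python
def make_targets(dep_delay_min, threshold_min=15):
    """Return (is_delayed, severity_min) for one flight.

    is_delayed is the classification target (delay likelihood);
    severity_min is the regression target (delay severity).
    The 15-minute default is an assumed cutoff, not the project's confirmed one.
    """
    severity = max(dep_delay_min, 0.0)  # early departures count as zero delay
    return severity >= threshold_min, severity


def weather_features(obs):
    """Derive simple model inputs from one weather observation (hypothetical fields)."""
    return {
        "visibility_mi": obs["visibility_mi"],
        "wind_speed_kt": obs["wind_speed_kt"],
        "has_precip": obs["precip_in"] > 0.0,  # binary flag from hourly precipitation
    }


# Hypothetical matched record: one flight plus its nearest weather observation.
record = {
    "dep_delay_min": 42,
    "weather": {"visibility_mi": 2.5, "wind_speed_kt": 18.0, "precip_in": 0.12},
}

features = weather_features(record["weather"])
is_delayed, severity = make_targets(record["dep_delay_min"])
print(features, is_delayed, severity)
```

Separating the binary target from the severity target lets the same feature set feed both a classifier and a regressor.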

My Role

I contributed to data preprocessing, weather–flight data integration, exploratory analysis, and predictive modeling. I also helped interpret results and communicate insights through visualizations and reports.

Repository

🔗 GitHub – Flight Delay & Weather Analysis