Formula 1 Social Media Analysis
Overview
This project explores how Formula 1 race performance relates to fan engagement on social media, with a focus on Reddit. My primary contribution was designing and implementing the Reddit data pipeline used to analyze fan discussions and engagement patterns.
My Contribution
- Built a Reddit scraper using PRAW to collect posts from the r/formula1 subreddit
- Queried posts for each driver and extracted metadata including score, upvotes, comments, and timestamps
- Structured raw data into a clean dataset for analysis
- Engineered engagement metrics and prepared data for correlation analysis
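The scraping step can be illustrated with a minimal sketch. This is not the project's actual code: the function names `submission_to_record` and `collect_driver_posts`, the `limit` and `time_filter` values, and the choice of fields are all assumptions made for illustration. It relies only on PRAW's `subreddit.search` call and standard submission attributes (`id`, `title`, `score`, `upvote_ratio`, `num_comments`, `created_utc`).

```python
from datetime import datetime, timezone


def submission_to_record(submission, driver):
    """Flatten one PRAW-style submission into a plain dict (hypothetical helper)."""
    return {
        "driver": driver,
        "post_id": submission.id,
        "title": submission.title,
        "score": submission.score,
        "upvote_ratio": submission.upvote_ratio,
        "num_comments": submission.num_comments,
        # PRAW exposes the creation time as a Unix timestamp in UTC.
        "created_utc": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
    }


def collect_driver_posts(reddit, driver, limit=50):
    """Search r/formula1 for a driver's name and return one record per post.

    `reddit` is expected to be an authenticated praw.Reddit instance.
    The limit and time_filter values are illustrative assumptions.
    """
    subreddit = reddit.subreddit("formula1")
    results = subreddit.search(driver, sort="top", time_filter="year", limit=limit)
    return [submission_to_record(s, driver) for s in results]


# Example usage (requires the praw package and your own API credentials):
#   import praw
#   reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
#   records = collect_driver_posts(reddit, "Driver A")
```

Keeping the record-building step separate from the API call means the flattening logic can be checked against stand-in objects without network access, and the resulting list of dicts can be passed straight to `pandas.DataFrame` for cleaning.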
Methodology
- Collected top posts for each driver over defined time windows before races
- Aggregated engagement metrics (average score, comments, upvotes)
- Aligned Reddit activity with race results data
- Performed correlation analysis to measure relationships between engagement and performance
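The four steps above can be sketched as follows. This is a standard-library illustration under stated assumptions, not the project's pandas implementation: the helper names, the 7-day window, the record layout (`driver`, `score`, `num_comments`, `created_utc`), and the use of finishing position as the performance measure are all hypothetical, and the sample data is invented purely to show the call sequence.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def posts_in_window(posts, race_time, days_before=7):
    """Keep posts created in the `days_before` days leading up to the race."""
    start = race_time - timedelta(days=days_before)
    return [p for p in posts if start <= p["created_utc"] < race_time]


def aggregate_engagement(posts):
    """Average score and comment count per driver."""
    by_driver = defaultdict(list)
    for post in posts:
        by_driver[post["driver"]].append(post)
    return {
        driver: {
            "avg_score": mean(p["score"] for p in group),
            "avg_comments": mean(p["num_comments"] for p in group),
            "n_posts": len(group),
        }
        for driver, group in by_driver.items()
    }


def align(engagement, results, metric="avg_score"):
    """Pair each driver's engagement metric with that driver's race result."""
    drivers = sorted(set(engagement) & set(results))
    xs = [engagement[d][metric] for d in drivers]
    ys = [results[d] for d in drivers]
    return drivers, xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    race = datetime(2024, 1, 14, 14, 0, tzinfo=timezone.utc)
    posts = [
        {"driver": "Driver A", "score": 900, "num_comments": 120,
         "created_utc": race - timedelta(days=2)},
        {"driver": "Driver B", "score": 300, "num_comments": 40,
         "created_utc": race - timedelta(days=3)},
        {"driver": "Driver C", "score": 150, "num_comments": 15,
         "created_utc": race - timedelta(days=1)},
    ]
    finishing_position = {"Driver A": 8, "Driver B": 3, "Driver C": 1}

    engagement = aggregate_engagement(posts_in_window(posts, race))
    drivers, xs, ys = align(engagement, finishing_position)
    print(drivers, pearson(xs, ys))
```

The sign of the coefficient depends on how performance is encoded. With finishing position (1 is best), a positive coefficient means higher engagement goes with worse finishes; with points scored, the same pattern shows up as a negative coefficient. In pandas, the equivalent steps would be `groupby("driver").agg(...)`, `merge` on the driver column, and `Series.corr`.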
Key Insights
- Higher Reddit engagement is negatively correlated with subsequent race performance
- Posts about drama, incidents, and controversy generate the most interaction
- Fan sentiment has only a weak relationship with actual driver success

Tools & Technologies
- Python (pandas)
- PRAW (Reddit API)
- VADER Sentiment Analysis
- Matplotlib
You can view the source code and explore this project in more detail here!