Formula 1 Social Media Analysis

Overview

This project explores how Formula 1 race performance relates to fan engagement on social media, with a focus on Reddit. My primary contribution was designing and implementing the Reddit data pipeline used to analyze fan discussions and engagement patterns.

My Contribution

  • Built a Reddit scraper using PRAW to collect posts from the r/formula1 subreddit
  • Queried posts for each driver and extracted metadata including score, upvotes, comments, and timestamps
  • Structured raw data into a clean dataset for analysis
  • Engineered engagement metrics and prepared data for correlation analysis
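
The collection step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the function names (submission_to_row, collect_driver_posts), the default limit and time_filter values, and the choice of fields are assumptions. It relies only on PRAW's Subreddit.search method and standard Submission attributes (id, title, score, upvote_ratio, num_comments, created_utc).

```python
from datetime import datetime, timezone


def submission_to_row(submission, driver):
    """Flatten one PRAW-style submission into a plain dict (hypothetical helper)."""
    return {
        "driver": driver,
        "post_id": submission.id,
        "title": submission.title,
        "score": submission.score,                # net upvotes as reported by Reddit
        "upvote_ratio": submission.upvote_ratio,  # share of votes that are upvotes
        "num_comments": submission.num_comments,
        # Reddit reports creation time as a Unix timestamp in UTC.
        "created_utc": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
    }


def collect_driver_posts(subreddit, drivers, limit=50, time_filter="month"):
    """Search a subreddit once per driver name and return one row per post.

    `subreddit` is expected to behave like a praw.models.Subreddit.
    The default `limit` and `time_filter` are illustrative assumptions.
    """
    rows = []
    for driver in drivers:
        results = subreddit.search(
            driver, sort="top", time_filter=time_filter, limit=limit
        )
        for submission in results:
            rows.append(submission_to_row(submission, driver))
    return rows


# Usage with PRAW (credentials are placeholders, driver names are examples):
# import praw
# reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
# rows = collect_driver_posts(reddit.subreddit("formula1"), ["Verstappen", "Hamilton"])
```

Returning plain dicts keeps the scraper independent of the analysis step; a list of such rows can be passed directly to pandas.DataFrame to build the dataset.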

Methodology

  • Collected top posts for each driver over defined time windows before races
  • Aggregated engagement metrics (average score, comments, upvotes)
  • Aligned Reddit activity with race results data
  • Performed correlation analysis to measure relationships between engagement and performance
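
The aggregation, alignment, and correlation steps above can be sketched as follows. The project itself used pandas; this sketch swaps in the Python standard library so the arithmetic is visible (a pandas groupby, merge, and corr would do the same job). The field names (driver, race, score, num_comments), the use of finishing position as the performance measure, and all function names are assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt


def average_engagement(posts):
    """Average score and comment count per (driver, race) pair."""
    groups = defaultdict(list)
    for post in posts:
        groups[(post["driver"], post["race"])].append(post)
    return {
        key: {
            "avg_score": sum(p["score"] for p in group) / len(group),
            "avg_comments": sum(p["num_comments"] for p in group) / len(group),
        }
        for key, group in groups.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def engagement_vs_finish(posts, finish_positions, metric="avg_score"):
    """Correlate an engagement metric with finishing position.

    `finish_positions` maps (driver, race) to a finishing position.
    Only pairs present in both sources are compared.
    """
    engagement = average_engagement(posts)
    keys = [key for key in engagement if key in finish_positions]
    xs = [engagement[key][metric] for key in keys]
    ys = [finish_positions[key] for key in keys]
    return pearson(xs, ys)
```

When performance is encoded as finishing position, a lower number means a better result, so a positive coefficient here would mean that more engagement goes with worse finishes. The sign of any reported correlation should be read against how performance was encoded.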

Key Insights

  • Higher Reddit engagement before a race is negatively correlated with the driver's performance in that race
  • Posts about drama, incidents, and controversy generate the most interaction
  • Fan sentiment has only a weak relationship with actual driver success

Tools & Technologies

  • Python (pandas)
  • PRAW (Reddit API)
  • VADER Sentiment Analysis
  • Matplotlib

You can view the source code and explore this project in more detail here!