Overview

As part of a team project, I developed machine learning models to predict movie box office performance and identify key drivers of franchise success using a 1M-record TMDb dataset.

My Contributions

  • Cleaned and preprocessed large-scale movie data, handling missing values and removing sparse features
  • Engineered features including release date transformations and multi-label genre encoding
  • Built and evaluated Linear Regression, Random Forest, and Logistic Regression models

Results

  • Achieved up to R² = 0.75 for revenue prediction
  • Reached 72% accuracy in classifying box office success

Tools & Technologies

  • Python (pandas, scikit-learn)
  • Data preprocessing & feature engineering
  • Supervised machine learning models
You can view the source code and explore this project in more detail here!