Final Report Exploratory Analysis of movies from IMDb and Prediction of Movie Rating

EXPLORATORY ANALYSIS OF MOVIES DATA FROM IMDB
Authors: Diana Duarte Guedes Sai Aishwarya Namrata Reddy Palle Syeda Yumma Batool Zaidi Laysa Priya Rudroju K. Paola C. Reyes Sydney
CS-504 | December 2022
December 7, 2022
Professor:
I. ABSTRACT
Paragraph based on slide .

II. INTRODUCTION
Movies today are not only a source of entertainment but also a key part of international trade and marketing. People, especially young people, are quickly caught up in new movie trends, so the success of a movie concerns everyone, not just directors and box office executives. IMDb (the Internet Movie Database) is an online platform that keeps track of movie data such as genres, stars, directors, ratings, meta-scores, gross, and votes. Platforms like this are growing in popularity because they provide candid reviews, and as a result a large amount of review and rating information is available online. This project uses that information for predictions, visualizations, and modeling. IMDb has become so popular that most people choose what to watch by looking at its ratings. The aim of this project is therefore to scrape movie data from IMDb and identify the attributes that contribute most to a movie's rating and to its gross, in order to give movie makers some insights.

III. METHODOLOGY

IV. TOOLS
The tools used in this project are Excel, Jupyter Notebook, RStudio, and Tableau. MS Excel stores the data scraped from the IMDb site. Jupyter Notebook with Python is used for scraping the data, exploratory analysis, correlation, and predictions with different models. RStudio is also used for predictions with different models, and Tableau is used for the visualizations in the exploratory analysis.
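The correlation step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' notebook code: it assumes Pearson correlation (the report does not say which measure was used), it uses only the Python standard library, and the rows are made-up values rather than records from the scraped dataset. The helper names `pearson` and `rank_by_correlation` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(rows, target, features):
    """Sort features by the absolute value of their correlation with target."""
    target_values = [row[target] for row in rows]
    scores = {
        name: pearson([row[name] for row in rows], target_values)
        for name in features
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up illustrative rows, NOT values from the scraped IMDb dataset.
movies = [
    {"Movie Rating": 9.2, "Votes": 1_800_000, "Watch_Time": 175},
    {"Movie Rating": 8.8, "Votes": 2_100_000, "Watch_Time": 152},
    {"Movie Rating": 8.1, "Votes": 600_000, "Watch_Time": 130},
    {"Movie Rating": 7.9, "Votes": 300_000, "Watch_Time": 141},
    {"Movie Rating": 7.7, "Votes": 150_000, "Watch_Time": 118},
]

ranking = rank_by_correlation(movies, "Movie Rating", ["Votes", "Watch_Time"])
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

A correlation close to +1 or -1 suggests a strong linear relationship with the rating, but it does not by itself show that the attribute causes a higher rating.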
V. DATA SCRAPING

VI. SENTIMENT ANALYSIS

VII. EXPLORATORY ANALYSIS
The steps for this section are shown in the Exploratory Analysis diagram.
Dataset Description
The dataset scraped from IMDb has 1000 rows and 18 columns: 12 numeric columns and 6 categorical columns.

Column Name         Description                                 Range
Id                  Movie Id                                    0-999
Movie_Name          Name of the movie                           -
Year_of_Release     The year in which the movie is released     1920-2022
Watch_Time          The total time the movie runs               45-321
Movie Rating        Rating for a movie given by IMDb            7.6-9.3
Meatscore_of_movie  Score given to a movie by average reviews   28-100
Votes               Number of votes a movie got                 25547-2655610
Gross               An amount that a movie earned               0-936.66
Description         A short description of the movie            -
Rank                Rank given to each movie                    1-1000
No_Of_Reviews       Total number of reviews a movie got         14-25
Positive_Reviews    Total number of positive reviews            4-25
Negative_Reviews    Total number of negative reviews            0-21
Certificate         Certificate given to each movie             -
Top250              Rank within the top 250 movies              1-250
Director            Name of the directors                       -
Stars               Name of the stars                           -
Genre               Genre names                                 -

Predicted Attribute
As seen in the Rating distribution plot, 6.10% of the movies have a rating greater than or equal to 8.5. Most of the movies in the dataset have a rating lower than 8.5.

Distribution of Attributes (From slide ~15-23)

Numerical attributes
The following three boxplots show the distribution of runtime (Watch_Time), gross, and number of votes for the two rating groups. As the plots show, movies with a rating greater than or equal to 8.5 (green) have a higher average for each of these attributes. The movies with the highest ratings have, on average, around 150 minutes of runtime, $100,000 gross, and around 1,000,000 votes. The bar plot shows the top 5 meta scores and their corresponding ratings. Meta score and movie rating need not be proportional: in the plot, Meta Score 96 has a lower rating than Meta Score 94.
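The two-group comparison described above (ratings of 8.5 and higher against the rest) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the column names follow the dataset description, the helper names `high_rating_share` and `group_means` are hypothetical, and the four sample rows are made up for illustration. For reference, with 1000 rows the reported 6.10% corresponds to 61 movies.

```python
from statistics import mean

THRESHOLD = 8.5  # rating cut-off used in the report


def high_rating_share(rows, threshold=THRESHOLD):
    """Percentage of rows whose rating is greater than or equal to threshold."""
    high = sum(1 for row in rows if row["Movie Rating"] >= threshold)
    return 100.0 * high / len(rows)


def group_means(rows, columns, threshold=THRESHOLD):
    """Mean of each column, computed separately for the high and low groups."""
    groups = {"high": [], "low": []}
    for row in rows:
        key = "high" if row["Movie Rating"] >= threshold else "low"
        groups[key].append(row)
    return {
        key: {col: mean(r[col] for r in members) for col in columns}
        for key, members in groups.items()
        if members  # skip a group that has no rows
    }


# Made-up illustrative rows, NOT values from the scraped IMDb dataset.
sample = [
    {"Movie Rating": 9.0, "Watch_Time": 150, "Gross": 100.0, "Votes": 1_000_000},
    {"Movie Rating": 8.0, "Watch_Time": 120, "Gross": 50.0, "Votes": 300_000},
    {"Movie Rating": 7.8, "Watch_Time": 100, "Gross": 30.0, "Votes": 200_000},
    {"Movie Rating": 8.4, "Watch_Time": 110, "Gross": 40.0, "Votes": 100_000},
]

share = high_rating_share(sample)
means = group_means(sample, ["Watch_Time", "Gross", "Votes"])
print(f"Share of movies rated >= {THRESHOLD}: {share:.2f}%")
for group, values in means.items():
    print(group, values)
```

The boxplots in the report show full distributions; the group means computed here are only the single summary figure that the text quotes for each attribute.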