Assignment 12 - Problems
School: Northeastern University
Course: 6400
Subject: Industrial Engineering
Date: Feb 20, 2024
IE6400 Foundations for Data Analytics Engineering
Fall 2023
Assignment 12
Module 2: Probability
Question 1: Monty Hall Problem Simulation and Analysis
Background:
The Monty Hall problem is a famous probability puzzle named after the host of the television game show "Let's Make a Deal." The problem goes as follows:
• A contestant is presented with three doors. Behind one of them is a car (which the contestant wants), and behind the other two are goats.
• The contestant selects one of the doors, say Door A.
• The host, Monty Hall, who knows what's behind each door, opens another door, say Door B, revealing a goat.
• Monty now asks the contestant if they want to stick with their initial choice (Door A) or switch to the remaining door (Door C).
• The contestant makes a decision: stick or switch.
• The question is, is it in the contestant's best interest to stick with their initial choice, switch, or does it not matter?
Task:
Your goal is to simulate the Monty Hall problem using Python and determine the empirical probabilities of winning the car for both strategies: sticking with the initial choice and switching after Monty reveals a goat.
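As a reference point for the simulation, the theoretical win probabilities can be obtained by enumerating every equally likely combination of car location and initial pick. The sketch below is one way to do that, assuming (as in the puzzle statement) that Monty always opens a goat door other than the contestant's pick, and additionally assuming he chooses uniformly when two such doors are available. The names `DOORS` and `exact_win_probabilities` are illustrative and not part of the assignment.

```python
from fractions import Fraction
from itertools import product

DOORS = ("A", "B", "C")  # illustrative door labels


def exact_win_probabilities():
    """Return (P(win | stick), P(win | switch)) by exact enumeration."""
    stick = Fraction(0)
    switch = Fraction(0)
    # Car location and initial pick are independent and uniform: 9 equally likely pairs.
    for car, pick in product(DOORS, repeat=2):
        weight = Fraction(1, 9)
        # Doors Monty may open: neither the contestant's pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            # Assumption: Monty picks uniformly among the doors he may open.
            w = weight / len(openable)
            # The single door left after removing the pick and the opened door.
            switched_to = next(d for d in DOORS if d not in (pick, opened))
            stick += w * (pick == car)
            switch += w * (switched_to == car)
    return stick, switch


stick_p, switch_p = exact_win_probabilities()
print(stick_p, switch_p)  # 1/3 2/3
```

Under these assumptions the enumeration gives 1/3 for sticking and 2/3 for switching, which is the benchmark the empirical rates should approach as the number of trials grows.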
Dataset: 'monty_hall_trials.csv'
In [ ]:
import pandas as pd
import numpy as np

np.random.seed(999)  # fixed seed so the generated dataset is reproducible

def simulate_monty_hall(num_trials):
    doors = ['A', 'B', 'C']
    results = []
    for i in range(num_trials):
        car_location = np.random.choice(doors)
        initial_choice = np.random.choice(doors)
        # Monty may only open a door that is neither the contestant's pick nor the car
        remaining_doors = [door for door in doors if door != initial_choice and door != car_location]
        monty_reveal = np.random.choice(remaining_doors)
        # Simulating equal probability of sticking or switching
        final_decision = np.random.choice(['Stick', 'Switch'])
        if final_decision == 'Stick':
            win = 1 if initial_choice == car_location else 0
        else:
            # The only door left after excluding the initial pick and Monty's reveal
            switch_to = [door for door in doors if door != initial_choice and door != monty_reveal][0]
            win = 1 if switch_to == car_location else 0
        results.append([i+1, initial_choice, monty_reveal, car_location, final_decision, win])
    return pd.DataFrame(results, columns=['trial', 'initial_choice', 'monty_reveal', 'actual_car_location', 'final_decision', 'win'])

df = simulate_monty_hall(1000)
df.to_csv('monty_hall_trials.csv', index=False)
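Because `win` is recorded as 0 or 1, the empirical probability of winning under each strategy is the mean of `win` within each `final_decision` group. A minimal sketch of that calculation follows, using a small hand-made table instead of the generated file. The table values and the helper name `win_rate_by_strategy` are illustrative assumptions, not results from `monty_hall_trials.csv`.

```python
import pandas as pd


def win_rate_by_strategy(df: pd.DataFrame) -> pd.Series:
    """Empirical P(win | final_decision): mean of the 0/1 `win` column per strategy."""
    return df.groupby("final_decision")["win"].mean()


# Hand-made rows that reuse the generator's column names (illustrative values only).
demo = pd.DataFrame({
    "final_decision": ["Stick", "Switch", "Switch", "Stick", "Switch", "Stick"],
    "win": [0, 1, 1, 1, 0, 0],
})

rates = win_rate_by_strategy(demo)
print(rates)
```

The same helper could be applied to the DataFrame loaded from the CSV. Since each trial picks a strategy at random, the two groups will generally have different sizes, so it may be worth reporting the group counts alongside the rates.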
The dataset contains six columns:
• trial: The trial number.
• initial_choice: The initial door chosen by the contestant.
• monty_reveal: The door Monty reveals to have a goat.
• actual_car_location: The door behind which the car is actually located.
• final_decision: The contestant's final decision, either "Stick" or "Switch".
• win: Whether the contestant won the car (1 for a win, 0 for a loss).
Requirements:
1. Data Loading and Preprocessing:
• Load the dataset monty_hall_trials.csv into a Pandas DataFrame.
• Check for any missing or inconsistent data entries and handle them.
• Display a summary of the dataset.
2. Simulation Analysis:
• Calculate the empirical probability of winning the car for both strategies: sticking and switching.
3. Visualization:
• Plot a bar chart comparing the winning probabilities for both strategies.
• Ensure the graph is appropriately labeled with a relevant title and annotations.
4. Interpretation:
• Discuss the empirical results in the context of the theoretical probabilities.
• Offer insights into the optimal strategy for a contestant based on the simulation results.
Evaluation Criteria:
• Correctness and efficiency of the Python code.
• Proper handling and preprocessing of the dataset.
• Accurate calculation and interpretation of empirical probabilities.
• Quality and clarity of visualizations.
• Insightful interpretations and conclusions regarding the Monty Hall problem.
Question 2: Poisson Process Analysis of Website Hits
Background:
A Poisson process is a mathematical model for events that happen at random points in time and space, where the average rate of occurrence is constant and known. A common application of this process is in modeling the number of times a website is accessed over a given time interval.
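For a Poisson process with constant rate, the number of events N in one interval follows a Poisson distribution with mean lam, so P(N = k) = exp(-lam) * lam^k / k!. The sketch below evaluates that formula with the standard library only. The rate of 6 per interval, the 24 intervals, and the helper name `poisson_pmf` are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for a Poisson-distributed count N with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 6.0     # assumed average number of hits per interval (illustrative)
n_hours = 24  # assumed number of intervals (illustrative)

# Expected number of intervals (out of n_hours) that see exactly k hits.
expected_intervals = {k: n_hours * poisson_pmf(k, lam) for k in range(16)}

for k, e in expected_intervals.items():
    print(f"k = {k:2d}: P = {poisson_pmf(k, lam):.4f}, expected intervals = {e:.2f}")
```

Multiplying each probability by the number of intervals gives the expected frequency of each count value, which is the quantity a goodness-of-fit comparison would typically use.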
Scenario:
You are a data analyst at a tech company. The company's main website has been receiving hits, and you suspect that the hits can be modeled as a Poisson process. Your task is to analyze the website hits data and check whether the hourly counts are consistent with a Poisson distribution.
Dataset: 'website_hits.csv'
In [ ]:
import pandas as pd
import numpy as np

np.random.seed(12345)

# Generating hits using Poisson distribution
# Assuming mean hits per hour is 6
hits_per_hour = np.random.poisson(lam=6, size=24)
time_intervals = [f"{i}-{i+1}" for i in range(24)]

df = pd.DataFrame({
    'time_interval': time_intervals,
    'hits': hits_per_hour
})
df.to_csv('website_hits.csv', index=False)
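For hourly counts shaped like the ones generated above, one possible goodness-of-fit check bins the count values, compares how many hours fall in each bin with the number expected under a Poisson distribution fitted to the sample mean, and sums the scaled squared differences. The sketch below is one such approach under stated assumptions: the bin edges `lo` and `hi`, the helper names, and the `demo_hits` values are all made up for illustration and are not the contents of `website_hits.csv`.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """P(N = k) for a Poisson-distributed count N with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def binned_chi_square(hits, lo, hi):
    """Chi-squared statistic for a Poisson fit with bins: <=lo, lo+1, ..., hi-1, >=hi."""
    n = len(hits)
    lam = sum(hits) / n  # mean rate estimated from the data
    counts = Counter(hits)
    # Lower tail bin: all count values up to and including lo.
    observed = [sum(c for k, c in counts.items() if k <= lo)]
    expected = [n * sum(poisson_pmf(k, lam) for k in range(lo + 1))]
    # One bin per single count value between the tails.
    for k in range(lo + 1, hi):
        observed.append(counts.get(k, 0))
        expected.append(n * poisson_pmf(k, lam))
    # Upper tail bin receives whatever probability mass remains.
    observed.append(sum(c for k, c in counts.items() if k >= hi))
    expected.append(n - sum(expected))
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - 1  # one extra degree of freedom lost to estimating lam
    return stat, dof, observed, expected


# Made-up hourly counts for illustration only.
demo_hits = [4, 6, 7, 5, 6, 8, 3, 6, 9, 5, 7, 6, 4, 10, 6, 5, 7, 2, 6, 8, 5, 7, 6, 4]
stat, dof, observed, expected = binned_chi_square(demo_hits, lo=4, hi=8)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```

A p-value could then be read from the chi-squared distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, dof)`. With only 24 intervals the expected counts per bin are small, so the usual rule of thumb of at least five expected observations per bin may call for coarser bins.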
The dataset contains two columns:
• time_interval: Represents hourly intervals over a 24-hour period (e.g., "0-1" represents the interval from midnight to 1 AM).
• hits: The number of website hits recorded during the corresponding time interval.
Requirements:
1. Data Loading and Preprocessing:
• Load the dataset website_hits.csv into a Pandas DataFrame.
• Check for any missing or inconsistent data entries and handle them.
• Display the basic statistics of the dataset.
2. Poisson Distribution Fitting:
• Calculate the mean hit rate from the data.
• Using the calculated mean, generate the expected hit frequencies for each hour if the process follows a Poisson distribution.
3. Visualization:
• Plot a bar chart showing both the observed and expected hits for each hourly interval. The bars for observed and expected hits should be side-by-side for comparison.
• Ensure the graph is properly labeled with a relevant title, legend, and annotations.
4. Hypothesis Testing:
• Conduct a goodness-of-fit test (e.g., chi-squared test) to determine if the observed hits significantly differ from a Poisson distribution with the calculated mean rate.
5. Interpretation:
• Discuss the results of the visualization and hypothesis test.
• Provide insights and recommendations to the company based on your findings.
Evaluation Criteria:
• Correctness and efficiency of the Python code.
• Proper handling and preprocessing of the dataset.
• Accurate fitting of the Poisson distribution and calculation of expected frequencies.
• Quality and clarity of visualizations.
• Thoroughness of hypothesis testing.
• Insightful interpretations and conclusions drawn from the analysis.
Question 3: Bayesian Analysis of Product Review Sentiments
Background:
Bayesian statistics is a branch of probability theory that applies probability to