Sentiment Analysis of IMDB Movie Reviews
Problem Statement:
In this assignment, we have to classify IMDB movie reviews as positive or negative based on their sentiment, using different classification models.
Import necessary libraries
In [1]:
#Load the libraries
import numpy as np
import pandas as pd
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelBinarizer
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize, sent_tokenize
from bs4 import BeautifulSoup
import re
from nltk.tokenize.toktok import ToktokTokenizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import os
print(os.listdir("../input"))
import warnings
warnings.filterwarnings('ignore')

['IMDB Dataset.csv']

Reading the dataset

In [3]:
#importing the training data
imdb_data = pd.read_csv('../input/IMDB Dataset.csv')
imdb_data.head()

Out[3]:
   review                                              sentiment
0  One of the other reviewers has mentioned that ...  positive
1  A wonderful little production. <br /><br />The...  positive
2  I thought this was a wonderful way to spend ti...  positive
3  Basically there's a family where a little boy ...  negative
4  Petter Mattei's "Love in the Time of Money" is...  positive

Exploratory data analysis
In [7]:
#Checking the data description
imdb_data.describe(include='all')

Out[7]:
        review                                              sentiment
count   50000                                               50000
unique  49582                                               2
top     Loved today's show!!! It was a variety and not...  positive
freq    5                                                   25000
In [6]:
# checking the info of the dataset
imdb_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 2 columns):
review       50000 non-null object
sentiment    50000 non-null object
dtypes: object(2)
memory usage: 781.3+ KB
Sentiment count

In [8]:
#checking the distribution of labels in our target column
imdb_data['sentiment'].value_counts()

Out[8]:
positive    25000
negative    25000
Name: sentiment, dtype: int64

From the above results, here are the observations:
1. There are 50000 reviews in total in the dataset.
2. There are no null values present in the dataset.
3. The dataset is not biased, as the "sentiment" column (the target feature) contains an equal proportion of positive and negative reviews.

Text Preprocessing
Performing text preprocessing to tokenize the reviews and clean the dataset before applying machine learning models.
In [12]:
#Tokenization of text
tokenizer = ToktokTokenizer()
#Setting English stopwords
stopword_list = nltk.corpus.stopwords.words('english')

Removing html strips and noise text
In [13]:
#Removing the html strips
def strip_html(text):
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()

#Removing the square brackets
def remove_between_square_brackets(text):
    return re.sub(r'\[[^]]*\]', '', text)

#Removing the noisy text
def denoise_text(text):
    text = strip_html(text)
    text = remove_between_square_brackets(text)
    return text

#Apply function on review column
imdb_data['review'] = imdb_data['review'].apply(denoise_text)

Removing special characters
In [14]:
#Define function for removing special characters
def remove_special_characters(text, remove_digits=True):
    pattern = r'[^a-zA-Z0-9\s]'
    text = re.sub(pattern, '', text)
    return text

#Apply function on review column
imdb_data['review'] = imdb_data['review'].apply(remove_special_characters)

Text stemming
In [15]:
#Stemming the text
def simple_stemmer(text):
    ps = nltk.porter.PorterStemmer()
    text = ' '.join([ps.stem(word) for word in text.split()])
    return text

#Apply function on review column
imdb_data['review'] = imdb_data['review'].apply(simple_stemmer)

Removing stopwords
In [16]:
#set stopwords to english
stop = set(stopwords.words('english'))
print(stop)

#removing the stopwords
def remove_stopwords(text, is_lower_case=False):
    tokens = tokenizer.tokenize(text)
    tokens = [token.strip() for token in tokens]
    if is_lower_case:
        filtered_tokens = [token for token in tokens if token not in stopword_list]
    else:
        filtered_tokens = [token for token in tokens if token.lower() not in stopword_list]
    filtered_text = ' '.join(filtered_tokens)
    return filtered_text

#Apply function on review column
imdb_data['review'] = imdb_data['review'].apply(remove_stopwords)
{'down', 'you', 'their', 'these', 'no', "didn't", 'where', 'more', 'about', 'so', 'both', 'himself',
'who', 'some', "shan't", 'before', 'why', 'me', 'this', "hadn't", 'o', 'up', 'further', 'own', 'was',
'yours', 'which', 'those', 'the', 'weren', "you'll", 'it', 'by', "you're", 'but', 'few', 'wasn', 'is',
'were', 'shouldn', "needn't", "you'd", 'too', 're', 'be', "won't", "should've", 'only', 'than', 'y',
'aren', 'under', 'am', 's', 'if', 'shan', 'ours', 'should', 'because', 'didn', 'between', 'won',
'each', 'there', 'ain', 'for', 'wouldn', 'most', 'of', 'll', 'are', "weren't", 'in', 'did', 'to',
'had', "aren't", 'at', "haven't", 'itself', 't', 'any', 'on', 'above', "isn't", 'will', 'isn', 'they',
'with', 'now', 'until', 'over', 'into', "that'll", 'having', 'does', "it's", 'not', 'what', 'while',
'do', 'needn', 'other', 'whom', 'out', 'hadn', "doesn't", 'from', 'such', 'mightn', 'against',
'ourselves', "wasn't", 'hers', 'myself', "shouldn't", 'herself', 'his', 'an', 'during', 'ma', 'below',
"don't", "you've", 'as', 'can', "couldn't", 'again', 'he', 'been', 'm', 'she', 'yourself', 'off',
"wouldn't", "she's", 'my', 'then', 'how', 'nor', 'doesn', 've', 'a', 'its', 'after', 'or', 'hasn',
'that', 'your', 'and', 'don', 'we', 'them', 'once', 'being', 'doing', 'has', 'theirs', 'yourselves',
"mightn't", 'couldn', 'i', 'same', "hasn't", 'him', 'very', 'haven', "mustn't", 'just', 'when',
'themselves', 'd', 'her', 'here', 'mustn', 'our', 'through', 'have', 'all'}
Normalized train reviews
In [18]:
#normalized train reviews
norm_train_reviews = imdb_data.review[:40000]
# checking one of the rows
norm_train_reviews[1000]

Out[18]:
'thi movi aw cant even bother write review thi garbag say one bore film ive ever seen and act veri bad boy play main charact realli annoy got express hi face movi want slap basic 80 movi slow motion shot skateboard weird music utter shtappar ive got write least 10 line text submit thi comment ill use line say lead charact ha got one face want slapmeh give upthi movi suck'

Normalized test reviews
In [19]:
#Normalized test reviews
norm_test_reviews = imdb_data.review[40000:]
norm_test_reviews[46000]

Out[19]:
'surviv christma surprisingli funni movi especi consid bad public wa first releas ben affleck funni obnoxi millionair pay famili occupi hi childhood home hi famili christma drive famili crazi overindulg christma cheer ben affleck fan past though like daredevil paycheck well cast thi role also like christina appleg daughter famili cant stand affleck charact first sure see thi movi go dont care ignor critic say rent thi movi becaus funnier lot christma movi'
In [29]:
# setting the display width of the column to max
pd.set_option('display.max_colwidth', 1000)
In [30]:
norm_train_reviews.head()
Out[30]:
0    one review ha mention watch 1 Oz episod youll hook right thi exactli happen meth first thing struck Oz wa brutal unflinch scene violenc set right word GO trust thi show faint heart timid thi show pull punch regard drug sex violenc hardcor classic use wordit call OZ nicknam given oswald maximum secur state penitentari focus mainli emerald citi experiment section prison cell glass front face inward privaci high agenda Em citi home manyaryan muslim gangsta latino christian italian irish moreso scuffl death stare dodgi deal shadi agreement never far awayi would say main appeal show due fact goe show wouldnt dare forget pretti pictur paint mainstream audienc forget charm forget romanceoz doesnt mess around first episod ever saw struck nasti wa surreal couldnt say wa readi watch develop tast Oz got accustom high level graphic violenc violenc injustic crook guard wholl sold nickel inmat wholl kill order get away well manner middl class inmat turn prison bitch due lack street skill prison ...
1    wonder littl product film techniqu veri unassum veri oldtimebbc fashion give comfort sometim discomfort sens realism entir piec actor extrem well chosen michael sheen onli ha got polari ha voic pat truli see seamless edit guid refer william diari entri onli well worth watch terrificli written perform piec master product one great master comedi hi life realism realli come home littl thing fantasi guard rather use tradit dream techniqu remain solid disappear play knowledg sens particularli scene concern orton halliwel set particularli flat halliwel mural decor everi surfac terribl well done
2    thought thi wa wonder way spend time hot summer weekend sit air condit theater watch lightheart comedi plot simplist dialogu witti charact likabl even well bread suspect serial killer may disappoint realiz thi match point 2 risk addict thought wa proof woodi allen still fulli control style mani us grown lovethi wa Id laugh one woodi comedi year dare say decad ive never impress scarlet johanson thi manag tone sexi imag jump right averag spirit young womanthi may crown jewel hi career wa wittier devil wear prada interest superman great comedi go see friend
3    basic famili littl boy jake think zombi hi closet hi parent fight timethi movi slower soap opera suddenli jake decid becom rambo kill zombieok first go make film must decid thriller drama drama movi watchabl parent divorc argu like real life jake hi closet total ruin film expect see boogeyman similar movi instead watch drama meaningless thriller spots3 10 well play parent descent dialog shot jake ignor
4    petter mattei love time money visual stun film watch Mr mattei offer us vivid portrait human relat thi movi seem tell us money power success peopl differ situat encount thi variat arthur schnitzler play theme director transfer action present time new york differ charact meet connect one connect one way anoth next person one seem know previou point contact stylishli film ha sophist luxuri look taken see peopl live world live habitatth onli thing one get soul pictur differ stage loneli one inhabit big citi exactli best place human relat find sincer fulfil one discern case peopl encounterth act good Mr mattei direct steve buscemi rosario dawson carol kane michael imperioli adrian grenier rest talent cast make charact come alivew wish Mr mattei good luck await anxious hi next work
Name: review, dtype: object
Bags of words model
It is used to convert text documents into numerical count vectors (a bag of words).
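For intuition, here is a minimal, self-contained sketch of how CountVectorizer turns text into count vectors. The two-sentence toy corpus and all variable names here are illustrative only, not part of the assignment data:

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, for illustration only.
toy_corpus = ["the movie was great", "the movie was awful"]
toy_cv = CountVectorizer(ngram_range=(1, 2))
toy_counts = toy_cv.fit_transform(toy_corpus)
print(sorted(toy_cv.vocabulary_))  # learned unigram/bigram vocabulary
print(toy_counts.toarray())        # one row of counts per document

Each document becomes a row whose entries count how often each vocabulary term occurs in it; with ngram_range=(1, 3), as below, the vocabulary also includes two- and three-word phrases, which is why the feature count grows so large.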
In [31]:
#Count vectorizer for bag of words
cv = CountVectorizer(min_df=0, max_df=1, binary=False, ngram_range=(1, 3))
#transformed train reviews
cv_train_reviews = cv.fit_transform(norm_train_reviews)
#transformed test reviews
cv_test_reviews = cv.transform(norm_test_reviews)
print('BOW_cv_train:', cv_train_reviews.shape)
print('BOW_cv_test:', cv_test_reviews.shape)
#vocab=cv.get_feature_names() - to get feature names

BOW_cv_train: (40000, 6209089)
BOW_cv_test: (10000, 6209089)
Term Frequency-Inverse Document Frequency model (TFIDF)
It is used to convert text documents into a matrix of tfidf features.
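For comparison with the bag-of-words sketch above, here is a minimal illustration (same hypothetical toy corpus, not the assignment data) of how TfidfVectorizer downweights terms that appear in every document:

from sklearn.feature_extraction.text import TfidfVectorizer

# Same toy corpus as in the bag-of-words sketch; illustration only.
toy_corpus = ["the movie was great", "the movie was awful"]
toy_tv = TfidfVectorizer()
toy_weights = toy_tv.fit_transform(toy_corpus)
# Terms shared by every document ("the", "movie", "was") receive lower
# weights than the discriminative terms ("great", "awful").
print(toy_weights.toarray().round(2))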
In [32]:
#Tfidf vectorizer
tv = TfidfVectorizer(min_df=0, max_df=1, use_idf=True, ngram_range=(1, 3))
#transformed train reviews
tv_train_reviews = tv.fit_transform(norm_train_reviews)
#transformed test reviews
tv_test_reviews = tv.transform(norm_test_reviews)
print('Tfidf_train:', tv_train_reviews.shape)
print('Tfidf_test:', tv_test_reviews.shape)

Tfidf_train: (40000, 6209089)
Tfidf_test: (10000, 6209089)

Labeling the sentiment text
In [33]:
#labeling the sentiment data
lb = LabelBinarizer()
#transformed sentiment data
sentiment_data = lb.fit_transform(imdb_data['sentiment'])
print(sentiment_data.shape)

(50000, 1)

Split the sentiment data
In [34]:
#Splitting the sentiment data
train_sentiments = sentiment_data[:40000]
test_sentiments = sentiment_data[40000:]
print(train_sentiments)
print(test_sentiments)

[[1]
 [1]
 [1]
 ...
 [1]
 [0]
 [0]]
[[0]
 [0]
 [0]
 ...
 [0]
 [0]
 [0]]
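Note that the split above is a fixed positional slice: the first 40000 rows train, the last 10000 test. A shuffled, stratified split is a common alternative; the sketch below with scikit-learn's train_test_split is purely illustrative and is not what this notebook uses (the reported results come from the positional split):

from sklearn.model_selection import train_test_split

# Hypothetical alternative: a shuffled, stratified 80/20 split that keeps
# the positive/negative class balance in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    imdb_data['review'], sentiment_data,
    test_size=0.2, random_state=42, stratify=sentiment_data)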
Modelling the dataset
Let us build a logistic regression model for both the bag-of-words and the tfidf features.
In [35]:
#training the model
lr = LogisticRegression(penalty='l2', max_iter=500, C=1, random_state=42)
#Fitting the model for Bag of words
lr_bow = lr.fit(cv_train_reviews, train_sentiments)
print(lr_bow)
#Fitting the model for tfidf features
lr_tfidf = lr.fit(tv_train_reviews, train_sentiments)
print(lr_tfidf)

LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=500,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=42, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=500,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=42, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

Logistic regression model performance on the test dataset
In [36]:
#Predicting the model for bag of words
lr_bow_predict = lr.predict(cv_test_reviews)
print(lr_bow_predict)
#Predicting the model for tfidf features
lr_tfidf_predict = lr.predict(tv_test_reviews)
print(lr_tfidf_predict)

[0 0 0 ... 0 1 1]
[0 0 0 ... 0 1 1]

Accuracy of the model
In [37]:
#Accuracy score for bag of words
lr_bow_score = accuracy_score(test_sentiments, lr_bow_predict)
print("lr_bow_score :", lr_bow_score)
#Accuracy score for tfidf features
lr_tfidf_score = accuracy_score(test_sentiments, lr_tfidf_predict)
print("lr_tfidf_score :", lr_tfidf_score)

lr_bow_score : 0.7512
lr_tfidf_score : 0.75

Printing the classification report to evaluate the performance of the models
In [38]:
#Classification report for bag of words
lr_bow_report = classification_report(test_sentiments, lr_bow_predict, target_names=['Positive', 'Negative'])
print(lr_bow_report)
#Classification report for tfidf features
lr_tfidf_report = classification_report(test_sentiments, lr_tfidf_predict, target_names=['Positive', 'Negative'])
print(lr_tfidf_report)
              precision    recall  f1-score   support

    Positive       0.75      0.75      0.75      4993
    Negative       0.75      0.75      0.75      5007

    accuracy                           0.75     10000
   macro avg       0.75      0.75      0.75     10000
weighted avg       0.75      0.75      0.75     10000

              precision    recall  f1-score   support

    Positive       0.74      0.77      0.75      4993
    Negative       0.76      0.73      0.75      5007

    accuracy                           0.75     10000
   macro avg       0.75      0.75      0.75     10000
weighted avg       0.75      0.75      0.75     10000
Confusion matrix
In [39]:
#confusion matrix for bag of words
cm_bow = confusion_matrix(test_sentiments, lr_bow_predict, labels=[1, 0])
print(cm_bow)
#confusion matrix for tfidf features
cm_tfidf = confusion_matrix(test_sentiments, lr_tfidf_predict, labels=[1, 0])
print(cm_tfidf)

[[3768 1239]
 [1249 3744]]
[[3663 1344]
 [1156 3837]]

Stochastic gradient descent or Linear support vector machines for bag of words and tfidf features
SGDClassifier with loss='hinge' trains a linear support vector machine by stochastic gradient descent.
In [40]:
#training the linear svm
svm = SGDClassifier(loss='hinge', max_iter=500, random_state=42)
#fitting the svm for bag of words
svm_bow = svm.fit(cv_train_reviews, train_sentiments)
print(svm_bow)
#fitting the svm for tfidf features
svm_tfidf = svm.fit(tv_train_reviews, train_sentiments)
print(svm_tfidf)

SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
              l1_ratio=0.15, learning_rate='optimal', loss='hinge',
              max_iter=500, n_iter_no_change=5, n_jobs=None, penalty='l2',
              power_t=0.5, random_state=42, shuffle=True, tol=0.001,
              validation_fraction=0.1, verbose=0, warm_start=False)
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
              l1_ratio=0.15, learning_rate='optimal', loss='hinge',
              max_iter=500, n_iter_no_change=5, n_jobs=None, penalty='l2',
              power_t=0.5, random_state=42, shuffle=True, tol=0.001,
              validation_fraction=0.1, verbose=0, warm_start=False)

Model performance on test data
In [41]:
#Predicting the model for bag of words
svm_bow_predict = svm.predict(cv_test_reviews)
print(svm_bow_predict)
#Predicting the model for tfidf features
svm_tfidf_predict = svm.predict(tv_test_reviews)
print(svm_tfidf_predict)

[1 1 0 ... 1 1 1]
[1 1 1 ... 1 1 1]

Accuracy of the model
In [42]:
#Accuracy score for bag of words
svm_bow_score = accuracy_score(test_sentiments, svm_bow_predict)
print("svm_bow_score :", svm_bow_score)
#Accuracy score for tfidf features
svm_tfidf_score = accuracy_score(test_sentiments, svm_tfidf_predict)
print("svm_tfidf_score :", svm_tfidf_score)

svm_bow_score : 0.5829
svm_tfidf_score : 0.5112

Print the classification report
In [43]:
#Classification report for bag of words
svm_bow_report = classification_report(test_sentiments, svm_bow_predict, target_names=['Positive', 'Negative'])
print(svm_bow_report)
#Classification report for tfidf features
svm_tfidf_report = classification_report(test_sentiments, svm_tfidf_predict, target_names=['Positive', 'Negative'])
print(svm_tfidf_report)

              precision    recall  f1-score   support

    Positive       0.94      0.18      0.30      4993
    Negative       0.55      0.99      0.70      5007

    accuracy                           0.58     10000
   macro avg       0.74      0.58      0.50     10000
weighted avg       0.74      0.58      0.50     10000

              precision    recall  f1-score   support

    Positive       1.00      0.02      0.04      4993
    Negative       0.51      1.00      0.67      5007

    accuracy                           0.51     10000
   macro avg       0.75      0.51      0.36     10000
weighted avg       0.75      0.51      0.36     10000

Plot the confusion matrix
In [44]:
#confusion matrix for bag of words
cm_bow = confusion_matrix(test_sentiments, svm_bow_predict, labels=[1, 0])
print(cm_bow)
#confusion matrix for tfidf features
cm_tfidf = confusion_matrix(test_sentiments, svm_tfidf_predict, labels=[1, 0])
print(cm_tfidf)
[[4948   59]
 [4112  881]]
[[5007    0]
 [4888  105]]
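The cells above print the matrices rather than plotting them. A minimal matplotlib sketch that could render one as a heatmap is shown below; it assumes the cm_bow array from In [44] and the [positive, negative] row/column ordering produced by labels=[1, 0]:

import matplotlib.pyplot as plt
import numpy as np

def plot_confusion_matrix(cm, class_names):
    # Render a 2x2 confusion matrix as an annotated heatmap.
    fig, ax = plt.subplots()
    ax.imshow(cm, cmap='Blues')
    ax.set_xticks(np.arange(len(class_names)))
    ax.set_yticks(np.arange(len(class_names)))
    ax.set_xticklabels(class_names)
    ax.set_yticklabels(class_names)
    ax.set_xlabel('Predicted label')
    ax.set_ylabel('True label')
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, cm[i, j], ha='center', va='center')
    plt.show()

# cm_bow was computed with labels=[1, 0], i.e. [positive, negative] order.
plot_confusion_matrix(cm_bow, ['Positive', 'Negative'])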
Conclusion:
From the above results we can observe that the Logistic Regression model performs better than the Support Vector Machine: the weighted F1-score of the Logistic Regression model is 0.75, while the weighted F1-score of the SVM model is only 0.50 for bag-of-words features (and 0.36 for tfidf features).
We could still improve the accuracy of the models with better data preprocessing and by using deep neural network models such as RNNs, LSTMs, or GRUs.
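As a rough illustration of that direction, here is a minimal Keras LSTM sketch. It is an assumption about one possible architecture, not something trained in this notebook; the vocabulary size, sequence length, layer widths, and epoch count are arbitrary placeholders:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# All hyperparameters below are illustrative placeholders.
vocab_size, max_len = 20000, 200

# Map each review to a padded sequence of integer word ids.
keras_tokenizer = Tokenizer(num_words=vocab_size)
keras_tokenizer.fit_on_texts(norm_train_reviews)
X_train_seq = pad_sequences(
    keras_tokenizer.texts_to_sequences(norm_train_reviews), maxlen=max_len)

model = Sequential([
    Embedding(vocab_size, 64),        # learn 64-dimensional word embeddings
    LSTM(64),                         # sequence model over each review
    Dense(1, activation='sigmoid'),   # probability that the review is positive
])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train_seq, train_sentiments, epochs=2,
          batch_size=128, validation_split=0.1)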