School: University of Nebraska, Lincoln
Course: 451
Subject: Statistics
Date: Feb 20, 2024
TERM PROJECT - SCMA 451 PREDICTIVE ANALYTICS
(Due on May 12th, 2023)
Aims: Data analysis, transformation, model development, assessment, and prediction. The following deliverables will be submitted as part of this project:
1. Written report in a Word document: This report will include your answers to the questions with the appropriate data analysis and model output. (Organization of the report: 10 points)
2. 8-10 slide PowerPoint presentation: This presentation should present and summarize the project steps and the questions answered in the report. Discuss with your group how to organize it and what to include to make the point with the presentation. (Organization of the presentation: 10 points)
3. R script file with its compiled report.
Be sure to submit the project report (PDF file) and R code to Canvas by the project deadline.
You must write up your course project results in a professional report, which should be no more than 15 double-spaced pages long. The report should include substantive details of your analysis, and it should have several sections (e.g., Introduction, Analysis, Results, Conclusions). The report should provide sufficient details so that anyone with a reasonable statistical background can understand exactly what you have done. You should consider using tables and figures to enhance your report. Grading will consider the quality of your report, including adherence to the stated report guidelines; clarity of writing; organization and layout; appropriate use of tables and figures; and careful proofreading to minimize typos, misspellings, and grammatical errors.
TERM PROJECT DESCRIPTION
MidWest University Foundation (MWUF) wishes to improve the cost-effectiveness of its direct marketing campaigns to previous donors. According to its recent mailing records, the typical overall response rate is 10%. Among those who respond (donate) to the mailing, the average donation is $14.50. Each mailing costs $2.00 to produce and send; the mailing includes a gift of personalized address labels and an assortment of cards and envelopes. It is not cost-effective to mail everyone because the expected profit from each mailing is 14.50 x 0.10 - 2.00 = -$0.55.
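The expected-profit figure above can be checked directly. The following is a minimal Python sketch that only reproduces the stated arithmetic (the project itself calls for R); the function name `expected_profit_per_mailing` is an illustrative choice, not part of the assignment. It also derives the break-even response rate, which is simply the mailing cost divided by the average donation.

```python
def expected_profit_per_mailing(response_rate: float,
                                avg_donation: float,
                                cost: float) -> float:
    """Expected net profit of sending one mailing to one person."""
    return response_rate * avg_donation - cost


# Figures taken from the project description
profit = expected_profit_per_mailing(0.10, 14.50, 2.00)
print(f"Expected profit per mailing: {profit:.2f}")  # -0.55

# A mailing breaks even when response_rate * avg_donation == cost
break_even_rate = 2.00 / 14.50
print(f"Break-even response rate: {break_even_rate:.4f}")  # 0.1379
```

With these figures, mailing is only worthwhile for recipients whose chance of donating exceeds roughly 13.8%, which is above the 10% overall response rate.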
We would like to develop a classification model, using data from the most recent campaign, that can effectively capture likely donors so that the expected net profit is maximized. We would also like to build a prediction model to predict expected gift amounts from donors; the data for this will consist of the records for donors only. The data are available in the file “MWUF.csv” (available in Canvas).
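To illustrate how a classification model connects to net profit, the Python sketch below mails only those whose predicted donation probability meets a cutoff and totals the resulting profit. It is a sketch under stated assumptions: the scores and outcomes are made-up toy values (not from MWUF.csv), every responder is assumed to give the average $14.50, and `mailing_profit` is a hypothetical helper name.

```python
AVG_DONATION = 14.50  # average gift among responders (project description)
MAIL_COST = 2.00      # cost per mailing (project description)


def mailing_profit(probs, donated, cutoff):
    """Net profit from mailing everyone with predicted probability >= cutoff.

    Assumes each responder gives exactly the average donation.
    """
    profit = 0.0
    for p, d in zip(probs, donated):
        if p >= cutoff:
            profit += (AVG_DONATION if d else 0.0) - MAIL_COST
    return profit


# Hypothetical model scores and actual outcomes, for illustration only
probs = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
donated = [0, 0, 0, 1, 0, 1]

break_even = MAIL_COST / AVG_DONATION          # about 0.138
mail_all = mailing_profit(probs, donated, 0.0)  # mail everyone
targeted = mailing_profit(probs, donated, break_even)

print(mail_all, targeted)  # 17.0 21.0
```

In this toy case, skipping the two lowest-scored recipients saves two mailing costs without losing a donor. On real data the cutoff would be chosen on the held-out test partition rather than fixed in advance.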
COURSE PROJECT REQUIREMENTS
1. Discuss with your group how the CRISP-DM process would apply to this project. Explain the project goals and how each step applies to this project in 2-3 sentences.
2. Check whether there are any missing values in the dataset provided. If there are, discuss with your group how you would like to process the data and move forward with the data analysis.
3. Conduct exploratory data analysis on the data set prior to building classification and prediction models.
   a. Look at the correlations between donation amount (DAMT) and potential input variables for predicting DAMT, and present these correlations in a table.
   b. Use proper data visualization tools to explore relations between DONR and the potential input variables for predicting it (do not include more than 5 visualizations).
4. For predictive modeling purposes, form a data frame in RStudio and make sure all categorical variables are coded as factors. Discuss whether you need to make any other data transformations for this project.
5. Develop the following classification models for predicting the DONR variable using any of the variables as predictors (do not include the DAMT, REG1, REG2, REG3, and REG4 variables). Use seed 123 and a 70-30% data partition for all models to train and test the models' predictive performance.
   a. Logistic regression model (LogR1). Which variables are statistically significant? State the threshold value you use. Plot the ROC curve and state the AUC statistic.
   b. Run feature selection over LogR1 and call the result LogR2. Explain the method you used and which variables are in the final model. Plot the ROC curve and state the AUC statistic.
   c. A decision tree classification model (DT). Explain which variables are used in the model. Extract the rules from DT and state them. Plot the ROC curve and state the AUC statistic.
   d. A neural network model with 20 hidden nodes (ANN1). Plot the ROC curve and state the AUC statistic.
   e. A neural network model with 100 hidden nodes (ANN2). Plot the ROC curve and state the AUC statistic.
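The seeded 70-30% partition and the AUC statistic that recur in requirement 5 can be sketched as follows. This is a stdlib-only Python illustration of the two ideas, not the R workflow the project requires: the helper names are hypothetical, the labels and scores are toy values, and seed 123 in Python's `random` module will not reproduce the partition that `set.seed(123)` gives in R.

```python
import random


def train_test_split_indices(n, train_frac=0.70, seed=123):
    """Shuffle row indices reproducibly and split them train/test."""
    rng = random.Random(seed)          # local generator, fixed seed
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train_idx, test_idx = train_test_split_indices(10)
print(len(train_idx), len(test_idx))  # 7 3

# Toy DONR labels and predicted probabilities, for illustration only
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc(labels, scores))  # 0.75
```

Using the same seed and the same split for every model keeps the AUC values for LogR1, LogR2, DT, ANN1, and ANN2 comparable, since each is then evaluated on identical test rows.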