PROJECT DOC

.docx

School

University of Nebraska, Lincoln

Course

451

Subject

Statistics

Date

Feb 20, 2024

Type

docx

Pages

4

Uploaded by MateTankKudu37

TERM PROJECT - SCMA 451 PREDICTIVE ANALYTICS (Due on May 12th, 2023)

Aims: Data analysis, transformation, model development, assessment, and prediction.

The following deliverables will be submitted as part of this project:
1. Written report in a Word document: This report will include your answers to the questions, with the appropriate data analysis and model output. (Organization of the report: 10 points)
2. 8-10 slide PowerPoint presentation: This presentation should present and summarize the project steps and the questions answered in the report. Discuss with your group how to organize it and what to include to make your point. (Organization of the presentation: 10 points)
3. R script file with its compiled report.

Be sure to submit the project report (PDF file) and R code to Canvas by the project deadline.

You must write up your course project results in a professional report of no more than 15 double-spaced pages. The report should include substantive details of your analysis and should have several sections (e.g., Introduction, Analysis, Results, Conclusions). It should provide enough detail that anyone with a reasonable statistical background can understand exactly what you have done. Consider using tables and figures to enhance your report. Grading will consider the quality of your report, including adherence to the stated report guidelines; clarity of writing; organization and layout; appropriate use of tables and figures; and careful proofreading to minimize typos, spelling errors, and grammatical errors.

TERM PROJECT DESCRIPTION

MidWest University Foundation (MWUF) wishes to improve the cost-effectiveness of its direct marketing campaigns to previous donors. According to its recent mailing records, the typical overall response rate is 10%. Among those who respond (donate) to the mailing, the average donation is $14.50.
Each mailing costs $2.00 to produce and send; the mailing includes a gift of personalized address labels and an assortment of cards and envelopes. It is not cost-effective to mail everyone, because the expected profit from each mailing is $14.50 × 0.10 − $2.00 = −$0.55. We would like to develop a classification model, using data from the most recent campaign, that can effectively capture likely donors so that the expected net profit is maximized. We would also like to build a prediction model to predict expected gift amounts from donors; the data for this model will consist of the records for donors only. The data are available in the file "MWUF.csv" (available in Canvas):
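The profit arithmetic above also implies a break-even response probability: a mailing pays for itself only when p × $14.50 > $2.00, i.e. p > 2.00 / 14.50 ≈ 0.138. The Python sketch below (Python is used only for illustration; the project itself is done in R) encodes that reasoning. It assumes every responder gives exactly the average $14.50, which is a simplification; the gift-amount prediction model could supply a per-donor value instead. The function names are hypothetical.

```python
# Campaign figures taken from the MWUF description
AVG_DONATION = 14.50   # average gift among responders ($)
MAIL_COST = 2.00       # cost to produce and send one mailing ($)
RESPONSE_RATE = 0.10   # typical overall response rate


def expected_profit(p_donate, avg_gift=AVG_DONATION, cost=MAIL_COST):
    """Expected net profit of one mailing to someone who donates with probability p_donate.

    Assumes every responder gives avg_gift (a simplification).
    """
    return p_donate * avg_gift - cost


def break_even_probability(avg_gift=AVG_DONATION, cost=MAIL_COST):
    """Donation probability at which expected profit is exactly zero."""
    return cost / avg_gift


def targeted_profit(probs, avg_gift=AVG_DONATION, cost=MAIL_COST):
    """Total expected profit when mailing only those whose predicted probability beats break-even."""
    cutoff = break_even_probability(avg_gift, cost)
    return sum(expected_profit(p, avg_gift, cost) for p in probs if p > cutoff)


if __name__ == "__main__":
    print(f"Mail everyone: {expected_profit(RESPONSE_RATE):.2f} per mailing")  # -0.55
    print(f"Break-even probability: {break_even_probability():.4f}")          # 0.1379
```

Given predicted donation probabilities from any classifier, `targeted_profit` estimates the campaign's expected net profit under this mailing rule, which is one way to compare models on the business goal rather than on accuracy alone.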
COURSE PROJECT REQUIREMENTS

1. Discuss with your group how the CRISP-DM process would apply to this project. Explain the project goals and how each step applies to this project in 2-3 sentences.
2. Check whether there are any missing values in the dataset provided. If there are, discuss with your group how you would like to process the data and move forward with the data analysis.
3. Conduct exploratory data analysis on the dataset before building the classification and prediction models.
   a. Look at the correlations between donation amount (DAMT) and the potential input variables for predicting DAMT, and present these correlations in a table.
   b. Use appropriate data visualization tools to explore the relationships between DONR and the potential input variables for predicting DONR (include no more than 5 visualizations).
4. For predictive modeling purposes, form a data frame in RStudio and make sure all categorical variables are coded as factors. Discuss whether you need to make any other data transformations for this project.
5. Develop the following classification models for predicting the DONR variable, using any of the variables as predictors (do not include the DAMT, REG1, REG2, REG3, and REG4 variables). Use seed 123 and a 70-30% data partitioning ratio for all models to train the models and test their predictive performance.
   a. Logistic regression model (LogR1). Which variables are statistically significant? State the significance threshold you use. Plot the ROC curve and state the AUC statistic.
   b. Run feature selection on LogR1 and call the resulting model LogR2. Explain the method you used and which variables are in the final model. Plot the ROC curve and state the AUC statistic.
   c. A decision tree classification model (DT). Explain which variables are used in the model. Extract the rules from DT and state them. Plot the ROC curve and state the AUC statistic.
   d. A neural network model with 20 hidden nodes (ANN1). Plot the ROC curve and state the AUC statistic.
   e. A neural network model with 100 hidden nodes (ANN2). Plot the ROC curve and state the AUC statistic.
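For requirement 2, checking for missing values amounts to counting empty or NA cells per column. The Python sketch below (illustrative only; in R a column-wise count of `is.na()` does the same job) reads any iterable of CSV lines. Treating "", "NA", and "NaN" as missing is an assumption; adjust `na_values` to match how MWUF.csv actually encodes gaps. The function name is hypothetical.

```python
import csv
from collections import Counter


def count_missing(lines, na_values=("", "NA", "NaN")):
    """Count missing cells per column in CSV data.

    `lines` may be an open file or any iterable of CSV text lines; the first line is the header.
    """
    missing = Counter()
    reader = csv.DictReader(lines)
    for name in reader.fieldnames or []:
        missing[name] = 0  # report columns with no gaps as 0
    for row in reader:
        for name, value in row.items():
            if name is None:
                continue  # extra fields beyond the header; not a named column
            # a short row yields None for trailing columns, which also counts as missing
            if value is None or value.strip() in na_values:
                missing[name] += 1
    return dict(missing)
```

Usage, assuming the file sits in the working directory: `with open("MWUF.csv", newline="") as f: print(count_missing(f))`.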
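Requirement 3a asks for a table of correlations between DAMT and candidate inputs. A minimal Python sketch of the Pearson correlation and a sorted table follows; it assumes rows have already been parsed into dictionaries of numeric values (categorical columns excluded or dummy-coded first). Because the gift-amount model uses donor records only, you may want to filter rows to donors before building the table. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(rows, target="DAMT"):
    """Correlate every other column with `target`, strongest (by absolute value) first."""
    columns = [c for c in rows[0] if c != target]
    ys = [r[target] for r in rows]
    table = {c: pearson([r[c] for r in rows], ys) for c in columns}
    return dict(sorted(table.items(), key=lambda kv: abs(kv[1]), reverse=True))
```

Sorting by absolute value puts strong negative relationships next to strong positive ones, which is usually what a correlation table in the report should highlight.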
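Requirement 4 asks that categorical variables be coded as factors. When a factor enters a regression model, R by default expands it into 0/1 indicator columns, one per level except a baseline (reference) level. The Python sketch below illustrates that expansion; choosing the first sorted level as the reference is an assumption that roughly matches R's default alphabetical level order, and the function name is hypothetical.

```python
def dummy_code(values, reference=None):
    """Expand a categorical column into 0/1 indicators, one per non-reference level.

    The reference (baseline) level gets no column; its effect is absorbed by the intercept.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first sorted level is the baseline
    kept = [lvl for lvl in levels if lvl != reference]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in kept}
```

Dropping one level avoids perfect collinearity with the intercept, which is why a factor with k levels contributes k − 1 coefficients to the logistic regression.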
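Requirement 5 fixes seed 123 and a 70-30 split so that every model is trained and tested on the same partition. The Python sketch below shows the idea: shuffle row indices with a seeded generator, then cut at 70%. Note that Python's `random.Random(123)` does not reproduce the partition R produces after `set.seed(123)`; a seed guarantees reproducibility only within one language and generator. The function name is hypothetical.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=123):
    """Shuffle row indices with a seeded RNG and cut them into train/test partitions."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = round(train_frac * len(rows))
    # sorting keeps the original row order within each partition
    train = [rows[i] for i in sorted(idx[:cut])]
    test = [rows[i] for i in sorted(idx[cut:])]
    return train, test
```

Creating the split once and reusing it for LogR1, LogR2, DT, ANN1, and ANN2 keeps their test-set AUCs directly comparable.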
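Every model in requirement 5 is assessed with an ROC curve and its AUC. Whatever R package is used to plot them, the computation is the same: sort test cases by predicted score, sweep the classification threshold from high to low, record (false positive rate, true positive rate) at each step, and integrate the resulting curve with the trapezoidal rule. The Python sketch below assumes labels coded 1 for donors and 0 otherwise; the function names are hypothetical.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative case")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # consume every case tied at this score before emitting a point
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; computing it on the 30% test partition, not the training data, is what makes the five models comparable.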