Stefanie Salonis
BIAM500 – January 2019
Professor Walker
Lab 6: Data Mining with a Neural Network

Data Mining with a Neural Network

Scenario / Summary:

Adventure Works Cycles, a fictional bicycle manufacturing and sales company, wants to be able to predict sales from new customers during their first year. Specifically, Adventure Works would like to classify new customers using the following categories:

Category   Expected Sales in First Year
--------   ----------------------------
D          Less than $2,000
C          $2,000–$2,999
B          $3,000–$3,999
A          $4,000 or more

Adventure Works currently collects a set of demographic data from all customers through a customer survey. Data mining has been performed to determine the feasibility of classifying new customers based on their survey responses, and two data sets extracted from the company's data warehouse have been provided:

1. A list of long-term customers with the survey responses and total first-year sales for each (OldCustomers)
2. A list of new customers to be classified, with their survey responses (NewCustomers)

Objective

Provide findings and recommendations for Adventure Works managers concerning this data mining effort. Include:

- An evaluation of the performance of the neural network in classifying customers
- Recommendations as to how the performance might be improved
- Recommendations as to how Adventure Works could use predicted customer classifications to improve business results

Findings

How could we proceed if we wanted to try to improve this neural network?

Prof. Walker's Comments:

We've trained and tested a neural network for customer classification, used the network to predict classifications for some new customers, and evaluated the network's performance. I'll leave that mostly for you to think about and write about in your opinion paper, but here are a couple of ideas to get you started.
The network may be having trouble learning to recognize A and D customers because there are relatively few of them in the training set, compared to the B and C customers. We might try training the network on a data set that over-samples As and Ds and under-samples Bs and Cs, compared to the actual distribution. There's no rule that says the proportions of different case types in your training set have to match reality, and sometimes there can be benefits to deliberately over-representing or under-representing certain types of cases.

We might also want to think more carefully about which variables should be included in our model. If we are including some irrelevant data that really don't have anything to do with how much the customer will purchase, this noise might be confusing the network. We could look individually at how each variable is correlated with first-year sales and eliminate the ones that have weak or no correlation. We might also want to eliminate variables that are highly correlated with other variables; sometimes, if two variables have very similar information content, including them both can dilute the impact of either one on the network. You can probably think of some other things to try.

Summary Workbook:

The Training section tells us the number of training cases, the training time, and the percentage of bad predictions made on the training cases. The most enlightening section is the Testing section, which shows how well the network did on known cases that were not used during training. This better approximates how the network will perform on new cases in practice; usually, the testing performance is worse than the training performance. The two numbers to focus on here are the percent of bad predictions and the mean incorrect probability. The percent of bad predictions tells us in how many cases the network's predicted classification did not match the actual classification: it is the number of bad test cases as a percentage of all test cases, and we would like it to be low. The mean incorrect probability is the average of the incorrect probability values across the test cases, so it is a measure of confidence: the higher this number, the less sure the network is of its predictions. Again, we would like this to be a low number.
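The over- and under-sampling idea discussed above can be sketched as simple stratified resampling. The Python sketch below is illustrative only: the function name, the toy class counts, and the (features, label) case shape are my own assumptions, not part of the lab's tooling, which handles training internally.

```python
import random
from collections import Counter

def resample_by_class(cases, target_counts, seed=0):
    """Build a training set whose class mix deliberately differs from
    reality: draw each class (with replacement) up to a target count.
    `cases` is a list of (features, label) pairs, labels 'A'-'D'."""
    rng = random.Random(seed)
    by_label = {}
    for case in cases:
        by_label.setdefault(case[1], []).append(case)
    resampled = []
    for label, n in target_counts.items():
        pool = by_label.get(label, [])
        if pool:
            # sample with replacement, so rare classes (A, D) can be
            # over-sampled past their actual counts
            resampled.extend(rng.choices(pool, k=n))
    rng.shuffle(resampled)
    return resampled

# Toy data skewed toward B and C, as in the lab's training set
raw = ([("a", "A")] * 4 + [("b", "B")] * 40 +
       [("c", "C")] * 40 + [("d", "D")] * 6)
balanced = resample_by_class(raw, {"A": 20, "B": 20, "C": 20, "D": 20})
print(Counter(label for _, label in balanced))  # 20 of each class
```

Sampling with replacement is what lets the four A cases be stretched to twenty; the trade-off is that the network sees the same rare cases repeatedly, which can encourage overfitting on those cases.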
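The correlation screen described above (drop variables weakly correlated with first-year sales) can likewise be sketched. In this minimal Python illustration, the variable names, toy data, and the 0.1 cutoff are hypothetical assumptions, not values from the lab.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def weak_variables(survey_columns, first_year_sales, threshold=0.1):
    """Flag survey variables whose correlation with first-year sales
    is weak; `threshold` is a hypothetical cutoff, not from the lab."""
    return [name for name, values in survey_columns.items()
            if abs(pearson_r(values, first_year_sales)) < threshold]

# Toy illustration: 'cars' tracks sales closely, 'shoe_size' does not
sales = [1800, 2500, 3300, 4200]
columns = {"cars": [0, 1, 2, 3], "shoe_size": [10, 12, 12, 10]}
print(weak_variables(columns, sales))  # → ['shoe_size']
```

The same `pearson_r` helper could also screen for the redundant-variable problem mentioned above, by computing correlations between pairs of survey variables rather than against sales.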
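The two testing metrics described in the Summary Workbook section reduce to simple arithmetic. In this hedged sketch, the per-case input shape (predicted label, actual label, probability the network assigned to the actual class) is an assumption about how the workbook's test results might be represented.

```python
def evaluate(test_cases):
    """Compute (percent bad predictions, mean incorrect probability)
    for test_cases given as (predicted, actual, prob_of_actual)."""
    n = len(test_cases)
    bad = sum(1 for pred, actual, _ in test_cases if pred != actual)
    pct_bad = 100.0 * bad / n
    # a case's "incorrect probability" = probability mass the network
    # put on the wrong classes = 1 - probability of the actual class
    mean_incorrect = 100.0 * sum(1.0 - p for _, _, p in test_cases) / n
    return pct_bad, mean_incorrect

# Hypothetical test results: one of four cases misclassified
results = [("B", "B", 0.80), ("C", "C", 0.70),
           ("B", "A", 0.20), ("D", "D", 0.90)]
pct, mip = evaluate(results)
print(round(pct, 1), round(mip, 1))  # 25.0 35.0
```

Note the two numbers can disagree: a network can have few bad predictions yet a high mean incorrect probability if its correct predictions are made with low confidence.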