Stefanie Salonis
BIAM500 – January 2019
Professor Walker
Lab 6: Data Mining with a Neural Network
Scenario / Summary:
Adventure Works Cycles, a fictional bicycle manufacturing and sales company, wants to predict sales from new customers during their first year. Specifically, Adventure Works would like to classify new customers using the following categories:
Category    Expected Sales in First Year
D           Less than $2,000
C           $2,000–$2,999
B           $3,000–$3,999
A           $4,000 or more
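The category table above amounts to a simple binning rule on first-year sales. The following is a minimal Python sketch of that rule, not part of the lab materials: the function name `sales_category` is hypothetical, and it assumes each band is a half-open interval (for example, $2,999.50 falls in C), since the table does not say how amounts between whole dollars are handled.

```python
from bisect import bisect_right

# Dollar amounts at which each higher category begins, taken from the table.
# Assumption: bands are half-open, so exactly $2,000 is a C and exactly $4,000 is an A.
THRESHOLDS = [2000, 3000, 4000]
CATEGORIES = ["D", "C", "B", "A"]


def sales_category(first_year_sales: float) -> str:
    """Return the category letter (A-D) for a first-year sales total."""
    # bisect_right counts how many thresholds are <= the sales figure,
    # which is exactly the index of the matching category.
    return CATEGORIES[bisect_right(THRESHOLDS, first_year_sales)]
```

A rule like this could be used to label the long-term customers from their known first-year sales, giving the target values the neural network is trained to predict.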
Adventure Works currently collects a set of demographic data from all customers through a customer survey. Data mining has been performed to determine the feasibility of classifying new customers based on their survey responses, using two data sets extracted from the company's data warehouse:
1. A list of long-term customers with the survey responses and total first-year sales for each (OldCustomers)
2. A list of new customers to be classified with their survey responses (NewCustomers)
Objective:
Give findings and recommendations for Adventure Works managers concerning this data mining effort. Include:
An evaluation of the neural network's performance in classifying customers
Recommendations as to how the performance might be improved
Recommendations as to how Adventure Works could use predicted customer classifications to improve business results
Findings:
How could we proceed if we wanted to try to improve this neural network?
Prof. Walker’s Comments:
We’ve trained and tested a neural network for customer classification, used the network to predict classifications for some new customers, and evaluated the network’s performance. I’ll leave that mostly for you to think about and write about in your opinion paper, but here are a couple of ideas to get you started. The network may be having trouble learning to recognize A and D customers because there are relatively few of them in the training set, compared to the B and C customers. We might try training the network on a data set that over-samples As and Ds, and under-samples Bs and Cs, compared to the actual distribution. There’s no rule that says the proportions of different case types in your training set have to match reality, and sometimes, there can be benefits to deliberately over-representing or under-