Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner
3rd Edition
ISBN: 9781118729274
Author: Galit Shmueli, Peter C. Bruce, Nitin R. Patel
Publisher: WILEY
Students have asked these similar questions
For this question, create a new column in the dataset where total rainfall is just the sum of our three separate rainfall variables.
(a) Plot crop yield vs. time. Does yield appear to be stationary? Why or why not?
(b) Plot total rainfall vs. time. Does total rainfall appear to be stationary? Why or why not?
(c) Plot the first-difference of crop yield vs. time. Does this series appear to be stationary? Why or why not?
(d) Formally test whether crop yield, rainfall and the first-difference of crop yield are stationary using the appropriate test. Be sure to do all parts of the hypothesis tests. After these tests, what can you say the order of integration is for each of the variables?
(e) Estimate a model where yield is a function of rainfall and time. You do not have to worry about the time variable being stationary or not, but the other two must be stationary (you might need to difference one or both of them to make it stationary). Fully report your results.
(f) Test your model for…
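Part (c) relies on first differencing, which can be illustrated with a minimal Python sketch. The series below is a made-up linear trend, not the crop data, and the helper names `first_difference` and `half_means` are hypothetical. Comparing the means of the two halves is only a rough stand-in for inspecting a plot; a formal unit-root test (for example, an augmented Dickey-Fuller test) is a separate step that is not shown here.

```python
def first_difference(series):
    """Return y[t] - y[t-1] for each consecutive pair of observations."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def half_means(series):
    """Mean of the first half and of the second half of a series.

    A large gap between the two suggests the mean drifts over time.
    """
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Hypothetical series with a deterministic upward trend (assumption, toy data).
yield_series = [10.0 + 2.0 * t for t in range(10)]
diffed = first_difference(yield_series)

print(half_means(yield_series))  # the level series: the two means differ
print(half_means(diffed))        # the differenced series: the two means agree
```

In this toy case the level series has a drifting mean while its first difference does not, which is the pattern the question asks you to look for in the plots.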
data: http://lib.stat.cmu.edu/datasets/
- California housing data by Pace and Barry (1997)
Q1: Properly select one independent variable from the data and carry out a polynomial regression analysis with Y as the response by using the "divide-and-conquer" algorithm.
What does this question mean, and how should I approach it?
Solve this using R and RStudio.
Use the Boston data set from the MASS package to answer the questions below.
1. Use the createDataPartition() function from the caret package to partition the data into two parts: 80% training data and 20% test data.
2. Using the train() function from the caret package, run a k-NN model with medv as the response (target) variable, with the following settings:
a. Standardize the dataset using the center and scale options of the preProcess() function in the caret package.
b. Use 10-fold cross-validation.
3. Generate a plot showing model error RMSE against different values of k.
4. What is the optimal value of k? Explain how you chose this value.
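The question asks for R and caret, so the following is only a language-neutral illustration of the procedure in steps 2 to 4, written as a minimal standard-library Python sketch. It uses synthetic data rather than the Boston data set, omits the 80/20 split from step 1, and all function names (`standardize`, `knn_predict`, `cv_rmse`) are hypothetical. Picking the k with the lowest cross-validated RMSE is one common selection rule, stated here as an assumption.

```python
import math
import random


def standardize(train, test):
    """Center and scale each column using the training mean and std only."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 if a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k nearest training rows (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cv_rmse(x, y, k, folds=10, seed=0):
    """Mean RMSE across `folds` cross-validation folds for a given k."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)  # same seed, so every k sees the same folds
    rmses = []
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tr_x, te_x = standardize([x[i] for i in train_idx],
                                 [x[i] for i in test_idx])
        tr_y = [y[i] for i in train_idx]
        sq_err = [(knn_predict(tr_x, tr_y, q, k) - y[i]) ** 2
                  for q, i in zip(te_x, test_idx)]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / len(rmses)


# Synthetic data (assumption): two predictors on very different scales.
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 100)] for _ in range(100)]
Y = [3.0 * a + 0.1 * b + rng.gauss(0, 1) for a, b in X]

results = {k: cv_rmse(X, Y, k) for k in (1, 3, 5, 7, 9)}
best_k = min(results, key=results.get)  # k with the lowest mean RMSE
for k, rmse in results.items():
    print(k, round(rmse, 3))
print("lowest-RMSE k:", best_k)
```

Plotting the values in `results` against k gives the curve requested in step 3. Standardizing inside each fold, with statistics from the training rows only, keeps the held-out rows from influencing the scaling.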
Chapter 4 Solutions
Similar questions
- In R, please provide the code and explanation for the following (the first three parts were previously answered): iv. Create 4 different data frames, one for the data corresponding to each of the 4 factors. How many observations are there for each diet? v. Check the normality assumption for each subset by creating Q-Q plots. Make sure each plot has an appropriate title. vi. What is the sample variance for each diet? Do you think that the assumption of common variance holds? Why or why not? How could you formally test this?
- In this assignment you will implement a linear regression model and evaluate its performance on the California house price data set (housing.csv). Apply the code and save it in a separate .py or .ipynb file. DO NOT PUT THE CODE IN YOUR REPORT DOCUMENT; only present your output metrics, the requested graphs, and personal comments in the report. Name the report and code files with surname_studentID_section. You will submit a report and a .py/.ipynb file. Only use the data set version provided with the assignment; do not download other versions or use the ready-made version in Google Colab. In the assignment you will do the following: apply linear regression on each individual numerical feature (drop features: 'ocean_proximity', 'longitude', 'latitude'); output the coefficients and your self-implemented error measures: sum of squared error (SSE) and mean squared error (MSE); use split-percentage cross-validation with 30% test size and shuffle as True; refer to documentations during…
- Using R programming: Create a function that randomly picks two columns from a data frame and does the following: if both columns are of character values, test if they are independent; if both columns are numeric, create a data plot to show their relationship trends with a regression line attached; if one is numeric and the other character, test if the numeric column is related to the character column.
- Take the Absenteeism at work Data Set, obtained from the UC-Irvine Machine Learning Data Repository, as a data set to be discretized. Perform data discretization for two of the numerical attributes using the Equal-Width or Equal-Frequency method. (Let the stopping criterion be max-interval = 4.) You need to write a small program to do this to avoid clumsy numerical computation. Submit your simple analysis and your test results: split points, final intervals, and your documented source program.
- Do an analysis of a real data set and mention where you found the data set. Please do the analysis completely in RStudio. A basic requirement for the data set is that it includes one response variable and at least two predictor variables. The main objectives of this question are: to identify a suitable data set; to come up with meaningful research questions based on the data; to experience some of the problems encountered when analyzing real data. Also mention: Where did I find the data set? Why is the problem of interest? Which method or model is appropriate to this problem? How do I apply the method to analyze the data set? What is my conclusion?
- Large volumes of ungrouped data were often organized into intervals of our choice in the past. We are faced with colossal and formidable difficulties, just as in the real world. Do the advantages and disadvantages of working in a group outweigh working alone?
- Discuss very big data and normality, and then very small samples of fewer than 35.
- A common practice was to aggregate large volumes of ungrouped data into intervals of our own design. As in the real world, we face vast and difficult obstacles. Are there any advantages or disadvantages to working in a group vs. working alone?
- Exercise #1: Load the data set "Cars93.csv" or "Cars93.pkl" in the Google Drive for our course from the unit4 subfolder into a dataframe called cars_df. Create scatter plots of "Cylinders" versus "MPG.city" and versus "Horsepower", both in the same row (recall subplots from past workbooks). Calculate the correlation between "Cylinders" and "MPG.city" as well as "Cylinders" versus "Horsepower". What do these correlations tell us about the association between the variables involved?
- What is the specific distinction between a DataReader and a DataSet, and how does this distinction manifest itself?
- Explain the First Normal Form (1NF) and provide an example of a non-normalized dataset and its normalized counterpart.
- In R, the command to see the relative importance of your best predictive independent variables is ________. Options: summary(), data(), data.frame(), tree()
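For the discretization question in the list above, the Equal-Width method can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the values are invented rather than taken from the Absenteeism at work data set, "max-interval = 4" is read as "use four intervals", and the helper names `equal_width_bins` and `assign_bin` are hypothetical. The Equal-Frequency variant is not shown.

```python
def equal_width_bins(values, n_bins=4):
    """Return (split_points, intervals) for equal-width discretization."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # n_bins intervals need n_bins - 1 interior split points.
    split_points = [lo + width * i for i in range(1, n_bins)]
    intervals = [(lo + width * i, lo + width * (i + 1)) for i in range(n_bins)]
    return split_points, intervals


def assign_bin(value, split_points):
    """Index of the interval containing `value`.

    Intervals are treated as closed on the left, so a value equal to a
    split point goes into the higher interval; the maximum value falls
    into the last interval.
    """
    return sum(value >= s for s in split_points)


# Hypothetical numeric attribute (assumption, toy data).
ages = [20, 25, 31, 38, 44, 52, 60]
splits, intervals = equal_width_bins(ages, n_bins=4)
bins = [assign_bin(a, splits) for a in ages]

print("split points:", splits)
print("intervals:", intervals)
print("bin index per value:", bins)
```

The printed split points and intervals correspond to the "split points" and "final intervals" the question asks you to report.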
Recommended textbooks for you
Database System Concepts
Computer Science
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:McGraw-Hill Education
Starting Out with Python (4th Edition)
Computer Science
ISBN:9780134444321
Author:Tony Gaddis
Publisher:PEARSON
Digital Fundamentals (11th Edition)
Computer Science
ISBN:9780132737968
Author:Thomas L. Floyd
Publisher:PEARSON
C How to Program (8th Edition)
Computer Science
ISBN:9780133976892
Author:Paul J. Deitel, Harvey Deitel
Publisher:PEARSON
Database Systems: Design, Implementation, & Manag...
Computer Science
ISBN:9781337627900
Author:Carlos Coronel, Steven Morris
Publisher:Cengage Learning
Programmable Logic Controllers
Computer Science
ISBN:9780073373843
Author:Frank D. Petruzella
Publisher:McGraw-Hill Education