Assignment2_CategoricalVariables_ShubhamJethwa.docx

School: Seneca College
Course: 110
Subject: Statistics
Date: Feb 20, 2024
Type: docx
Pages: 11
Uploaded by SuperEnergyTarsier27
Assignment 2
Bank Marketing Case Study: Categorical Attributes Check

The head of Marketing wants to know which customers have the highest propensity for buying a Certificate of Deposit (CD) from the institution. The goal of this assignment is to check for errors in character variables and correct them.

Learning outcomes
  • Use PROC FREQ to inspect errors in character variables.
  • Use character functions for data cleaning.
  • Use a 2x2 contingency table to examine dependency between variables by looking at the chi-square test.
  • Use a mosaic plot to visually examine dependency between variables.
Readings:

Simple frequency table:
http://support.sas.com/training/sas94/m15_2.htm

2x2 contingency table:
A contingency table shows the frequency distribution of the variables in a matrix format, while a mosaic plot displays the same information graphically. Look here for an example:
http://assets.csom.umn.edu/assets/163747.pdf
https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_freq_sec

Mosaic plot:
https://blogs.sas.com/content/iml/2013/11/04/create-mosaic-plots-in-sas-by-using-proc-freq.html
https://towardsdatascience.com/mosaic-plot-and-chi-square-test-c41b1a527ce4

Measuring association between categorical variables:
The null hypothesis for a chi-square independence test is that two categorical variables are independent in some population. The p-value is the area under the right tail of the chi-square distribution beyond the observed χ² statistic: p = P[X >= chisquareValue]. We usually reject the null hypothesis of independence, and conclude that the two variables are related, if p-value < 0.01 (sometimes p-value < 0.05 is also considered statistically significant). You can find the p-value in the third column of the statistics table.
https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_freq_sec

The range of Cramér’s V is [-1, 1] for 2x2 tables.
The range of the phi coefficient is [-1, 1] for 2x2 tables.
The contingency coefficient is a measure of association derived from the Pearson chi-square and is >= 0.
https://www.statisticshowto.datasciencecentral.com/contingency-coefficient/

Understanding Contingency Coefficient Values
A contingency coefficient is particularly informative if you’re working with a large sample. The contingency coefficient helps us decide if variable b is ‘contingent’ on variable a. However, it is a rough measure and doesn’t quantify the dependence exactly. It can be used as a rough guide:
  • If C is near zero (or equal to zero), you can conclude that your variables are independent of each other; there is no association between them.
  • If C is away from zero, there is some relationship. C cannot take negative values.
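The assignment itself is meant to be done with SAS PROC FREQ. Purely to illustrate the arithmetic behind the measures described above, here is a minimal Python sketch for a 2x2 table. The function name `association_2x2` and the example counts are made up for illustration and do not come from the bank marketing data. The sketch assumes the signed convention for 2x2 tables, under which Cramér's V equals phi, and uses the fact that a chi-square statistic with 1 degree of freedom has right-tail probability erfc(sqrt(chi2 / 2)).

```python
import math


def association_2x2(table):
    """Chi-square test and association measures for a 2x2 table of counts.

    `table` is [[a, b], [c, d]]; every row and column total must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d

    # Pearson chi-square: sum of (observed - expected)^2 / expected,
    # where expected = row total * column total / n under independence.
    chi2 = 0.0
    for observed, row_total, col_total in (
        (a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2)
    ):
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected

    # Right-tail p-value for 1 degree of freedom (valid for 2x2 tables only).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Phi keeps the sign of a*d - b*c, so it lies in [-1, 1].
    phi = (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)

    # Assumption: signed convention, where Cramer's V equals phi for 2x2 tables.
    cramers_v = phi

    # Contingency coefficient: sqrt(chi2 / (chi2 + n)), never negative.
    contingency = math.sqrt(chi2 / (chi2 + n))

    return {
        "chi2": chi2,
        "p_value": p_value,
        "phi": phi,
        "cramers_v": cramers_v,
        "contingency": contingency,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = association_2x2([[30, 10], [20, 40]])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the p-value falls well below 0.01, so independence would be rejected. Note that phi squared times n reproduces the chi-square statistic, which is a quick consistency check.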
Q1. Examine the target variable y:
Use PROC FREQ to list a simple frequency table for the variable y.

Q2. Examine the variable "contact" and study its dependency with the target variable y.
Use PROC FREQ to list a simple frequency table for the variable "contact". Examine the output for invalid values.
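The questions above are to be answered with PROC FREQ in SAS. As a language-neutral illustration of what "examine the output for invalid values" involves, the following Python sketch builds a simple frequency table, applies basic character cleaning, and lists whatever still falls outside an expected set of categories. The sample values, the `VALID` set, and the helper names are all assumptions made up for this sketch; the real categories must be read off the actual data.

```python
from collections import Counter

# Hypothetical raw values, including typical character-data problems:
# inconsistent case, trailing blanks, an abbreviation, and an empty string.
raw_contact = ["cellular", "Cellular ", "telephone", "TELEPHONE", "tel", "cellular", ""]

# Assumed set of valid categories (not taken from the assignment data).
VALID = {"cellular", "telephone"}


def frequency_table(values):
    """Map each distinct value to (count, percent of total)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: (count, 100 * count / total) for value, count in counts.items()}


def clean(value):
    """Trim surrounding blanks and lowercase the value."""
    return value.strip().lower()


before = frequency_table(raw_contact)
cleaned = [clean(value) for value in raw_contact]
after = frequency_table(cleaned)

# Values that survive cleaning but are still not valid need a manual decision.
invalid = sorted(value for value in after if value not in VALID)

print("distinct values before cleaning:", len(before))
for value, (count, percent) in sorted(after.items()):
    print(f"{value!r:12} {count:3d} {percent:6.1f}%")
print("still invalid after cleaning:", invalid)
```

Trimming and lowercasing merge the case and whitespace variants, while the abbreviation and the empty string remain flagged. The same two-step pattern (frequency table first, then targeted character functions) is what the assignment asks for in SAS.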
Q3. Contingency table Contact by y and mosaic plot:
Create a 2x2 contingency table along with a mosaic plot. Show the statistics for a table of contact by y.
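The task above is meant for PROC FREQ. To show what a contingency table and a mosaic plot encode, here is a small Python sketch that cross-tabulates paired observations and derives the tile proportions a mosaic plot would draw. The observation pairs are invented for illustration, and the layout is an assumption: tile width is taken from the marginal share of the first variable and tile height from the conditional share of the second, which is one common convention and not necessarily the axis assignment SAS uses.

```python
from collections import Counter

# Hypothetical (contact, y) pairs, not taken from the bank marketing data.
pairs = (
    [("cellular", "yes")] * 3
    + [("cellular", "no")] * 5
    + [("telephone", "yes")] * 1
    + [("telephone", "no")] * 7
)


def crosstab(observations):
    """Return (row labels, column labels, table of counts) for paired data."""
    cells = Counter(observations)
    row_labels = sorted({row for row, _ in observations})
    col_labels = sorted({col for _, col in observations})
    table = [[cells[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, table


row_labels, col_labels, table = crosstab(pairs)
n = sum(sum(row) for row in table)

# Mosaic geometry (assumed layout): width is the share of each first-variable
# level, heights are the shares of the second variable within that level.
tiles = {}
for label, counts in zip(row_labels, table):
    row_total = sum(counts)
    tiles[label] = {
        "width": row_total / n,
        "heights": [count / row_total for count in counts],
    }

print("columns:", col_labels)
for label, counts in zip(row_labels, table):
    print(f"{label:10} {counts}  tile: {tiles[label]}")
```

If the two variables were independent, the heights would be the same in every column of tiles. Visibly different heights, as in this made-up example, are what a mosaic plot shows when the variables are associated.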