BAN110_2.pdf

21/02/2024, 14:31
Program Summary - HIMANI_119717239_Assignment2.sas

Execution Environment

Author: u63731080
File: /home/u63731080/BAN110/HIMANI_119717239_Assignment2.sas
SAS Platform: Linux LIN X64 3.10.0-1062.12.1.el7.x86_64
SAS Host: ODAWS02-USW2-2.ODA.SAS.COM
SAS Version: 9.04.01M7P08062020
SAS Locale: en_GB
Submission Time: 21/02/2024, 14:31:52
Browser Host: CPEBC4DFB434483-CMBC4DFB434480.CPE.NET.CABLE.ROGERS.COM
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
Application Server: ODAMID00-USW2-2.ODA.SAS.COM

Code: HIMANI_119717239_Assignment2.sas

libname record '/home/u63731080/BAN110';

data customer_record1;
    set record.customer_all;
run;

/* Q1. Examine the target variable y:
   Use PROC FREQ to list a simple frequency table for the variable y. */
title 'Simple Frequency Table of target variable y';
proc freq data = customer_record1;
    table y;
run;
title;

/* Q2. Examine the variable "contact" and study its dependency with the target variable y.
   Use PROC FREQ to list a simple frequency table for the variable "contact".
   Examine the output for invalid values. */
title 'Simple Frequency Table of variable Contact';
proc freq data = customer_record1;
    table contact;
run;
title;

/* Q3. Contingency table Contact by y and mosaic plot:
   create a 2x2 contingency table along with a mosaic plot.
   Show the statistics for Table of contact by y. */
proc freq data = customer_record1;
    tables contact * y / chisq plots = mosaicplot;
run;

/* Interpret:
(a) Based on the mosaic plot, do you assume association between the two variables?
(b) Based on the Contingency coefficient, is there an association between the two variables?
Answer:
(a) Based on the mosaic plot, I would assume that there is an association between the two variables.
It appears that customers contacted via cellular were more likely to buy the Certificate of Deposit (CD) from the institution.
Customers contacted via telephone were the next most likely, and the least likely were customers contacted by unknown methods.
(b) According to the contingency coefficient, with a value of 0.2541, there is a medium association between the two variables.
The closer the contingency coefficient is to 0, the weaker the association; the closer it is to 1, the stronger the association.
*/

/* Q4. Examine the variable "education" */

/* 4.1. define a new format, name it education_Check and
   use it to identify invalid values for the variable education.
   Valid values are 'primary', 'secondary', 'tertiary', 'unknown'.
   Refer to program 1.8.
   Chapter 1 - Working with Character Data
   Cody's Data Cleaning Techniques Using SAS, Third Edition */
proc format;
    value $education_check
        'primary', 'secondary', 'tertiary', 'unknown' = 'valid'
        'SECONDARY' = 'invalid';
run;

title 'Checking Invalid values of Education';
proc freq data = customer_record1;
    table Education / nocum nopercent missing;
    format Education $education_check.;
run;
title;

/* 4.2. Use the function lowcase on education column.
   use the same dataset name for output dataset. */
data customer_record1;
    set record.customer_all;
    Education = lowcase(Education);
run;
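The format in 4.1 flags only the one miscoded value it names. As a minimal sketch of a stricter check, the example below uses the OTHER keyword so that any unlisted value is reported as invalid, then applies lowcase in place and re-checks. The data set name edu_demo, its rows, and the format name $edu_strict are hypothetical and exist only for this illustration.

```sas
/* Toy data: the data set name edu_demo and its rows are hypothetical. */
data edu_demo;
    input Education $12.;
    datalines;
primary
SECONDARY
tertiary
Unknown
secondary
;
run;

/* Hypothetical stricter variant of $education_check:
   blanks are reported as missing, and every other unlisted
   value falls into OTHER and is reported as invalid. */
proc format;
    value $edu_strict
        'primary', 'secondary', 'tertiary', 'unknown' = 'valid'
        ' '                                            = 'missing'
        other                                          = 'invalid';
run;

title 'Education check before lowcase';
proc freq data = edu_demo;
    tables Education / nocum nopercent missing;
    format Education $edu_strict.;
run;

/* Read the working data set itself so the fix is kept in place. */
data edu_demo;
    set edu_demo;
    Education = lowcase(Education);
run;

title 'Education check after lowcase';
proc freq data = edu_demo;
    tables Education / nocum nopercent missing;
    format Education $edu_strict.;
run;
title;
```

On this toy data the first table should group 'SECONDARY' and 'Unknown' under invalid; after lowcase all five rows fall under valid.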
/* 4.3. show the simple frequency table after the change. */
title 'Simple frequency table of variable Education';
proc freq data = customer_record1;
    table Education / nocum nopercent missing;
run;
title;

/* Q5. Examine the variable "marital".
   5.1. Use PROC print with a where statement to check for data errors in the variable marital.
   Consider the valid values as "single", "married", "divorced".
   Refer to program 1.6.
   Chapter 1 - Working with Character Data
   Cody's Data Cleaning Techniques Using SAS, Third Edition */
title 'Table of Invalid values of variable marital';
proc print data = customer_record1;
    var marital;
    id customer_id;
    where marital not in ('single', 'divorced', 'married');
run;
title;

/* 5.2. Use the function lowcase on the variable marital. */
data customer_record1;
    set record.customer_all;
    marital = lowcase(marital);
run;

/* 5.3. show the simple frequency table after the change. */
title 'Simple Frequency Table of variable marital';
proc freq data = customer_record1;
    table marital / nocum nopercent missing;
run;
title;

/* Q6. Examine the variable "Job".
   6.1. Use PROC FREQ to list a simple frequency table. */
title 'Simple Frequency Table of variable Job';
proc freq data = customer_record1;
    table Job / nocum nopercent missing;
run;
title;

/* 6.2. write a code to combine the categories "admin." and "ADMINISTRATION"
   for the job variable as "admin".
   replace any occurrence of the value "ADMINISTRATION" with "admin". */
data customer_record1;
    set record.customer_all;
    if Job in ('admin.', 'ADMINISTRATION') then Job = 'admin';
run;

/* 6.3. show the simple frequency table after the change. */
title 'Simple Frequency Table of variable Job after change';
proc freq data = customer_record1;
    table Job / nocum nopercent missing;
run;
title;

/* Q7. checking missing values
Adapt the code in program 7.2.
of Chapter 1 so it works on customer_all dataset.
Refer to program 7.2. Counting Missing Values for Character Variables in
Chapter 1 - Working with Character Data
Cody's Data Cleaning Techniques Using SAS, Third Edition

title "Checking Missing Character Values";
proc format;
    value $Count_Missing ' '   = 'Missing'
                         other = 'Nonmissing';
run;
proc freq data=Clean.Patients;
    tables _character_ / nocum missing;
    format _character_ $Count_Missing.;
run;
*/

title "Checking Missing Character Values";
proc format;
    value $Character_Count_Missing ' '   = 'Missing'
                                   other = 'Nonmissing';
run;

proc freq data = customer_record1;
    tables _character_ / nocum missing;
    format _character_ $Character_Count_Missing.;
run;
title;

title "Checking Missing Numeric Values";
proc format;
    value Numeric_Count_missing .     = 'missing'
                                other = 'nonmissing';
run;
proc freq data = customer_record1;
    tables _numeric_ / nocum missing;
    format _numeric_ Numeric_Count_Missing.;
run;
title;

/* Q8. create a new variable named jobMF to indicate the most frequent job category
   Reuse the code provided in ch17, section 17.3.2.
   check the most frequent job category based on the output of proc freq.
   create the new variable jobMF
   print the first few observations. */
title 'Simple Frequency Table of variable Job';
proc freq data = customer_record1 order = freq;
    table Job / nocum nopercent missing;
run;
title;

/* Abbreviations: MF-MostFrequent and NM-NotMostFrequent */
data customer_record1;
    set record.customer_all;
    if job = 'management' then jobMF = 'MF';
    else jobMF = 'NM';
run;

proc print data = customer_record1 (obs = 10);
run;

/* Q9. Removing units from a value and standardizing
For a reference example, refer to program 1.10 from chapter 1: Working with Character Data
Cody's Data Cleaning Techniques Using SAS, Third Edition
Section: Removing Units from a Value
Program 1.10: Converting Weight with Units to Weight in Kilograms

*Program to Remove Units from Numeric Data;
data Units;
    input Weight $10.;
    Digits = compress(Weight,,'kd');
    if findc(Weight,'k','i') then
        Wt_Kg = input(Digits,5.);
    else if not missing(Digits) then
        Wt_Kg = input(Digits,5.)/2.2;
datalines;
100lbs.
110 Lbs.
50Kgs.
70 kg
180
;
title "Reading Weight Values with Units";
proc print data=Units noobs;
    format Wt_Kg 5.1;
run;
*/

/* Given the following units data, */
data units;
    input Length $10.;
    datalines;
100m.
110 ft.
50M.
70 Ft
180
;
run;

proc print data = units;
run;

/* (a) use the appropriate function to keep only digits. name the new variable "digits" */
/* (b) use the function findc on length to search for the character 'm' (stands for meter),
   if m is found, keep the value as it is, if not, make a foot to meter conversion. */
data units;
    input Length $10.;
    /* (a) COMPRESS with the 'kd' modifiers keeps only the digits; digits stays a character value */
    digits = compress(Length, , 'kd');
    /* (b) 'i' makes the search case-insensitive, so both 'm' and 'M' count as meters */
    if findc(Length, 'm', 'i') then Length_m = input(digits, best32.);
    /* No 'm' found: treat the value as feet and convert to meters */
    else Length_m = input(digits, best32.) * 0.3048;
    datalines;
100m
110 ft
50M.
70 Ft
180
;
run;

proc print data = units;
run;
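Steps 4.2, 5.2, 6.2, and Q8 in the program each read record.customer_all again, so each run of customer_record1 holds only that step's change. The sketch below shows one possible way to accumulate the fixes in a single DATA step instead. The data set names demo_raw and demo_clean and the three toy rows are hypothetical stand-ins, not the assignment data, and the jobMF rule simply mirrors the 'management' test used in Q8.

```sas
/* Hypothetical stand-in for record.customer_all (toy rows only). */
data demo_raw;
    length Education $12 marital $10 Job $16;
    infile datalines dlm=',' dsd truncover;
    input Education $ marital $ Job $;
    datalines;
SECONDARY,Married,ADMINISTRATION
primary,single,admin.
tertiary,DIVORCED,management
;
run;

/* Apply every cleaning rule in one pass so no earlier fix is lost. */
data demo_clean;
    set demo_raw;
    Education = lowcase(Education);                 /* as in 4.2 */
    marital   = lowcase(marital);                   /* as in 5.2 */
    if Job in ('admin.', 'ADMINISTRATION')
        then Job = 'admin';                         /* as in 6.2 */
    length jobMF $2;
    if Job = 'management' then jobMF = 'MF';        /* as in Q8 */
    else jobMF = 'NM';
run;

proc print data = demo_clean;
run;
```

An equivalent alternative is to keep the separate steps but have each one read the working data set (set customer_record1;) rather than the permanent one.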
Log: HIMANI_119717239_Assignment2.sas

Notes (61)

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
69
70         libname record'/home/u63731080/BAN110';
NOTE: Libref RECORD was successfully assigned as follows:
      Engine:        V9
      Physical Name: /home/u63731080/BAN110
71         data customer_record1;
72         set record.customer_all;
73         run;
NOTE: There were 10578 observations read from the data set RECORD.CUSTOMER_ALL.
NOTE: The data set WORK.CUSTOMER_RECORD1 has 10578 observations and 17 variables.
NOTE: DATA statement used (Total process time):
      real time                     0.00 seconds
      user cpu time                 0.00 seconds
      system cpu time               0.00 seconds
      memory                        3421.71k
      OS Memory                     27048.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    55
      Switch Count                  2
      Page Faults                   0
      Page Reclaims                 544
      Page Swaps                    0
      Voluntary Context Switches    17
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       2568
74
75
76         /*Q1. Examine the target variable y:
77         Use PROC FREQ to list a simple frequency table for the variable y. */
78
79         title'Simple Frequency Table of target variable y';
80         proc freq data=customer_record1;
81         table y;
82         run;
NOTE: There were 10578 observations read from the data set WORK.CUSTOMER_RECORD1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time                     0.01 seconds
      user cpu time                 0.02 seconds
      system cpu time               0.00 seconds
      memory                        2937.75k
      OS Memory                     25512.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    56
      Switch Count                  2
      Page Faults                   0
      Page Reclaims                 374
      Page Swaps                    0
      Voluntary Context Switches    13
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       272
83         title;
84
85         /* Q2. Examine the variable "contact" and study its dependency with the target variable y.
86         Use PROC FREQ to list a simple frequency table for the variable "contact".
87         Examine the output for invalid values. */
88
89         title'Simple Frequency Table of variable Contact';
90         proc freq data=customer_record1;
91         table contact;
92         run;
NOTE: There were 10578 observations read from the data set WORK.CUSTOMER_RECORD1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time                     0.01 seconds
      user cpu time                 0.01 seconds
      system cpu time               0.00 seconds
      memory                        2044.18k
      OS Memory                     25768.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    57
      Switch Count                  2
      Page Faults                   0
      Page Reclaims                 328
      Page Swaps                    0
      Voluntary Context Switches    13
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       264
93         title;
94
95
96         /* Q3. Contingency table Contact by y and mosaic plot:
97         create a 2x2 contingency table along with a mosaic plot.
98         Show the statistics for Table of contact by y. */
99
100
101        proc freq data=customer_record1;
102        tables contact * y / chisq plots=mosaicplot;
103        run;
NOTE: There were 10578 observations read from the data set WORK.CUSTOMER_RECORD1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time                     0.15 seconds
      user cpu time                 0.07 seconds
      system cpu time               0.01 seconds
      memory                        10422.31k
      OS Memory                     33332.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    58
      Switch Count                  4
      Page Faults                   0
      Page Reclaims                 2291
      Page Swaps                    0
      Voluntary Context Switches    225
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       1056
104
105
106        /* Interpret:
107        (a) Based on the mosaic plot, do you assume association between the two variables?
108        (b) Based on the Contingency coefficient, is there an association between the two variables?
109        Answer:
110        (a) Based on the mosaic plot, I would assume that there is an association between the two variables.
111        It appears that customers contacted via cellular were more likely to buy the Certificate of Deposit (CD) from the
111      ! institution.
112        Customers contacted via telephone were the next most likely, and the least likely were customers contacted by unknown
112      ! methods.
113        (b) According to the contingency coefficient, with a value of 0.2541, there is a medium association between the two variables.
114        The closer the contingency coefficient is to 0, the weaker the association; the closer it is to 1,
114      ! the stronger the association.
115        */
116
117
118        /* Q4. Examine the variable "education"
119
120        /* 4.1. define a new format, name it education_Check and
121        use it to identify invalid values for the variable education.
122        Valid values are 'primary', 'secondary', 'tertiary', 'unknown'.
123        Refer to program 1.8.
124        Chapter 1 - Working with Character Data
125        Cody's Data Cleaning Techniques Using SAS, Third Edition*/
126
127        Proc format;
128        value $education_check
129        'primary','secondary','tertiary','unknown' ='valid'
130        'SECONDARY' = 'invalid';
NOTE: Format $EDUCATION_CHECK is already on the library WORK.FORMATS.
NOTE: Format $EDUCATION_CHECK has been output.
131        run;
NOTE: PROCEDURE FORMAT used (Total process time):
      real time                     0.00 seconds
      user cpu time                 0.00 seconds
      system cpu time               0.00 seconds
      memory                        249.71k
      OS Memory                     30880.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    59
      Switch Count                  0
      Page Faults                   0
      Page Reclaims                 14
      Page Swaps                    0
      Voluntary Context Switches    0
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       32
132
133        title'Checking Invalid values of Education';
134        proc freq data=customer_record1;
135        table Education/ nocum nopercent missing;
136        format Education $education_check.;
137        run;
NOTE: There were 10578 observations read from the data set WORK.CUSTOMER_RECORD1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time                     0.01 seconds
      user cpu time                 0.01 seconds
      system cpu time               0.00 seconds
      memory                        2027.78k
      OS Memory                     32168.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    60
      Switch Count                  2
      Page Faults                   0
      Page Reclaims                 329
      Page Swaps                    0
      Voluntary Context Switches    12
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       272
138        title;
139
140        /* 4.2. Use the function lowcase on education column.
141        use the same dataset name for output dataset. */
142
143        data customer_record1;
144        set record.customer_all;
145        Education=lowcase(Education);
146        run;
NOTE: There were 10578 observations read from the data set RECORD.CUSTOMER_ALL.
NOTE: The data set WORK.CUSTOMER_RECORD1 has 10578 observations and 17 variables.
NOTE: DATA statement used (Total process time):
      real time                     0.00 seconds
      user cpu time                 0.00 seconds
      system cpu time               0.00 seconds
      memory                        3425.62k
      OS Memory                     33704.00k
      Timestamp                     21/02/2024 07:31:51 PM
      Step Count                    61
      Switch Count                  2
      Page Faults                   0
      Page Reclaims                 521
      Page Swaps                    0