Assignment1 (1).pdf

School

Toronto Metropolitan University

Course

123

Subject

Computer Science

Date

Dec 6, 2023

Uploaded by ramabarham18

CIND 123 - Data Analytics: Basic Methods - Assignment 1 (10%)
2023-10-17

Question 1 (32 points)

Q1a (8 points)
Create and print a vector x with all integers from 15 to 100 and a vector y containing multiples of 5 in the same range. Hint: use the seq() function. Calculate the difference in lengths of the vectors x and y. Hint: use length().

Insert your answer here:

# START OF ANSWER FOR Q1a
# Create and print a vector x with all integers from 4 to 115
x <- 4:115
print(x)
## [1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## [19] 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
## [37] 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
## [55] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [73] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93
## [91] 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
## [109] 112 113 114 115
# Find the length of x
Lx <- length(x)
# Create a vector y containing the multiples of 4 in the same range as x
y <- seq(4, 115, by = 4)
print(y)
## [1] 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76
## [20] 80 84 88 92 96 100 104 108 112
# Find the length of y
Ly <- length(y)
# Calculate the difference in lengths between vectors x and y
print(Lx - Ly)
## [1] 84
# END OF ANSWER FOR Q1a

Q1b (8 points)
Create a new vector, y_square, with the square of elements at indices 1, 3, 7, 12, 17, 20, 22, and 24 from the variable y. Hint: Use indexing rather than a for loop. Calculate the mean and median of the FIRST five values from y_square.

Insert your answer here.

# START OF ANSWER FOR Q1b
# Create a new vector, y_square, with the square of elements at indices 1, 3, 7, 12, 17, 20, 22, and 24
# from the variable y. Hint: Use indexing rather than a for loop.
# Calculate the mean and median of the FIRST five values from y_square.
# INSERT ANSWER HERE
# Indices of the elements to take from y (the multiples of 4 created in Q1a)
idx <- c(1, 3, 7, 12, 17, 20, 22, 24)
# Square the selected elements of y using vector indexing, with no loop
y_square <- y[idx]^2
# Calculate the mean and median of the FIRST five values from y_square
mean_first_five <- mean(y_square[1:5])
median_first_five <- median(y_square[1:5])
# Print the mean
print(mean_first_five)
## [1] 1574.4
# Print the median
print(median_first_five)
## [1] 784
# END OF ANSWER FOR Q1b

Q1c (8 points)
For a given factor variable factorVar <- factor(c(1, 6, 5.4, 3.2)), would it be correct to use the following command to convert the factor to numbers?
as.numeric(factorVar)
If not, explain your answer and provide the correct one.

Insert your answer here.

# START OF ANSWER FOR Q1c
# It is incorrect to use only as.numeric(factorVar) to convert the factor to numbers.
# A factor is stored as integer codes that point into its sorted levels ("1", "3.2", "5.4", "6"),
# so as.numeric(factorVar) returns those codes instead of the original values:
factorVar <- factor(c(1, 6, 5.4, 3.2))
as.numeric(factorVar)
## [1] 1 4 3 2
# To convert a factor to numbers correctly, either:
# 1) use as.character() to turn the factor into its labels, then as.numeric() to convert the labels to numbers; or
# 2) use levels() to extract the level labels, convert them with as.numeric(), and index them by the factor,
#    i.e. as.numeric(levels(factorVar))[factorVar]
# I will use method 1 to provide the correct solution.
# Use as.character() to convert the values in factorVar to their labels
characterVar <- as.character(factorVar)
# Use as.numeric() to convert the labels to numbers
numericVar <- as.numeric(characterVar)
print(numericVar)
## [1] 1.0 6.0 5.4 3.2
# Test whether the result is of numeric type
is.numeric(numericVar)
## [1] TRUE
# END OF ANSWER FOR Q1c

Q1d (8 points)
A comma-separated values file dataset.csv contains missing values represented by the string null and by a question mark (?). How can you read this type of file in R?
NOTE: Please make sure you have saved the dataset.csv file in your current working directory.

Insert your answer here.
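Because dataset.csv itself is not reproduced here, the following minimal sketch uses a small hypothetical CSV held in a string (`csv_text`, `raw_df` and `clean_df` are illustrative names, not part of the assignment) and passes it through the `text` argument that read.csv() forwards to read.table(). It shows what `na.strings` changes: without it, "null" and "?" are read as ordinary text, so the columns are not numeric; with it, both markers become NA and the columns are read as numbers.

```r
# Hypothetical CSV content standing in for dataset.csv (not the real file)
csv_text <- "a,b
1,null
?,4"

# Default reading: "null" and "?" are treated as ordinary text,
# so neither column is parsed as numeric
raw_df <- read.csv(text = csv_text)
print(sapply(raw_df, is.numeric))

# With na.strings, both markers become NA and the columns are read as numbers
clean_df <- read.csv(text = csv_text, na.strings = c("null", "?"))
print(clean_df)

# Count the missing values in each column
print(colSums(is.na(clean_df)))
```

The same `na.strings = c("null", "?")` argument is what the file-based call in the answer relies on.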
# START OF ANSWER FOR Q1d
# Set the file path
file_path <- "dataset.csv"
# Read the CSV file, treating both "null" and "?" as missing values (NA)
read.csv(file_path, na.strings = c("null", "?"))
##     X1  X2  X3  X4  X5  X6  X7  X8  X9 X10
##  1  11  12  13  14  15  16  17  18  19  20
##  2  21  22  23  24  25  26  27  28  29  30
##  3  31  32  33  34  35  36  37  38  39  40
##  4  41  42  43  44  45  NA  47  48  49  50
##  5  51  52  53  NA  55  56  57  NA  59  60
##  6  61  62  63  64  65  66  67  68  69  70
##  7  71  72  NA  74  75  76  77  78  79  80
##  8  81  82  83  84  85  86  87  88  89  NA
##  9  91  92  93  94  95  96  97  98  99 100
## 10  NA 102 103 104 105 106 107 108 109 110
## 11 111 112 113 114 115 116 117 118 119 120
## 12 121 122 123 124 125 126 127 128 129 130
## 13 131 132 133 134 135 136 137 138 139  NA
## 14 141 142 143 144 145 146 147 148 149 150
## 15 151 152 153 154 155 156 157 158 159 160
## 16 161 162 163 164  NA 166 167 168 169 170
# END OF ANSWER FOR Q1d

Question 2 (32 points)

Q2a (8 points)
Compute: $\sum_{x=5}^{20} \frac{(-1)^x}{(x!)^2}$
Hint: Use factorial(n) to compute n!.

Insert your answer here.

# START OF ANSWER FOR Q2a
# Initialize the running sum
sum_value <- 0
# Add each term (-1)^n / (n!)^2 for n from 5 to 20
for (n in 5:20) {
  sum_value <- sum_value + ((-1)^n / (factorial(n)^2))
}
# Print the final result
print(sum_value)
## [1] -6.755419e-05
# END OF ANSWER FOR Q2a

Q2b (8 points)
Compute: $\prod_{n=1}^{5} \left(4n + \frac{1}{2^n}\right)$
NOTE: The pi-product symbol represents multiplication.

Insert your answer here.

# START OF ANSWER FOR Q2b
# Initialize the running product
product_value <- 1
# Multiply by each factor 4n + 1/2^n for n from 1 to 5
for (n in 1:5) {
  product_value <- product_value * (4 * n + 1 / (2^n))
}
# Print the final result
print(product_value)
## [1] 144833.6
# END OF ANSWER FOR Q2b

Q2c (8 points)
Describe what the following R command does:
c(0:5)[NA]

Insert your answer here.

# START OF ANSWER FOR Q2c
# The R command c(0:5)[NA] first creates an integer vector containing the numbers 0 through 5 and then indexes it with NA.
# The c() function creates the vector, and 0:5 generates the sequence from 0 to 5 (0:5 on its own already gives the same vector).
vector <- c(0:5)
# The [NA] part is an index; NA stands for "Not Available". An index extracts elements from the vector.
Output <- c(0:5)[NA]
# A bare NA is a logical value, and a logical index is recycled to the length of the vector being indexed.
# Every one of the six positions is therefore selected as "unknown", so the result is a vector of NA values
# with the same length as the original vector:
print(Output)
## [1] NA NA NA NA NA NA
# In summary, the command returns a vector of NA values with the same length as the vector of numbers 0 through 5.
# END OF ANSWER FOR Q2c

Q2d (8 points)
Describe the purpose of the is.vector(), is.character(), is.numeric(), and is.na() functions. Please use x <- c("a", "b", NA, 2) to explain your description.

Insert your answer here.

# START OF ANSWER FOR Q2d
# These functions test a property of an object and return TRUE or FALSE. is.vector(), is.character() and is.numeric()
# check the object as a whole and return a single value, while is.na() checks every element and returns one value per element.
# I will use x <- c("a", "b", NA, 2) to explain my description.
x <- c("a", "b", NA, 2)
# A vector holds a single type, so c() coerces 2 to the string "2": x is the character vector "a" "b" NA "2".
# is.vector() checks whether an object is a vector.
is_vector_output <- is.vector(x)
print(is_vector_output)
# Output: TRUE, since x is a vector.
## [1] TRUE
# is.character() checks whether an object is of character type.
is_character_output <- is.character(x)
print(is_character_output)
# Output: a single TRUE, since the whole vector x is of character type.
## [1] TRUE
# is.numeric() checks whether an object is of numeric type (i.e., numbers).
is_numeric_output <- is.numeric(x)
print(is_numeric_output)
# Output: a single FALSE, since x is a character vector (even "2" is stored as text).
## [1] FALSE
# is.na() checks each element of an object for missing values (NA).
is_na_output <- is.na(x)
print(is_na_output)
# Output: FALSE ("a") FALSE ("b") TRUE (NA) FALSE ("2")
## [1] FALSE FALSE  TRUE FALSE
# END OF ANSWER FOR Q2d

Question 3 (36 points)
The airquality dataset contains daily air quality measurements in New York from May to September 1973.
The variables include ozone level, solar radiation, wind speed, temperature in Fahrenheit, month, and day. Please see the detailed description using help("airquality"). Install the airquality data set on your computer using the command install.packages("datasets") (datasets is part of base R and is already installed, so this step is not actually required). Then load the datasets package into your session.
library(datasets)

Question 3a (4 points)
Display the first 10 rows of the airquality data set.

Insert your answer here.

# START OF ANSWER FOR Q3a
library(datasets)
# Use the head() function to display the first 10 rows of the data frame
head(airquality, n = 10)
##    Ozone Solar.R Wind Temp Month Day
## 1     41     190  7.4   67     5   1
## 2     36     118  8.0   72     5   2
## 3     12     149 12.6   74     5   3
## 4     18     313 11.5   62     5   4
## 5     NA      NA 14.3   56     5   5
## 6     28      NA 14.9   66     5   6
## 7     23     299  8.6   65     5   7
## 8     19      99 13.8   59     5   8
## 9      8      19 20.1   61     5   9
## 10    NA     194  8.6   69     5  10
# END OF ANSWER FOR Q3a

Question 3b (8 points)
Compute the average of the first four variables (Ozone, Solar.R, Wind and Temp) for the fifth month using the sapply() function. Hint: You might need to consider removing the NA values; otherwise, the average will not be computed.

Insert your answer here.

# START OF ANSWER FOR Q3b
library(datasets)
# Define a function to calculate the mean excluding NA values
omitna_mean <- function(value) {
  mean(value, na.rm = TRUE)
}
# Keep only the rows for the fifth month (May) and the first four columns (Ozone, Solar.R, Wind, Temp)
may_data <- airquality[airquality$Month == 5, 1:4]
# Use sapply() to apply the function to each column
mean_values <- sapply(may_data, omitna_mean)
# Print the mean values
print(mean_values)
##     Ozone   Solar.R      Wind      Temp
##  23.61538 181.29630  11.62258  65.54839
# END OF ANSWER FOR Q3b

Question 3c (8 points)
Construct a boxplot for the Wind and Temp variables, then display the values of all the outliers which lie beyond the whiskers.

Insert your answer here.

# START OF ANSWER FOR Q3c
library(datasets)
# airquality$Wind is the Wind variable and airquality$Temp is the temperature variable; these are the columns used for the boxplot.
# names labels the first box "Wind" and the second "Temp"; outline = TRUE draws the outliers.
boxplot(airquality$Wind, airquality$Temp, names = c("Wind", "Temp"), outline = TRUE)
# Collect the points beyond the whiskers for each variable with boxplot.stats()
outliers <- boxplot.stats(airquality$Wind)$out
outliers <- c(outliers, boxplot.stats(airquality$Temp)$out)
# Print the outliers that lie beyond the whiskers (all three come from Wind; Temp has none)
print(outliers)
## [1] 20.1 18.4 20.7
# END OF ANSWER FOR Q3c

Question 3d (8 points)
Compute the upper quartile of the Wind variable with two different methods. HINT: Only show the upper quartile using indexing. For the quantile types, please see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile

Insert your answer here.

# START OF ANSWER FOR Q3d
# Method 1: Using the quantile() function
# Calculate the upper (third) quartile with probs = 0.75 and type = 7 (R's default)
quartile_method1 <- quantile(airquality$Wind, probs = 0.75, type = 7)
print(quartile_method1)
## 75%
## 11.5
# Method 2: Using indexing
# Sort the Wind variable from the airquality dataset
wind_sorted <- sort(airquality$Wind)
# Count the number of elements in the sorted vector and assign it to x
x <- length(wind_sorted)
# Take the value at position ceiling(x * 0.75): the smallest value with at least 75% of the data at or below it
# (the inverse empirical CDF, i.e. quantile() type 1). Writing ceiling() explicitly avoids relying on R
# silently truncating the fractional index x * 0.75 = 114.75.
quartile_method2 <- wind_sorted[ceiling(x * 0.75)]
print(quartile_method2)
## [1] 11.5
# END OF ANSWER FOR Q3d

Question 3e (8 points)
Construct a pie chart to describe the number of entries by Month. HINT: use the table() function to count and tabulate the number of entries within a Month.

Insert your answer here.

# START OF ANSWER FOR Q3e
library(datasets)
# Count the number of entries (rows) for each Month
count <- table(airquality$Month)
# Create a pie chart with one slice per month
pie(count, labels = names(count), main = "Number of Entries by Month")
# END OF ANSWER FOR Q3e
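Returning to Q3d: its hint suggests picking the upper quartile out of quantile()'s result by indexing, and it links to the different quantile types. The sketch below (an illustration, not the assignment's required answer; the names w, n, p, h, lo, hi and q7_manual are hypothetical) computes R's default type-7 upper quartile by hand from the sorted Wind values, using position h = (n - 1) * p + 1 with linear interpolation, and compares it with quantile(..., type = 7)[4]. Type 6 is included only as an example of a second definition; its value may differ slightly because its position falls between two observations.

```r
library(datasets)  # airquality ships with base R

# Wind has no missing values, so no NA handling is needed here
w <- sort(airquality$Wind)
n <- length(w)
p <- 0.75

# Type 7 (R's default): position h = (n - 1) * p + 1,
# then linear interpolation between the neighbouring order statistics
h <- (n - 1) * p + 1
lo <- floor(h)
hi <- ceiling(h)
q7_manual <- w[lo] + (h - lo) * (w[hi] - w[lo])

# quantile() returns the 0%, 25%, 50%, 75% and 100% points by default,
# so [4] picks the upper quartile by indexing
q7_builtin <- quantile(airquality$Wind, type = 7)[4]
q6_builtin <- quantile(airquality$Wind, type = 6)[4]

print(q7_manual)
print(q7_builtin)
print(q6_builtin)
```

With 153 observations, h = 152 * 0.75 + 1 = 115 is a whole number, so the interpolation term vanishes and the type-7 quartile is simply the 115th sorted value.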