Assignment1-320-WorkingWithData.docx

School: The University of Tennessee, Knoxville
Course: 320
Subject: Mathematics
Date: Apr 3, 2024
Uploaded by ProfessorSquidMaster853

BAS 320 - Assignment 1 - Working with R

Your name here

Response: My favorite animal is the red panda!

getwd() # This should output the path to your BAS folder; change with Session, Set Working Directory if not
## [1] "/Users/samples/Documents/BAS 320"
load("BAS320datasets.RData")

Question 1: R as a calculator

Translate into R syntax the following mathematical expressions. Have the answers be “printed to the screen” (i.e., don’t left-arrow them to anything) when it is knitted. Note: you can mouse over the equations to see them, or you can head to the Canvas page to see what they look like. If you only see half an equation, click the double up arrows on the far right of the equation to collapse it, then click again to expand it.

a. $15\left(1 - \frac{3}{8 + 5}\right)$
b. $\frac{2^{2/3} + 3^{4/5}}{5^{2} - 4^{6/7}}$
c. $1 + \sqrt{5} + e^{-2(5 - 3)} + \left|\ln(15) - \log_{10}(54321)\right|$
d. When translating in R, write in R’s version of scientific notation: $5.32 \times 10^{3} + 3.12 \times 10^{2} + 9.87 \times 10^{-1}$

# 1.a; about 11.5
15 * (1 - (3 / (8 + 5)))
## [1] 11.53846

# 1.b; about 0.18
(2^(2/3) + 3^(4/5)) / (5^2 - 4^(6/7))
## [1] 0.183972

# 1.c; about 5.3
1 + sqrt(5) + exp(-2 * (5 - 3)) + abs(log(15) - log10(54321))
## [1] 5.281301

# 1.d; about 5633 (written in R's scientific notation, as the question asks)
5.32e3 + 3.12e2 + 9.87e-1
## [1] 5632.987

Question 2: Left-arrow

Step 1: Define a variable named timeonsite to equal 56. Imagine this is the number of minutes that someone spends on a website on their first visit.

Step 2: Re-define timeonsite to be 80% of its current value, plus 0.9 (for example, it will go from equaling 56 to equaling 45.7). Imagine this is the number of minutes that someone spends on a website on their second visit.

Step 3: Step 2 is repeated a total of 8 times (giving the time spent on the 3rd, 4th, 5th, 6th, 7th, 8th, and 9th visit). Write R code that updates the value of timeonsite through the 9th visit. Print to the screen the value of timeonsite after it’s had its value updated 8 times. Sanity check: about 13.1.

timeonsite <- 56
# Apply the Step 2 update 8 times (2nd through 9th visit)
for (visit in 2:9) timeonsite <- timeonsite * 0.8 + 0.9
timeonsite
## [1] 13.14027

Question 3: vector creation

a. Create a numeric vector named Q3a that contains the elements 10, 5, 3, 2, 8, 7. Print to the screen the average of the elements of this vector, rounded to 2 decimal places (you’ll need to use the mean and round functions here).

Q3a <- c(10, 5, 3, 2, 8, 7)
round(mean(Q3a), 2)
## [1] 5.83
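The repeated update in Question 2 can also be sketched so that the value for every visit is kept, which makes the sanity check easier to inspect. This is a minimal sketch and not part of the original answer; the names `n_visits` and `visits` are assumptions introduced for illustration.

```r
# Track the time on site for each visit (n_visits and visits are hypothetical names).
n_visits <- 9
visits <- numeric(n_visits)
visits[1] <- 56                      # first visit, from Step 1

# Each later visit is 80% of the previous one, plus 0.9 (Step 2).
for (i in 2:n_visits) {
  visits[i] <- visits[i - 1] * 0.8 + 0.9
}

round(visits, 2)    # full trajectory, visit 1 through visit 9
visits[n_visits]    # value after 8 updates; the sanity check says about 13.1
```

Keeping the whole vector rather than overwriting a single variable shows how the value settles toward 4.5, the point where 0.8 * x + 0.9 equals x.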

b. Create a categorical vector named Q3b that contains the elements cup, spoon, cup, cup, spoon, knife. Provide a frequency table of the elements in Q3b (you’ll need the factor and table commands here).

Q3b <- factor(c('cup', 'spoon', 'cup', 'cup', 'spoon', 'knife'))
table(Q3b)
## Q3b
##   cup knife spoon
##     3     1     2

c. Create a numeric vector named Q3c whose elements are a regularly spaced sequence that starts at 5.4, ends at 8.2, and increments by 0.1 (you’ll need the seq command here). Print the contents of Q3c to the screen.

Q3c <- seq(from = 5.4, to = 8.2, by = 0.1)
Q3c
##  [1] 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2
## [20] 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2

d. Create a text vector named Q3d that consists of the five words coffee, tea, tea, tea, and crackers, repeated a total of five times. You’ll need to use the rep command, and your vector should contain a total of 25 elements. Print to the screen the contents of Q3d and show the result of running table(Q3d) to give a frequency table of its elements.

Q3d <- rep(c("coffee", "tea", "tea", "tea", "crackers"), 5)
Q3d
##  [1] "coffee"   "tea"      "tea"      "tea"      "crackers" "coffee"
##  [7] "tea"      "tea"      "tea"      "crackers" "coffee"   "tea"
## [13] "tea"      "tea"      "crackers" "coffee"   "tea"      "tea"
## [19] "tea"      "crackers" "coffee"   "tea"      "tea"      "tea"
## [25] "crackers"
table(Q3d)
## Q3d
##   coffee crackers      tea
##        5        5       15

e. Create a vector named up that contains the integer sequence from 12 to 65. Create a vector named down that contains the integer sequence from 104 to 51. What’s the median of the vector produced when you triple the elements of up and add them to the elements of down?

up <- seq(from = 12, to = 65)
down <- seq(from = 104, to = 51)

down <- down + (up * 3)
median(down)
## [1] 193

Question 4: Vectors (Charity data)

Load in the CHARITY dataframe that is contained in the regclass library (you’ll need the library and data commands). This data contains information on 15283 donors to a charity. Let’s look at the ResponseProportion column, which gives the fraction of solicitations made by the charity that resulted in that person giving a donation. Save the ResponseProportion column into a vector named rp, and use this vector in the following parts. For most parts, you’ll need to use the which function.

library(regclass)
## Loading required package: bestglm
## Loading required package: leaps
## Loading required package: VGAM
## Loading required package: stats4
## Loading required package: splines
## Loading required package: rpart
## Loading required package: randomForest
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## Important regclass change from 1.3:
## All functions that had a . in the name now have an _
## all.correlations -> all_correlations, cor.demo -> cor_demo, etc.
data("CHARITY")
rp <- CHARITY$ResponseProportion

a. What are the values in the 2874th and 3333rd positions of rp?

rp[c(2874, 3333)]
## [1] 0.182 0.368

b. In what positions of rp will you find the number 0.875 (there are three positions)?

which(rp == 0.875)
## [1]  1096 14971 14984
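Parts a and b rely on two kinds of indexing: picking values by position, and finding the positions where a condition holds. A minimal sketch on a small made-up vector may help; `toy` is a hypothetical stand-in for rp, and its values are not taken from the CHARITY data.

```r
# Hypothetical stand-in for rp; the real values come from CHARITY$ResponseProportion.
toy <- c(0.182, 0.875, 0.000, 0.368, 0.875, 0.100)

toy[c(1, 4)]                       # values at chosen positions, as in part a

positions <- which(toy == 0.875)   # integer positions where the condition is TRUE
positions

# Subsetting by positions and by the logical vector selects the same elements here.
by_position <- toy[which(toy > 0.15)]
by_logical  <- toy[toy > 0.15]
identical(by_position, by_logical)
```

The two subsetting forms differ only when the vector contains NA: which() drops those positions, while logical indexing returns NA for them.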

c. What are the unique values of rp that are less than 0.11? You’ll need to use the unique function (not discussed before; you can read about it by running ?unique in the Console, but it’s pretty self-explanatory).

?unique
# Reduce rp to its distinct values first, then filter those values
unique_rp <- unique(rp)
which(unique_rp < 0.11)
##  [1]  4  9 21 26 31 39 53 54 60 77 78 79 82 85 86
unique_rp[which(unique_rp < 0.11)]
##  [1] 0.100 0.095 0.077 0.063 0.059 0.105 0.056 0.091 0.067 0.050 0.048 0.083
## [13] 0.053 0.045 0.071

d. What is the average value of all elements of rp that are greater than 0? You’ll need to use the mean function.

mean(rp[which(rp > 0)])
## [1] 0.2116369

e. How many values of rp are between 0.595 and 0.775? You’ll need to use the length function.

?length
length(rp[which(rp > 0.595 & rp < 0.775)])
## [1] 123

f. Some of the elements of rp are equal to 0.012, 0.233, 0.557, 0.583, 0.636, 0.751. By using the %in% shortcut, print to the screen all elements in rp that are equal to one of these values. Note: some of these numbers don’t appear at all in rp!

rp[which(rp %in% c(0.012, 0.233, 0.557, 0.583, 0.636, 0.751))]
## [1] 0.583 0.636 0.583 0.636

g. Determine the number of elements that can be written as a number with at most a single digit after the decimal point, i.e., that are equal to 0, 0.1, 0.2, 0.3, …, 0.9, 1.

length(rp[which(rp %in% seq(from = 0, to = 1, by = 0.1))])
## [1] 2217
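Part g matches decimals with %in%, which tests exact equality. The values produced by seq(from = 0, to = 1, by = 0.1) are computed as multiples of 0.1 and can differ in the last bit from a stored 0.3 or 0.7, so exact matching may miss some elements. The following is a minimal sketch of a tolerance-based alternative; the vector `toy` is hypothetical (not the CHARITY data) and the tolerance 1e-8 is an assumption.

```r
# Hypothetical values in [0, 1]; not taken from CHARITY.
toy <- c(0, 0.1, 0.25, 0.3, 0.583, 0.7, 1)

# A value has at most one digit after the decimal point when ten times
# the value is (numerically) a whole number.
is_tenth <- abs(toy * 10 - round(toy * 10)) < 1e-8

sum(is_tenth)    # number of elements equal to 0, 0.1, ..., 0.9, 1
toy[is_tenth]

# Exact matching against seq() for comparison; it may report fewer matches.
sum(toy %in% seq(from = 0, to = 1, by = 0.1))
```

Whether this changes the count for rp depends on how its values are stored, so it is worth comparing both results on the real data.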

h. Report the average value of rp using everything but the values in positions 10001 through 15000. Your answer will be close to 0.213.

mean(rp[-seq(from = 10001, to = 15000, by = 1)])
## [1] 0.2130305

Question 5: Data Frames (Spotify data)

After loading in the .RData file for this assignment (BAS320datasets.RData; see chunk code near the top), you’ll see a dataframe called HIT in the global environment. This dataset contains information on 41106 songs that you can stream on Spotify (see track and artist) along with song characteristics such as danceability, energy, valence (see https://rpubs.com/PeterDola/SpotifyTracks for detailed definitions of these quantities).

a. The class function reveals the type of objects (numeric vector, factor, data.frame, function, etc.) in the global environment. What type of objects are HIT, HIT$track, HIT$energy, and HIT$mode? The output of running class will suffice.

?class
class(HIT)
## [1] "data.frame"
class(HIT$track)
## [1] "character"
class(HIT$energy)
## [1] "numeric"
class(HIT$mode)
## [1] "factor"

b. What is the total length of time (in milliseconds) of all songs in this data (i.e., the sum of all values that are in the column duration_ms)?

sum(HIT$duration_ms)
## [1] 9654876589

c. What percentage (a number between 0-1) of songs have values of energy that are greater than 0.1? Try using length and which to count up the number of entries in the energy column that are greater than 0.1, then dividing by the number of rows in HIT.

length(which(HIT$energy > 0.1))
## [1] 39442
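For part c of Question 5, the count still has to be divided by the number of rows to give a proportion between 0 and 1. A minimal sketch of that step on a made-up data frame follows; `toy_hit`, `n_above`, and `prop_above` are hypothetical names, and the energy values are invented rather than taken from HIT.

```r
# Hypothetical miniature of HIT with only an energy column.
toy_hit <- data.frame(energy = c(0.05, 0.42, 0.88, 0.10, 0.73))

n_above <- length(which(toy_hit$energy > 0.1))   # count, as the question suggests
prop_above <- n_above / nrow(toy_hit)            # divide by the number of rows
prop_above

# The mean of a logical vector gives the same proportion in one step.
mean(toy_hit$energy > 0.1)
```

On the real data the same pattern would use HIT in place of `toy_hit`.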
