Assignment1-320-WorkingWithData.docx
School: The University of Tennessee, Knoxville
Course: 320
Subject: Mathematics
Date: Apr 3, 2024
Type: docx
Pages: 64
Uploaded by ProfessorSquidMaster853
BAS 320 - Assignment 1 - Working with R
Your name here
Response:
My favorite animal is the red panda!
getwd() # This should output the path to your BAS folder; change with Session, Set Working Directory if not
## [1] "/Users/samples/Documents/BAS 320"
load("BAS320datasets.RData")
Question 1: R as a calculator
Translate into R syntax the following mathematical expressions. Have the answers be “printed to the screen” (i.e., don’t left-arrow them to anything) when it is knitted. Note: you can mouse over the equations to see them, or you can head to the Canvas page to see what they look like. If you only see half an equation, click the double up arrows on the far right of the equation to collapse it, then click again to expand it.
a. $15\left(1 - \frac{3}{8+5}\right)$
b. $\frac{2^{2/3} + 3^{4/5}}{5^{2} - 4^{6/7}}$
c. $1 + \sqrt{5} + e^{-2(5-3)} + \left|\ln(15) - \log_{10}(54321)\right|$
d. When translating in R, write in R’s version of scientific notation: $5.32 \times 10^{3} + 3.12 \times 10^{2} + 9.87 \times 10^{-1}$
# 1.a; about 11.5
15 * (1 - 3 / (8 + 5))
## [1] 11.53846
# 1.b; about 0.18
(2^(2/3) + 3^(4/5)) / (5^2 - 4^(6/7))
## [1] 0.183972
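The parentheses around each fractional exponent in 1.b are doing real work: in R, `^` binds more tightly than `/`, so `2^2/3` is read as `(2^2)/3`. The short check below is an illustrative aside, not part of the assigned answer, and the variable names are made up for the example.

```r
# Without parentheses, the exponent is just 2 and the result is divided by 3
no_parens <- 2^2/3      # parsed as (2^2)/3, i.e. 4/3

# With parentheses, the exponent is the fraction 2/3
with_parens <- 2^(2/3)  # the cube root of 4

no_parens
with_parens
```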
# 1.c; about 5.3
1 + sqrt(5) + exp(-2 * (5 - 3)) + abs(log(15) - log10(54321))
## [1] 5.281301
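In 1.c, `log` is the natural logarithm (the ln in the expression) and `log10` is the base-10 logarithm. A small sketch of how the two relate, using illustrative variable names:

```r
natural <- log(15)                    # natural log (base e)
base10 <- log10(54321)                # base-10 log
base10_alt <- log(54321, base = 10)   # same quantity via the base argument

exp(natural)                 # undoes the natural log
all.equal(base10, base10_alt)
```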
# 1.d; about 5633
5.32e3 + 3.12e2 + 9.87e-1 # R's scientific notation: 5.32e3 means 5.32 * 10^3
## [1] 5632.987
Question 2: Left-arrow
Step 1: Define a variable named timeonsite to equal 56. Imagine this is the number of minutes that someone spends on a website on their first visit.
Step 2: Re-define timeonsite to be 80% of its current value, plus 0.9 (for example, it will go from equaling 56 to equaling 45.7). Imagine this is the number of minutes that someone spends on a website on their second visit.
Step 3: Step 2 is repeated a total of 8 times (giving the time spent on the 3rd, 4th, 5th, 6th, 7th, 8th, and 9th visits).
Write R code that updates the value of timeonsite through the 9th visit. Print to the screen the value of timeonsite after it’s had its value updated 8 times. Sanity check: about 13.1.
timeonsite <- 56
# Apply the Step 2 update 8 times (visits 2 through 9)
for (visit in 2:9) {
  timeonsite <- timeonsite * 0.8 + 0.9
}
timeonsite
## [1] 13.14027
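The update rule in Step 2 is a linear recurrence, $x_{n+1} = 0.8\,x_n + 0.9$ with $x_1 = 56$. Its fixed point is $0.9 / (1 - 0.8) = 4.5$, which gives the closed form $x_n = 4.5 + 51.5 \cdot 0.8^{\,n-1}$ for the $n$-th visit. The sketch below uses that formula as an independent check on the sanity value of about 13.1; the names `visit` and `closed_form` are illustrative and not part of the assignment.

```r
visit <- 1:9

# Closed form of x[n+1] = 0.8 * x[n] + 0.9, starting from x[1] = 56
closed_form <- 4.5 + 51.5 * 0.8^(visit - 1)

round(closed_form[2], 1)  # second visit: 45.7
round(closed_form[9], 1)  # ninth visit, after 8 updates: 13.1
```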
Question 3: vector creation
a. Create a numeric vector named Q3a that contains the elements 10, 5, 3, 2, 8, 7. Print to the screen the average of the elements of this vector, rounded to 2 decimal places (you’ll need to use the mean and round functions here).
Q3a <- c(10, 5, 3, 2, 8, 7)
round(mean(Q3a), 2)
## [1] 5.83
b. Create a categorical vector named Q3b that contains the elements cup, spoon, cup, cup, spoon, knife. Provide a frequency table of the elements in Q3b (you’ll need the factor and table commands here).
Q3b <- factor(c('cup', 'spoon', 'cup', 'cup', 'spoon', 'knife'))
table(Q3b)
## Q3b
##   cup knife spoon 
##     3     1     2 
c. Create a numeric vector named Q3c whose elements are a regularly spaced sequence that starts at 5.4, ends at 8.2, and increments by 0.1 (you’ll need the seq command here). Print the contents of Q3c to the screen.
Q3c <- seq(from = 5.4, to = 8.2, by = 0.1)
Q3c
## [1] 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2
## [20] 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2
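A sequence from 5.4 to 8.2 in steps of 0.1 has (8.2 - 5.4)/0.1 + 1 = 29 elements, which matches the printed output. As a sketch, the same vector can be requested by length instead of by step size; `by_step` and `by_length` are illustrative names.

```r
by_step <- seq(from = 5.4, to = 8.2, by = 0.1)
by_length <- seq(from = 5.4, to = 8.2, length.out = 29)

length(by_step)
all.equal(by_step, by_length)  # equal up to floating-point tolerance
```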
d. Create a text vector named Q3d that consists of the five words coffee, tea, tea, tea, and crackers, repeated a total of five times. You’ll need to use the rep command, and your vector should contain a total of 25 elements. Print to the screen the contents of Q3d and show the result of running table(Q3d) to give a frequency table of its elements.
Q3d <- rep(c("coffee", "tea", "tea", "tea", "crackers"), 5)
Q3d
##  [1] "coffee"   "tea"      "tea"      "tea"      "crackers" "coffee"  
##  [7] "tea"      "tea"      "tea"      "crackers" "coffee"   "tea"     
## [13] "tea"      "tea"      "crackers" "coffee"   "tea"      "tea"     
## [19] "tea"      "crackers" "coffee"   "tea"      "tea"      "tea"     
## [25] "crackers"
table(Q3d)
## Q3d
##   coffee crackers      tea 
##        5        5       15 
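By default, `rep` repeats its whole input in order (the `times` behaviour used in this part). Its `each` argument instead repeats every element in place. A short sketch of the difference on a shortened, made-up vector (`drinks`, `whole`, and `in_place` are illustrative names):

```r
drinks <- c("coffee", "tea")       # shortened example vector

whole <- rep(drinks, times = 2)    # "coffee" "tea" "coffee" "tea"
in_place <- rep(drinks, each = 2)  # "coffee" "coffee" "tea" "tea"

whole
in_place
```

Both versions give the same frequency table; only the order of the elements differs.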
e. Create a vector named up that contains the integer sequence from 12 to 65. Create a vector named down that contains the integer sequence from 104 to 51. What’s the median of the vector produced when you triple the elements of up and add them to the elements of down?
up <- 12:65
down <- 104:51
median(3 * up + down)
## [1] 193
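The result can be reasoned out by hand. Because up rises by 1 and down falls by 1 at each position, tripling up and adding down rises by 2 per position, from 3(12) + 104 = 140 to 3(65) + 51 = 246. The median of an evenly spaced sequence is the midpoint of its endpoints, (140 + 246)/2 = 193. A short check of this reasoning, where `combined` is an illustrative name:

```r
up <- 12:65
down <- 104:51
combined <- 3 * up + down  # element-wise arithmetic on equal-length vectors

range(combined)            # smallest and largest values
all(diff(combined) == 2)   # every step is +2
median(combined)
```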
Question 4: Vectors (Charity data)
Load in the CHARITY dataframe that is contained in the regclass library (you’ll need the library and data commands). This data contains information on 15283 donors to a charity. Let’s look at the ResponseProportion column, which gives the fraction of solicitations made by the charity that resulted in that person giving a donation. Save the ResponseProportion column into a vector named rp, and use this vector in the following parts. For most parts, you’ll need to use the which function.
library(regclass)
## Loading required package: bestglm
## Loading required package: leaps
## Loading required package: VGAM
## Loading required package: stats4
## Loading required package: splines
## Loading required package: rpart
## Loading required package: randomForest
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## Important regclass change from 1.3:
## All functions that had a . in the name now have an _
## all.correlations -> all_correlations, cor.demo -> cor_demo, etc.
data("CHARITY")
rp <- CHARITY$ResponseProportion
a. What are the values in the 2874th and 3333rd positions of rp?
rp[c(2874, 3333)]
## [1] 0.182 0.368
b. In what positions of rp will you find the number 0.875 (there are three positions)?
which(rp == 0.875)
## [1] 1096 14971 14984
c. What are the unique values of rp that are less than 0.11? You’ll need to use the unique function (not discussed before; you can read about it by running ?unique in the Console, but it’s pretty self-explanatory).
?unique
# First attempt (incorrect): == compares element by element and recycles unique(rp), so this does not give the unique values
which(rp == unique(rp) & rp < 0.11)
## Warning in rp == unique(rp): longer object length is not a multiple of shorter
## object length
## [1] 4 579 1293 2284 2358 2707 3424 4450 6423 6803 7449 8717
## [13] 9287 11910 12435 13798 13962 15057
# De-duplicate first, then keep the values below 0.11
unique_rp <- unique(rp)
which(unique_rp < 0.11)
## [1] 4 9 21 26 31 39 53 54 60 77 78 79 82 85 86
unique_rp[which(unique_rp < 0.11)]
## [1] 0.100 0.095 0.077 0.063 0.059 0.105 0.056 0.091 0.067 0.050 0.048 0.083
## [13] 0.053 0.045 0.071
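The comparison `rp == unique(rp)` does not test membership: `==` works element by element and recycles the shorter vector, which is why R warns about mismatched lengths. The sketch below uses a small made-up vector (`x` is hypothetical, not the CHARITY data) to show that filtering first and de-duplicating second gives the same set of values as the two-step approach.

```r
x <- c(0.10, 0.25, 0.10, 0.05, 0.30, 0.05)  # hypothetical values

# Keep the values below the cutoff, then drop duplicates
small_unique <- unique(x[x < 0.11])
small_unique

# Same set as de-duplicating first and filtering second
ux <- unique(x)
setequal(small_unique, ux[ux < 0.11])
```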
d. What is the average value of all elements of rp that are greater than 0? You’ll need to use the mean function.
mean(rp[which(rp > 0)])
## [1] 0.2116369
e. How many values of rp are between 0.595 and 0.775? You’ll need to use the length function.
?length
length(rp[which(rp > 0.595 & rp < 0.775)])
## [1] 123
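A comparison produces a vector of TRUE/FALSE values, and `sum` counts the TRUEs, so the `which` and `length` steps can be collapsed into one. A sketch on a made-up vector (`x` is hypothetical, not the CHARITY data):

```r
x <- c(0.50, 0.60, 0.70, 0.80, 0.65)  # hypothetical values

in_range <- x > 0.595 & x < 0.775     # strict inequalities exclude the endpoints

count_which <- length(x[which(in_range)])  # the which/length approach
count_sum <- sum(in_range)                 # counts TRUE values directly
c(count_which, count_sum)
```

The two counts agree when there are no missing values. If the vector contains NA, `which` silently skips those positions while `sum` returns NA unless `na.rm = TRUE` is supplied.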
f. Some of the elements of rp are equal to 0.012, 0.233, 0.557, 0.583, 0.636, 0.751. By using the %in% shortcut, print to the screen all elements in rp that are equal to one of these values. Note: some of these numbers don’t appear at all in rp!
rp[which(rp %in% c(0.012, 0.233, 0.557, 0.583, 0.636, 0.751))]
## [1] 0.583 0.636 0.583 0.636
g. Determine the number of elements that can be written as a number with at most a single digit after the decimal point, i.e., that are equal to 0, 0.1, 0.2, 0.3, …, 0.9, 1.
length(rp[which(rp %in% seq(from = 0, to = 1, by = 0.1))])
## [1] 2217
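One caveat with this approach: `%in%` compares doubles exactly, and `seq(0, 1, by = 0.1)` builds its values as multiples of 0.1, some of which differ from the literal decimals in the last bit. Whether this affects the count above depends on how the values in rp are stored, which cannot be determined from the output alone. The sketch below shows the issue and one common workaround; `grid_seq`, `grid_div`, and `tenths` are illustrative names.

```r
grid_seq <- seq(from = 0, to = 1, by = 0.1)
grid_div <- (0:10) / 10  # each value is the double closest to the decimal

0.3 %in% grid_seq  # FALSE: 3 * 0.1 is not exactly 0.3
0.3 %in% grid_div  # TRUE

# Literal decimals that fail an exact match against grid_seq
tenths <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
tenths[!(tenths %in% grid_seq)]
```

Rounding both sides to one decimal place before matching, for example with `round(x, 1)`, is another way to sidestep the mismatch.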
h. Report the average value of rp using everything but the values in positions 10001 through 15000. Your answer will be close to 0.213.
mean(rp[-seq(from = 10001, to = 15000, by = 1)])
## [1] 0.2130305
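Negative indices drop positions rather than select them. A minimal sketch on a made-up vector (`x` is hypothetical), using the shorter colon form of the same idea:

```r
x <- c(10, 20, 30, 40, 50)  # hypothetical values

kept <- x[-(2:4)]  # drops positions 2 through 4, leaving 10 and 50
kept
mean(kept)         # 30
```

The parentheses matter: `-2:4` is read as `(-2):4`, which mixes negative and positive indices and is an error when used for subsetting.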
Question 5: Data Frames (Spotify data)
After loading in the .RData file for this assignment (BAS320datasets.RData; see chunk code near the top), you’ll see a dataframe called HIT in the global environment. This dataset contains information on 41106 songs that you can stream on Spotify (see track and artist) along with song characteristics such as danceability, energy, valence (see https://rpubs.com/PeterDola/SpotifyTracks for detailed definitions of these quantities).
a. The class function reveals the type of objects (numeric vector, factor, data.frame, function, etc.) in the global environment. What type of objects are HIT, HIT$track, HIT$energy, and HIT$mode? The output of running class will suffice.
?class
class(HIT)
## [1] "data.frame"
class(HIT$track)
## [1] "character"
class(HIT$energy)
## [1] "numeric"
class(HIT$mode)
## [1] "factor"
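To check every column at once rather than one at a time, `sapply` can apply `class` across a data frame. The sketch below uses a tiny made-up data frame (`songs` and its rows are hypothetical stand-ins, not the HIT data) with the same three column types.

```r
songs <- data.frame(
  track = c("Song A", "Song B"),        # hypothetical rows
  energy = c(0.42, 0.87),
  mode = factor(c("major", "minor")),
  stringsAsFactors = FALSE              # keep text columns as character
)

column_classes <- sapply(songs, class)  # one class per column, named by column
column_classes
```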
b. What is the total length of time (in milliseconds) of all songs in this data (i.e., the sum of all values that are in the column duration_ms)?
sum(HIT$duration_ms)
## [1] 9654876589
c. What percentage (a number between 0-1) of songs have values of energy that are greater than 0.1? Try using length and which to count up the number of entries in the energy column that are greater than 0.1, then dividing by the number of rows in HIT.
length(which(HIT$energy > 0.1))
## [1] 39442
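The count above is only the numerator; dividing by the number of rows gives the proportion the question asks for. Assuming HIT has the 41106 rows stated in the problem, that is 39442/41106, or roughly 0.96. Because the mean of a TRUE/FALSE vector is the share of TRUEs, the count and the division can also be done in one step. A sketch on made-up values (`energy` here is a hypothetical vector, not HIT$energy):

```r
energy <- c(0.05, 0.30, 0.90, 0.08, 0.60)  # hypothetical values

prop_count <- length(which(energy > 0.1)) / length(energy)
prop_mean <- mean(energy > 0.1)  # TRUE counts as 1, FALSE as 0
c(prop_count, prop_mean)

# Count from the output above over the stated number of songs
# (assumes nrow(HIT) is 41106)
39442 / 41106
```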