439 sample midterm
School: Rutgers University
Course: 439
Subject: Mathematics
Date: Apr 3, 2024
Pages: 9
Uploaded by MasterEnergyCapybara27
Question 1 - Multiple Choice - 16 points
This question covers multiple topics. Each question is worth 2 points.
1. Suppose that the locations of 100 sites are given as the latitude and longitude of each site. What
is the best way to visualize this data?
(a) histogram
(b) scatter plot
(c) bar plot
(d) KDE plot
2. Your letter grade (e.g., A+, A, B+, ...) in a class that grades on a curve is most accurately
described as what kind of data?
(a) ordinal
(b) nominal
(c) none of the above
3. The set that consists of all possible values of a random variable is called a
(a) random range
(b) sample space
(c) specific range
(d) none of the above
4. SVD and PCA applied to a matrix can be used to
(a) factor matrices
(b) find linearly independent columns
(c) reduce dimensions
(d) all of the above
(e) none of the above
5. A data scientist must always consider potential sources of bias in a given dataset.
(a) True
(b) False
(c) Maybe
6. Which data formats would be well suited for nested data? Select all that apply.
(a) .csv
(b) .xml
(c) .ipynb
(d) .json
(e) .tsv
7. Which of the following are reasonable motivations for applying a log transformation? Select
all that apply.
(a) To perform dimensionality reduction on the data.
(b) To help straighten relationships between pairs of variables.
(c) To remove missing values.
(d) To bring the data distribution closer to random sampling.
(e) To help visualize highly skewed distributions.
8. The return type of the pandas.DataFrame.groupby function can be either a DataFrame or a
Series object.
(a) TRUE
(b) FALSE
Question 2 - EDA - 12 points
1. Suppose you are given a data set that contains the stock market performance from January 1, 1981
to January 1, 2019. The presidents during this period were Reagan (8 yrs), H.W. Bush (4 yrs),
Clinton (8 yrs), W. Bush (8 yrs), Obama (8 yrs), and Trump (2 yrs). The performances are given by
the following chart. During EDA, what are some questions one can ask? We are looking for 3 brief
but good questions/observations.
(a) Question 1:
(b) Question 2:
(c) Question 3:
2. During the data cleaning process, is it always a good idea to remove records that contain
missing values? Briefly justify your answer.
3. TRUE or FALSE. Exploratory data analysis is the process of testing key hypotheses.
4. TRUE or FALSE. The structure of the data describes how it is formatted and organized.
5. TRUE or FALSE. Throughout the process of exploratory data analysis, it is often necessary to
transform and clean data.
Question 3 - Data Visualization - 12 points
1. What is the best data type description for home prices in New York City? Circle the answer
and briefly justify.
(a) Nominal
(b) Ordinal
(c) Quantitative
(d) Numerical
2. Justification:
3. Consider the following graph, which shows registered male and female names and the year in which
they were sampled. The graph seems to show an unlikely phenomenon. Assuming the data are valid,
briefly explain what might have caused this.
4. TRUE/FALSE. Descriptive statistics of a data set, such as the mean and variance, are good
metrics for understanding the distribution of the data. Briefly justify your answer.
5. For each of the following cases, choose the ideal plot type from the following. 1D: bar chart,
histogram; 2D: scatter plot, line plot, box-whisker, heatmap; 3D: scatter matrix, bubble chart.
(a) Plot 10,000 student grades consisting of the letters A, B, C, D, F.
(b) Compare chicken and beef prices from 50 states for 1 year of data (365 data points).
(c) Compare the average, median, max and mean temperature in 3 different counties.
(d) Density of traffic in New York City during rush hour.
6. Consider the following heatmap showing the height-weight distribution of Americans. State two
important facts revealed by this chart. Please be brief.
(a)