439 sample midterm

School: Rutgers University
Course: 439
Subject: Mathematics
Date: Apr 3, 2024
Pages: 9

Question 1 - Multiple Choice - 16 points

This question covers multiple topics. Each question is worth 2 points.

1. Suppose that the locations of 100 sites are given as the latitude and longitude of each site. What is the best way to visualize this data?
   (a) histogram
   (b) scatter plot
   (c) bar plot
   (d) KDE plot

2. Your letter grade (e.g., A+, A, B+, ...) in a class that grades on a curve is most accurately described as what kind of data?
   (a) ordinal
   (b) nominal
   (c) none of the above

3. The set that consists of all possible values of a random variable is called a
   (a) random range
   (b) sample space
   (c) specific range
   (d) none of the above

4. SVD and PCA applied to a matrix can be used to
   (a) factor matrices
   (b) find linearly independent columns
   (c) reduce dimensions
   (d) all of the above
   (e) none of the above

5. A data scientist must always consider potential sources of bias in a given dataset.
   (a) True
   (b) False
   (c) Maybe
6. Which data formats would be well suited for nested data? Select all that apply.
   (a) .csv
   (b) .xml
   (c) .ipynb
   (d) .json
   (e) .tsv

7. Which of the following are reasonable motivations for applying a log transformation? Select all that apply.
   (a) To perform dimensionality reduction on the data.
   (b) To help straighten relationships between pairs of variables.
   (c) To remove missing values.
   (d) To bring the data distribution closer to random sampling.
   (e) To help visualize highly skewed distributions.

8. The return type of the pandas.DataFrame.groupby function can either be a DataFrame or a Series object.
   (a) TRUE
   (b) FALSE

Question 2 - EDA - 12 points

1. Suppose you are given a data set that contains the stock market performance from Jan 1, 1981 to Jan 1, 2019. The presidents during this period were Reagan (8 yrs), H.W. Bush (4 yrs), Clinton (8 yrs), W. Bush (8 yrs), Obama (8 yrs), and Trump (2 yrs). The performances are given by the following chart. During EDA, what are some questions one can ask? We are looking for 3 brief but good questions/observations.
   (a) Question 1:
   (b) Question 2:
   (c) Question 3:

2. During the data cleaning process, is it always a good idea to remove records that contain missing values? Briefly justify your answer.

3. TRUE or FALSE. Exploratory data analysis is the process of testing key hypotheses.

4. TRUE or FALSE. The structure of the data describes how it is formatted and organized.

5. TRUE or FALSE. Throughout the process of exploratory data analysis it is often necessary to transform and clean data.

Question 3 - Data Visualization - 12 points

1. What is the best data type description for home prices in New York City? Circle the answer and briefly justify.
   (a) Nominal
   (b) Ordinal
   (c) Quantitative
   (d) Numerical

2. Justification:
3. Consider the following graph, which shows registered male and female names and the year in which they were sampled. The graph seems to show an unlikely phenomenon. Assuming the data are valid, briefly explain what might have caused this.

4. TRUE/FALSE. The descriptive statistics of a data set, such as the mean and variance, are a good metric for understanding the distribution of the data. Justify your answer briefly.

5. For each of the following cases, choose the ideal plot type from:
   1D: bar chart, histogram
   2D: scatter plot, line plot, box-whisker, heatmap
   3D: scatter matrix, bubble chart

   (a) Plot 10,000 student grades consisting of the letters A, B, C, D, F.
   (b) Compare chicken and beef prices from 50 states for 1 year of data (365 data points).
   (c) Compare the average, median, max and mean temperature in 3 different counties.
   (d) Density of traffic in New York City during rush hour.

6. Consider the following heatmap showing the height-weight distribution of Americans. State two important facts revealed by this chart. Please be brief.
   (a)