Homework_Assignment_1 - 5.75.docx

School: Pennsylvania State University
Course: 200
Subject: Computer Science
Date: Dec 6, 2023
Type: docx
Pages: 9
Uploaded by GeneralSummer13484

Homework Assignment 1
DS200: Introduction to Data Sciences, Fall 2022

Please complete this assignment by entering your answers in this document. You can submit this Word document or a PDF export on Canvas.

Problem 1: Sampling [0.5 points]

Suppose that you work at a hospital, and you have to recruit participants for a medical study to test a new heart disease medication. Match the three examples below to the three sampling approaches.

Examples:
1. You try to recruit the 100 patients with the highest blood pressure.
2. You order all patients by age and try to recruit every 500th patient, starting with a randomly chosen patient from the first 500.
3. You store patient identifiers in an array called patients, apply numpy.random.choice(patients, 100), and try to recruit the patients returned by this function.

Sampling approaches:
• deterministic
• systematic random
• simple random

Answer:
1: deterministic sampling
2: systematic random sampling
3: simple random sampling
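The three approaches in Problem 1 can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the patient ids, ages, and blood pressures are made-up data, and the variable names are hypothetical. Note that numpy.random.choice samples with replacement by default, so the sketch passes replace=False to avoid drawing the same patient twice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: 5,000 patient ids with made-up ages and blood pressures.
n_patients = 5000
patients = np.arange(n_patients)
ages = rng.integers(18, 90, size=n_patients)
blood_pressure = rng.normal(120, 15, size=n_patients)

# 1. Deterministic: the 100 patients with the highest blood pressure.
#    No randomness is involved; the same patients are chosen every time.
deterministic_sample = patients[np.argsort(blood_pressure)[-100:]]

# 2. Systematic random: order by age, pick a random start among the
#    first 500 patients, then take every 500th patient from there.
step = 500
by_age = patients[np.argsort(ages, kind="stable")]
start = rng.integers(0, step)
systematic_sample = by_age[start::step]

# 3. Simple random: every patient is equally likely to be chosen.
simple_random_sample = rng.choice(patients, size=100, replace=False)

print(len(deterministic_sample), len(systematic_sample), len(simple_random_sample))
```

With 5,000 patients and a step of 500, the systematic sample always contains 10 patients, whatever the random start.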

Problem 2: Distribution [1 point]

Consider the following distribution of values:

Identify the following:
• median
• outlier
• 1st quartile
• 95th percentile

Answer:
a: 1st quartile
b: median
c: 95th percentile
d: outlier
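The quantities named in Problem 2 can be computed numerically. The sketch below uses a small made-up array, not the distribution from the assignment, and flags outliers with the common 1.5 × IQR rule, which is one convention among several and is an assumption here.

```python
import numpy as np

# Made-up values for illustration; 40 is placed far from the rest on purpose.
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 40])

q1 = np.percentile(values, 25)    # 1st quartile
median = np.median(values)        # 50th percentile
q3 = np.percentile(values, 75)    # 3rd quartile
p95 = np.percentile(values, 95)   # 95th percentile

# Assumed outlier rule: points more than 1.5 * IQR outside the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "median:", median, "95th percentile:", p95)
print("outliers:", outliers)
```

On any distribution the ordering 1st quartile ≤ median ≤ 95th percentile holds, which is consistent with the left-to-right labels a, b, c in the answer.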

Problem 3: Association [1 point]

Consider the following three scatter plots:

What kind of association can you observe between X and Y in each figure? Explain your answer.

Hint: Possible kinds of association are:
• positive association
• negative association
• no association

Answer:
First figure: negative association
Second figure: positive association
Third figure: no association
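One way to back up a visual reading of a scatter plot is the Pearson correlation coefficient. The sketch below generates synthetic data for each kind of association, since the original figures are not available; the slopes, noise levels, and the helper name pearson_r are assumptions. Pearson's r only measures linear association, so a value near zero does not rule out a nonlinear relationship.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic X values and noise; these are not the data behind the figures.
x = rng.uniform(0, 10, size=200)
noise = rng.normal(0, 1, size=200)

y_negative = -2.0 * x + noise          # Y tends to fall as X rises
y_positive = 2.0 * x + noise           # Y tends to rise as X rises
y_none = rng.normal(0, 1, size=200)    # Y is generated independently of X


def pearson_r(a, b):
    """Return the Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


for label, y in [("negative", y_negative), ("positive", y_positive), ("none", y_none)]:
    print(label, round(pearson_r(x, y), 2))
```

A coefficient near -1 indicates negative association, near +1 positive association, and near 0 no linear association.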

Problem 4: Causality [1 point]

Suppose that a positive association was observed between the following three variables:
• number of tooth cavities,
• ounces of sugary drinks consumed,
• weight in pounds.

Which of these three variables might be a confounding variable? What spurious conclusion may it cause? Motivate your answer.

Answer:
Although the correlation between the number of tooth cavities and weight in pounds may be positive, that does not mean that higher weight causes more cavities or that more cavities cause higher weight. Ounces of sugary drinks consumed is a confounding variable: drinking more sugary drinks leads, on average, to both higher weight and more cavities. The spurious conclusion would be that cavities and weight are causally linked. Their correlation is most likely explained by the shared influence of sugary drink consumption rather than by a direct effect of one on the other.
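The confounding argument in Problem 4 can be illustrated with a small simulation. Everything below is an assumption made for illustration: the effect sizes, noise levels, and variable names are invented, and cavities and weight are generated so that neither affects the other. The sketch compares the raw correlation with the correlation that remains after the linear effect of sugary drinks is removed from both variables.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 2000

# Assumed model: sugary drinks drive both outcomes; there is no direct link
# between cavities and weight.
drinks = rng.uniform(0, 60, size=n)                       # ounces consumed
cavities = 0.1 * drinks + rng.normal(0, 1, size=n)
weight = 150 + 1.0 * drinks + rng.normal(0, 10, size=n)   # pounds

# Raw correlation between the two outcomes.
raw_r = np.corrcoef(cavities, weight)[0, 1]


def residuals(y, x):
    """Return what is left of y after removing a fitted straight line in x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Correlation after controlling for the confounder (a partial correlation).
partial_r = np.corrcoef(residuals(cavities, drinks), residuals(weight, drinks))[0, 1]

print("raw correlation:", round(raw_r, 2))
print("after controlling for drinks:", round(partial_r, 2))
```

In this simulated setting the raw correlation is clearly positive, while the correlation after controlling for sugary drinks is close to zero, which is the pattern expected when a confounder explains the association.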


