Stats Project2.pdf

School: University of Texas, Dallas
Course: 6313
Subject: Computer Science
Date: Dec 6, 2023
Pages: 8

CS6313 STATISTICAL METHODS FOR DATA SCIENCE
PROJECT 2

Problem A: Study Group
Aishwarya Vinod Menon (AXV220062)
Trishala Reddy (TXR220017)
Dibyanshi Singh (DXS210139)

Chapter 8.1 and 8.2 Descriptive Statistics

Citation: Statology.org, 2020; StackOverflow, 2021.

Step 0: Identification and language

# Name: Aishwarya Vinod Menon, Trishala Reddy
# Language: Python 3.x
import numpy as np
import pandas as pd

Step 1: Load Chapter8.txt (As demonstrated by the professor)

def loadData():
    data = list()
    fp = open("Chapter8.txt", "r")
    text = fp.readline()
    fp.close()
    dataset1 = text.split(",")
    for item in dataset1:
        data.append(int(item))
    return data

data = loadData()

Step 2. The Population

length = len(data)
min_data = np.min(data)
max_data = np.max(data)
mean = np.mean(data)
variance = np.var(data, ddof=0)
std_dev = np.sqrt(variance)
quartile_25 = np.percentile(data, 25)
quartile_75 = np.percentile(data, 75)
interquartile_range = quartile_75 - quartile_25
lower_outlier_limit = quartile_25 - 1.5 * interquartile_range
upper_outlier_limit = quartile_75 + 1.5 * interquartile_range
outliers = np.where((data < lower_outlier_limit) | (data > upper_outlier_limit))
# np.where returns a tuple of index arrays, so count the indices in its first element
num_outliers = len(outliers[0])
print("Length:", length)
print("Minimum:", min_data)
print("Maximum:", max_data)
print("Mean:", mean)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
print("25% Quartile:", quartile_25)
print("75% Quartile:", quartile_75)
print("Interquartile Range (IQR):", interquartile_range)
print("Number of Outliers:", num_outliers)

Output:

Step 2. 1000 Unit Sample:

sample = np.random.choice(data, 1000)
Num_datapoints = len(sample)
minimum = min(sample)
maximum = max(sample)
mean = np.mean(sample)
lower_quartile = np.percentile(sample, 25)
upper_quartile = np.percentile(sample, 75)
inter_quartile_range = upper_quartile - lower_quartile
lower_limit = lower_quartile - 1.5 * inter_quartile_range
upper_limit = upper_quartile + 1.5 * inter_quartile_range
outlier = np.where((sample < lower_limit) | (sample > upper_limit))
# Count the outlier indices, not the length of the tuple returned by np.where
outliers_no = len(outlier[0])
print("Number of datapoints:", Num_datapoints)
print("Minimum Value:", minimum)
print("Maximum Value:", maximum)
print("Mean:", mean)
print("25% Quartile:", lower_quartile)
print("75% Quartile:", upper_quartile)
print("The lower Outlier limit:", lower_limit)
print("The upper Outlier limit:", upper_limit)
print("The number of outliers:", outliers_no)
Output:

Step 3. 10000 Unit Sample

sample_s = np.random.choice(data, 10000)
datapoints = len(sample_s)
min_s = min(sample_s)
max_s = max(sample_s)
mean_s = np.mean(sample_s)
lower_quartile_s = np.percentile(sample_s, 25)
upper_quartile_s = np.percentile(sample_s, 75)
inter_quartile_range_s = upper_quartile_s - lower_quartile_s
lower_limit_s = lower_quartile_s - 1.5 * inter_quartile_range_s
upper_limit_s = upper_quartile_s + 1.5 * inter_quartile_range_s
outlier_s = np.where((sample_s < lower_limit_s) | (sample_s > upper_limit_s))
# Count the outlier indices, not the length of the tuple returned by np.where
outliers_no_s = len(outlier_s[0])
print("Number of datapoints:", datapoints)
print("Minimum Value:", min_s)
print("Maximum Value:", max_s)
print("Mean:", mean_s)
print("25% Quartile:", lower_quartile_s)
print("75% Quartile:", upper_quartile_s)
print("The lower Outlier limit:", lower_limit_s)
print("The upper Outlier limit:", upper_limit_s)
print("The number of outliers:", outliers_no_s)

Output:

Step 4: 100000 Unit Sample

sample_se = np.random.choice(data, 100000)
Datapoints_se = len(sample_se)
min_se = min(sample_se)
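The population and sample steps above repeat the same computation on different inputs. As a minimal sketch, and not the authors' submitted code, the shared logic could be collected into one helper. The function name `summarize`, the `toy_data` list (a hypothetical stand-in for the contents of Chapter8.txt), and the seed value are assumptions made for illustration only. The sketch also shows why `len()` of an `np.where` result is not an outlier count: with a single condition, `np.where` returns a tuple holding one index array.

```python
import numpy as np


def summarize(values):
    """Return descriptive statistics and a 1.5*IQR outlier count for `values`."""
    arr = np.asarray(values)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Boolean mask: True where a value falls outside the outlier limits.
    mask = (arr < lower) | (arr > upper)
    return {
        "n": arr.size,
        "min": arr.min(),
        "max": arr.max(),
        "mean": arr.mean(),
        "variance": arr.var(ddof=0),  # population variance, as in Step 2
        "std_dev": arr.std(ddof=0),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_limit": lower,
        "upper_limit": upper,
        # Summing the mask counts its True entries, i.e. the outliers.
        "num_outliers": int(mask.sum()),
    }


# Hypothetical stand-in for the contents of Chapter8.txt (not the real data).
toy_data = [-50, 1, 2, 3, 4, 5, 6, 7, 8, 100]
toy_stats = summarize(toy_data)

# np.where on the same condition returns a 1-tuple of index arrays, so
# len(where_result) is 1 no matter how many outliers exist.
arr = np.asarray(toy_data)
where_result = np.where(
    (arr < toy_stats["lower_limit"]) | (arr > toy_stats["upper_limit"])
)

# Repeat the summary for samples of increasing size. Sampling is with
# replacement, which is also the default for np.random.choice.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
for size in (1000, 10000, 100000):
    sample = rng.choice(toy_data, size=size)
    stats = summarize(sample)
    print(size, stats["mean"], stats["num_outliers"])
```

Because the samples are drawn with replacement, a sample can be larger than the population it comes from; passing `replace=False` would instead require the sample size not to exceed the number of data points.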