Hi! I'm having trouble splitting the data to meet the conditions and then converting it to a list, and would love some help solving this. This is the question:
The datatype of the array created by NumPy in Task 1 is unstructured. This is because, in the default setting, NumPy decides the datatype for each value. Also, the output in Task 1 contains the header row, which may not be required in our assignment. So, remove the header row and convert all the columns to type float (i.e., "float") apart from the columns specified by the input parameter indexes (described below). The columns that are mentioned in indexes should be Unicode strings of length 30 characters (i.e., "<U30"). Finally, every row is converted to a tuple (e.g., tuple(i) for i in data).
I need to write a function `data_type_format(data, indexes)` that completes the task above, where the input `data` is a NumPy array, `indexes` is a list of the column indices to be converted to the `<U30` data type, and the remaining columns of `data` that are not in `indexes` are converted to type float.
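A minimal sketch of one possible approach, assuming the first row of `data` is the header row (Task 1 reads the file with `header=None`, so the header ends up as row 0) and that the field names should come from that header, as the expected dtype in the third test suggests. It returns a structured NumPy array rather than a plain list, because the tests call `.dtype` on a row. Casting each value with `float()` explicitly is a design choice made here, not something the assignment prescribes, and the `sample` array is made-up data for illustration only.

```python
import numpy as np


def data_type_format(data, indexes):
    """Drop the header row and return a structured array.

    Columns whose position is in `indexes` keep the '<U30' type; every other
    column is converted to float. Field names are taken from the header row.
    """
    header, rows = data[0], data[1:]  # first row holds the column names
    dtype = [(str(name), '<U30' if i in indexes else float)
             for i, name in enumerate(header)]
    # Build one tuple per row, casting the non-index columns to float.
    records = [tuple(str(value) if i in indexes else float(value)
                     for i, value in enumerate(row))
               for row in rows]
    return np.array(records, dtype=dtype)


# Hypothetical input shaped like the Task 1 output (header + one record).
sample = np.array([
    ['created_at', 'user_ID', 'happy'],
    ['Thu Jan 30 06:58:27 +0000 2020', '98675', '0.442'],
], dtype='U30')

out = data_type_format(sample, [0, 1])
print(out[0])
print(out.dtype)
```

Each element of the result is a record that prints like a tuple, and `out.dtype` lists one `(name, type)` pair per column, which is the shape the first and third tests look for.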
I used this code in Task 1 (the aim was to extract the data into nine columns of a NumPy array):
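```python
import numpy as np
import pandas as pd

def load_mydata(filename):
    """Return the selected columns of the CSV file as a '<U30' NumPy array."""
    # read the file; quoting=3 is csv.QUOTE_NONE, and header=None keeps
    # the header line as the first row of the data
    df = pd.read_csv(filename, delimiter=',', quotechar="", quoting=3, header=None)
    # get the desired columns
    df = df.iloc[:, [0, 1, 2, 5, 8, 9, 10, 11, 12]]
    # datatype as 30-character string
    ndarray = np.array(df, dtype='U30')
    return ndarray
```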
The code will be tested in a Python shell using the following tests and must give the results shown.
For example:
| Test | Result |
|---|---|
| `data = load_mydata("location_review_data.csv")`<br>`data = data_type_format(data, [0, 1, 2, 3])`<br>`print(data[0])` | `('Thu Jan 30 06:58:27 +0000 2020', '98675', '1', '22847', 0.421, 0.442, 0.452, 0.397, 0.357)` |
| `data = load_mydata("location_review_data.csv")`<br>`data = data_type_format(data, [0, 1, 7])`<br>`print(data[5][7].dtype)` | `<U4` |
| `data = load_mydata("location_review_data.csv")`<br>`data = data_type_format(data, [0, 1, 7, 8])`<br>`print(data[5].dtype)` | `[('created_at', '<U30'), ('user_ID', '<U30'), ('review_ID', '<f8'), ('location_ID', '<f8'), ('sad', '<f8'), ('happy', '<f8'), ('surprise', '<f8'), ('disgust', '<U30'), ('joy', '<U30')]` |
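The second test expects `<U4` even though column 7 is declared as `<U30`. A likely explanation: indexing a single field of a record returns a `numpy.str_` scalar, and that scalar's dtype reflects the length of its own string rather than the field's declared width. Presumably the value in row 5, column 7 is four characters long. A small check with a made-up value:

```python
import numpy as np

# Hypothetical one-record structured array with a '<U30' field.
rec = np.array([('0.41',)], dtype=[('disgust', '<U30')])

field_dtype = rec.dtype['disgust']  # declared width of the field
value = rec[0][0]                   # single field value: a numpy.str_ scalar
print(field_dtype)  # <U30
print(value.dtype)  # <U4, the length of '0.41'
```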
Thank you!
Subject: Python Programming