lab03_tables.html

School: Temple University
Course: 1013
Subject: Computer Science
Date: Dec 6, 2023
Uploaded by samzahroun

Lab 3: Tables

Welcome to Lab 3! This week, we focus on manipulating tables. We will import our data sets into tables and do most of our analysis with them. Tables are described in Chapter 6 of the Inferential Thinking text. A related structure in Python is the pandas DataFrame, which we will occasionally need to use; pandas is a mainstay data science tool.

First, set up the tests and imports by running the cell below.

In [1]:
    import numpy as np
    from datascience import *  # Brings the datascience Table object into Python

    # This line loads the tests.
    from gofer.ok import check

In [2]:
    # Enter your name as a string
    # Example
    dogname = "Fido"
    # Your name
    name = "Sam Zahroun"

1. Introduction

For a collection of things in the world, an array is useful for describing a single attribute of each thing. For example, among the collection of US states, an array could describe the land area of each. Tables extend this idea by describing multiple attributes for each element of a collection.

In most data science applications, we have data about many entities, and several kinds of data about each entity.

For example, the cell below creates two arrays. The first contains the world population in each year (as estimated by the US Census Bureau), and the second contains the years themselves. The elements are in the same order, so a year and the world population for that year have the same index in their respective arrays.
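This is a standalone aside and is separate from the lab's In [3] cell. It is a minimal NumPy sketch showing how one shared index links two parallel arrays. The sketch uses only the first five population values from this lab's data (1950 to 1954). The 2.6-billion threshold is an illustrative choice that falls inside that short slice. The names sample_years, sample_population, and threshold are hypothetical, chosen so that running the sketch does not overwrite the lab's years or population_amounts.

```python
import numpy as np

# Hypothetical names, so the lab's own `years` and `population_amounts` are untouched.
# Values: the first five entries of the lab's population data (1950-1954).
sample_years = np.arange(1950, 1955)
sample_population = np.array([2557628654, 2594939877, 2636772306,
                              2682053389, 2730228104])

# Parallel arrays: position i in each array describes the same year.
i = 3
print(sample_years[i], sample_population[i])  # 1953 2682053389

# Array-only answer to "when did population first exceed a threshold?":
# mark each position, find the first True, then read that position in the years array.
threshold = 2_600_000_000  # illustrative threshold, not taken from the lab
above = sample_population > threshold
first_index = np.argmax(above)  # position of the first True value
if above[first_index]:          # np.argmax returns 0 when no entry is True
    first_year = sample_years[first_index]
    print(first_year)  # 1952
else:
    first_year = None
    print("never exceeded", threshold)
```

Applied to the lab's full arrays with a threshold of 6 billion, the same steps would find the year world population crossed 6 billion. A Table keeps aligned columns together, so you don't have to track positions by hand.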
In [3]:
    population_amounts = Table.read_table("world_population.csv").column("Population")
    years = np.arange(1950, 2016, 1)
    print("Population column:", population_amounts)
    print("Years column:", years)

    Population column: [2557628654 2594939877 2636772306 2682053389 2730228104 2782098943
     2835299673 2891349717 2948137248 3000716593 3043001508 3083966929
     3140093217 3209827882 3281201306 3350425793 3420677923 3490333715
     3562313822 3637159050 3712697742 3790326948 3866568653 3942096442
     4016608813 4089083233 4160185010 4232084578 4304105753 4379013942
     4451362735 4534410125 4614566561 4695736743 4774569391 4856462699
     4940571232 5027200492 5114557167 5201440110 5288955934 5371585922
     5456136278 5538268316 5618682132 5699202985 5779440593 5857972543
     5935213248 6012074922 6088571383 6165219247 6242016348 6318590956
     6395699509 6473044732 6551263534 6629913759 6709049780 6788214394
     6866332358 6944055583 7022349283 7101027895 7178722893 7256490011]
    Years column: [1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
     1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
     1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
     1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
     2010 2011 2012 2013 2014 2015]

Suppose we want to answer this question: When did world population cross 6 billion?

You could technically answer this question by staring at the arrays, but the process is convoluted. You would have to count the position where the population first crossed 6 billion, then find the element at that position in the years array. In cases like these, it may be easier to put the data into a Table, a two-dimensional type of dataset.

The expression below:

• creates an empty table using the expression Table(),
• adds two columns to the table by calling with_columns with four arguments (a label and an array for each column),
• assigns the result to the name population, and finally
• evaluates population so that we can see the table.

The strings "Year" and "Population" are column labels that we have chosen. The names population_amounts and years were assigned above to two arrays of the same length. The function with_columns (see the datascience documentation) takes alternating strings (the column labels) and arrays (the data in those columns), all separated by commas.

Tip: population_amounts and years must contain the same number of data points; otherwise, constructing the table raises an error.

In [4]:
    population = Table().with_columns(
        "Population", population_amounts,
        "Year", years
    )
    population

Out[4]:
    Population | Year
    2557628654 | 1950
    2594939877 | 1951
    2636772306 | 1952
    2682053389 | 1953
    2730228104 | 1954
    2782098943 | 1955
    2835299673 | 1956
    2891349717 | 1957
    2948137248 | 1958
    3000716593 | 1959
    ... (56 rows omitted)

Now the data are all together in a single table! They are much easier to read this way. If you need to know the population in 1959, for example, a single glance at the table tells you. We'll revisit this table later.

Question 1

Using the example in the cell above, identify which variable contains the table and which variable contains an array. On the right side of each equals sign, write the correct variable name.

In [5]:
    table_var = population
    array_var = years

In [6]:
    check('tests/q1.py')

Out[6]:
    All tests passed!

2. Creating Tables

Question 2

In the cell below, we've created two arrays. These examples use the Environmental Performance Index (EPI), which describes the state of sustainability in each country. More information is available at Yale EPI.

Using the steps above, assign top_10_epi to a table that has two columns called "Country" and "Score", which hold top_10_epi_countries and top_10_epi_scores respectively.

In [7]:
    top_10_epi_scores = make_array(82.5, 82.3, 81.5, 81.3, 80., 79.6, 78.9, 78.7, 77.7, 77.2)
    top_10_epi_countries = make_array(
        'Denmark',
        'Luxembourg',
        'Switzerland',
        'United Kingdom',
        'France',
        'Austria',
        'Finland',
        'Sweden',
        'Norway',
        'Germany'
    )

    top_10_epi = Table().with_columns(
        "Country", top_10_epi_countries,
        "Score", top_10_epi_scores
    )

    # We've put this next line here so your table will get printed out when you
    # run this cell.
    top_10_epi

Out[7]:
    Country        | Score
    Denmark        | 82.5
    Luxembourg     | 82.3
    Switzerland    | 81.5
    United Kingdom | 81.3
    France         | 80
    Austria        | 79.6
    Finland        | 78.9
    Sweden         | 78.7
    Norway         | 77.7
    Germany        | 77.2

In [8]:
    check('tests/q2.py')

Out[8]:
    All tests passed!

Loading a table from a file

In most cases, we aren't going to go through the trouble of typing in all the data manually. Instead, we can use Table functions to load the data from a file. Table.read_table takes one argument, the path to a data file (a string), and returns a table. There are many formats for data files, but CSV ("comma-separated values") is the most common.

Question 3

The file yale_epi.csv in the current directory contains a table of information about 180 countries and their Environmental Performance Index (EPI), which is based on 32 indicators of sustainability. Load it as a table called epi using the Table.read_table function.

In [9]:
    epi = Table.read_table("yale_epi.csv")
    epi

Out[9]:
    Country              | Score | Decade Change | Rank
    Afghanistan          | 25.5  | 5             | 178
    Angola               | 29.7  | 5.3           | 158
    Albania              | 49    | 10.2          | 62
    United Arab Emirates | 55.6  | 11.3          | 42
    Argentina            | 52.2  | 5             | 54
    Armenia              | 52.3  | 4.5           | 53
    Antigua and Barbuda  | 48.5  | 3.3           | 63
    Australia            | 74.9  | 5.5           | 13
    Austria              | 79.6  | 5.4           | 6
    Azerbaijan           | 46.5  | 4             | 72
    ... (170 rows omitted)

In [10]:
    check('tests/q3.py')

Out[10]:
    All tests passed!

Notice the "... (170 rows omitted)" line. This table is big enough that only a few of its rows are displayed, but the others are still there. Ten are shown, so there are 180 countries in total.

Where did yale_epi.csv come from? Take a look at this lab's folder. You should see a file called yale_epi.csv. Open that file and look at the format. What do you notice? The .csv filename ending says that this file is in the CSV (comma-separated values) format.

3. Using lists

A list is another Python sequence type, similar to an array. Unlike an array, a list can hold values of different types: a single list can contain int values, float values, and strings. Elements in a list can even be other lists! A list is created by enclosing values in square brackets, separated by commas, and it can then be assigned to a name. For example:

    values_with_different_types = ['data', 8, 8.1]

Lists are useful when working with tables because a list can describe the contents of one row of a table, which often corresponds to a sequence of values with different types. A list of lists can describe multiple rows.

Each column in a table is a collection of values with the same type (an array). If you create a table column from a list, it is automatically converted to an array. A row, on the other hand, mixes types.

Here's a table from Chapter 5. (Run the cell below.)

In [11]:
    # Run this cell to recreate the table
    flowers = Table().with_columns(
        'Number of petals', make_array(8, 34, 5),
        'Name', make_array('lotus', 'sunflower', 'rose')
    )
    flowers

Out[11]:
    Number of petals | Name
    8                | lotus
    34               | sunflower
    5                | rose

Question 4

Create a list that describes a new fourth row of this table. The details can be whatever you want, but the list must contain two values: the number of petals (an int value) and the name of the flower (a string). For example, your flower could be "pondweed", a flower with zero petals!

In [12]:
    my_flower = [0, "pondweed"]
    my_flower

Out[12]:
    [0, 'pondweed']

In [13]:
    check('tests/q4.py')

Out[13]:
    All tests passed!

Question 5

my_flower fits right into the table from Chapter 5. Complete the cell below to create a table of seven flowers that includes your flower as the fourth row, followed by other_flowers. You can use with_row to create a new table with one extra row by passing a list of values. You can use with_rows to create a table with multiple extra rows by passing a list of lists of values.

In [14]:
    # Use the method .with_row(...) to create a new table that includes my_flower
    four_flowers = flowers.with_row(my_flower)

    # Use the method .with_rows(...) to create a table that
    # includes four_flowers followed by other_flowers
    other_flowers = [[10, 'lavender'], [3, 'birds of paradise'], [6, 'tulip']]
    seven_flowers = four_flowers.with_rows(other_flowers)
    seven_flowers

Out[14]:
    Number of petals | Name
    8                | lotus
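Question 5 passes new rows to with_row and with_rows as Python lists, while each column of the resulting table holds values of a single type. The NumPy-only sketch below illustrates that row-versus-column distinction using the flower rows from this lab. It shows the idea only and is not how the datascience library implements with_rows. The names rows, petals, names, and row_as_array are hypothetical.

```python
import numpy as np

# Each row is a plain Python list that mixes an int and a str.
rows = [[8, 'lotus'], [34, 'sunflower'], [5, 'rose'], [0, 'pondweed']]

# Transpose the list of rows into one tuple per column.
petals_list, names_list = zip(*rows)

# Each column becomes an array with a single element type.
petals = np.array(petals_list)
names = np.array(names_list)
# 'i' marks an integer array, 'U' a Unicode string array.
print(petals.dtype.kind, names.dtype.kind)  # i U

# Forcing a mixed-type row into one array converts every value to a string.
row_as_array = np.array(rows[3])
print(row_as_array)  # ['0' 'pondweed']
```

This is why a row is naturally a list while a column is naturally an array. Converting a mixed row into a single array would silently turn the petal count into the string '0'.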
