1_senator_pca - Jupyter Notebook

School: University of California, Berkeley
Course: 54
Subject: Computer Science
Date: Feb 20, 2024

PCA and senate voting data

In this problem, we are given the data matrix $X$ with entries in $\{-1, 0, 1\}$, where each row corresponds to a senator and each column to a bill. We first import this data, print some relevant values, and normalize it as necessary to ready it for further computation.

To run this code, you'll need a number of standard Python libraries, all of which are installable via pip or conda. We highly recommend using a virtual environment (https://realpython.com/python-virtual-environments-a-primer/) for this class and in general. Lastly, ensure that all data files (senator_pca_data_matrix.csv and senator_pca_politician_labels.txt) are located in the same folder as the notebook.

Places you will need to modify this code are enclosed in a marked block. You should not need to modify code outside these blocks to complete the problems. Questions that you are expected to answer in text are marked in red. For solution files, solutions will be presented in blue.

In [1]:

    # import the necessary packages for data manipulation, computation and PCA
    import pandas as pd
    import numpy as np
    import scipy as sp
    from numpy import linalg as LA
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    %matplotlib inline
    np.random.seed(7)

In [2]:

    # import the data matrix
    senator_df = pd.read_csv('senator_pca_data_matrix.csv')
    affiliation_file = open('senator_pca_politician_labels.txt', 'r')
    # keep the second space-separated token of each line as the affiliation
    affiliations = [line.split('\n')[0].split(' ')[1] for line in affiliation_file]
    # skip the first three columns, then transpose so that each row is a senator
    X = np.array(senator_df.values[:, 3:].T, dtype='float64')
    print('X.shape: ', X.shape)
    n = X.shape[0]  # number of senators
    d = X.shape[1]  # number of bills

    # this is just used for plotting, feel free to ignore
    assert set(affiliations) == {"Red", "Blue", "Yellow"}
    # assign a marker and hatch to each affiliation
    markers = [("Red", "o", "/"), ("Blue", "^", "-"), ("Yellow", "D", "+")]

    X.shape: (100, 542)

We observe that the number of rows, $n$, is the number of senators and is equal to 100. The number of columns, $d$, is the number of bills and is equal to 542.
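The slice-and-transpose step used to build the data matrix can be illustrated on a tiny table. This is only a sketch: the column names and the layout of `toy_df` (one row per bill, three leading metadata columns, one column per senator) are assumptions made for illustration, not the actual contents of senator_pca_data_matrix.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the CSV: one row per bill, three leading
# metadata columns, then one column per senator (layout is an assumption).
toy_df = pd.DataFrame({
    "bill_id": [1, 2],
    "bill_type": ["a", "b"],
    "missing_votes": [0, 1],
    "Senator A": [1, -1],
    "Senator B": [-1, 0],
    "Senator C": [1, 1],
})

# Drop the three metadata columns, then transpose so that rows are senators
# and columns are bills, mirroring the expression used for X.
X_toy = np.array(toy_df.values[:, 3:].T, dtype="float64")
n_toy, d_toy = X_toy.shape  # 3 senators, 2 bills
print(X_toy.shape)
```

With three senators and two bills in the toy table, the resulting array has one row per senator, which is the orientation the rest of the notebook relies on.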
In [3]:

    # print an example row of the data matrix
    typical_row = X[0]
    print(typical_row.shape)
    print(typical_row)

    (542,)
    [ 1. 1. 1. -1. -1. 1. 1. 1. 1. -1. 1. -1. -1. 1. 1. -1. 1. 1. 1. 1. 1. -1. 1. 1. 1. -1. 1. -1. 1. 1. 1. 1. 1. -1. 1. -1. -1. -1. -1. 1. 1. -1. -1. -1. -1. 1. 1. 1. -1. 1. 1. -1. 1. 1. -1. 1. 1. 1. 1. -1. 1. -1. -1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 0. -1. 1. 1. 1. -1. -1. 1. 1. -1. -1. 1. 1. 1. -1. 1. -1. 1. -1. 1. 1. -1. -1. -1. 1. 1. 1. -1. -1. -1. -1. -1. -1. 1. -1. 1. 1. -1. -1. -1. 1. -1. 1. -1. 1. 0. 0. 1. 1. -1. 1. 1. -1. 1. 1. -1. 1. -1. -1. 1. 1. 1. 1. 0. -1. -1. 1. 1. -1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. -1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. -1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 0. 1. 0. -1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. -1. -1. -1. 1. 1. -1. 1. -1. -1. 1. 1. 1. -1. 1. 1. 1. -1. 1. -1. 1. -1. -1. 1. -1. -1. 1. 1. 1. -1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. -1. 1. -1. 1. -1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. -1. 1. -1. 1. 1. 1. 1. 1. 1. -1. 1. -1. -1. -1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. -1. -1. 1. -1. 1. 1. 1. 1. 1. -1. 1. -1. 1. -1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. -1. 1. 1. 1. 1. 1. -1. -1. -1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. -1. -1. 0. 0. 0. 0. 0. 0. 0. 1. 1. -1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. -1. 1. 1. 1. 1. -1. -1. 1. 1. 1. 1. -1. -1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. 1. 1. 1. -1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. -1. -1. 1. 1. 1. 1. -1. -1. 1. 1. -1. 1. 1. 1. 1. 1. 1.]

A row of $X$ consists of 542 entries, each equal to -1 (senator voted against), 1 (senator voted for), or 0 (senator abstained), one for each bill.

In [4]:

    # print an example column of the data matrix
    typical_column = X[:, 0]
    print(typical_column.shape)
    print(typical_column)

    (100,)
    [ 1. 1. 1. 1. 1. 1. 1. -1. 1. -1. 1. -1. 1. -1. -1. -1. 1. 1. -1. 1. 1. -1. 1. -1. 1. 1. 1. -1. -1. 1. 1. 1. -1. 1. 1. 1. -1. -1. -1. -1. 1. -1. -1. 1. 1. -1. -1. -1. -1. -1. 1. 1. -1. -1. 1. 1. -1. -1. -1. -1. -1. 1. 1. 1. 1. 1. -1. -1. -1. 1. -1. -1. 1. -1. -1. 1. 1. 1. -1. -1. -1. 1. 1. -1. 1. -1. 1. 1. 1. -1. -1. -1. -1. -1. 1. 1. 1. -1. -1. -1.]
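Reading a long vector of votes by eye is error-prone, so a small tally can help when inspecting a row or column. The helper below is a hypothetical convenience, not part of the notebook; `tally_votes` and `example_votes` are names introduced here for illustration, and the example uses a short made-up vector rather than the real data.

```python
import numpy as np

def tally_votes(votes):
    """Count the entries equal to -1, 0, and 1 in a 1-D array of votes."""
    votes = np.asarray(votes)
    return {
        "against": int(np.sum(votes == -1)),
        "abstained": int(np.sum(votes == 0)),
        "for": int(np.sum(votes == 1)),
    }

# Made-up votes, using the same {-1, 0, 1} encoding as the data matrix.
example_votes = np.array([1.0, 1.0, -1.0, 0.0, 1.0, -1.0])
print(tally_votes(example_votes))
```

In the notebook, the same function could be applied to `typical_row` or `typical_column` to summarize one senator's record or one bill's outcome.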
A column of $X$ consists of 100 entries in {-1, 0, 1}, one for each senator.

In [5]:

    # compute the mean vote on each bill
    X_mean = np.mean(X, axis=0)
    plt.plot(X_mean)
    plt.title('means of each column of X')
    plt.xlabel('column/bill')
    plt.ylabel('mean vote')
    plt.show()

We observe that the column means are not zero, so we center the data by subtracting the mean vote on each bill from its respective column.

In [6]:

    # center the data matrix
    X_original = X.copy()  # save a copy for part (d) and (e)
    X = X - np.mean(X, axis=0)

a) Maximizing the empirical variance

In this problem, you are asked to find a unit-norm vector that maximizes the empirical variance of the scores. We first provide a function to calculate the scores.
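To make the centering step and the variance objective concrete, here is a minimal sketch on synthetic data. Several things are assumptions rather than the notebook's own definitions: the scores are taken to be the projections of each centered row onto a unit vector `a`, the empirical variance is normalized by 1/n, and `empirical_variance` and `X_demo` are hypothetical names. The sketch uses the top right singular vector as one standard way to obtain a maximizing direction; it is not necessarily the solution the notebook expects.

```python
import numpy as np

def empirical_variance(X_centered, a):
    """Empirical variance (1/n convention) of the scores X_centered @ a."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)            # enforce the unit-norm constraint
    scores = X_centered @ a              # one score per row
    return float(np.mean(scores ** 2))   # scores have zero mean once X is centered

rng = np.random.default_rng(0)
# Synthetic stand-in for the vote matrix: 20 rows, 5 columns, entries in {-1, 0, 1}.
X_demo = rng.choice([-1.0, 0.0, 1.0], size=(20, 5))
X_demo = X_demo - X_demo.mean(axis=0)    # center each column

# The first right singular vector of the centered matrix maximizes the variance.
_, s, Vt = np.linalg.svd(X_demo, full_matrices=False)
a_top = Vt[0]

var_top = empirical_variance(X_demo, a_top)
var_random = empirical_variance(X_demo, rng.standard_normal(5))
print(var_top, var_random)
```

Under these assumptions, the variance along `a_top` equals the largest squared singular value divided by the number of rows, and no other unit direction gives a larger value.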