problem14
.pdf
School
Georgia Institute Of Technology
Course
CS6040
Subject
Computer Science
Date
Dec 6, 2023
Type
Pages
15
Uploaded by ChefStraw5566
11/28/23, 8:13 PM
problem14
file:///Users/dannie/Downloads/pmt1-sample-solutions-su21/problem14-sample-solutions.html
1/15
Problem 14: Scraping data from "FiveThirtyEight"
There are a ton of fun interactive visualizations at the website FiveThirtyEight (http://fivethirtyeight.com). For example, consider the one that tracks the US President's approval ratings: https://projects.fivethirtyeight.com/trump-approval-ratings/
Here is a screenshot of the interactive graph it contains:
In it, you can select each day ("movable cursor") and get information about the approval ratings for that day.
As it turns out, this visualization is implemented in JavaScript and all of the individual data items are embedded within the web page itself. For example, here is a 132-page PDF file, which is the source code for the web page taken on September 6, 2018: PDF file (https://cse6040.gatech.edu/datasets/538-djt-pop/2018-09-06.pdf). The raw data being rendered in the visualization starts on page 50.
Of course, that means you can use your Python-fu to try to extract this data for your own purposes! Indeed, that is your task for this problem.
Although the data in this problem comes from an HTML file with embedded JavaScript, you do not need to know anything about HTML or JavaScript to solve this problem. It is purely an exercise of rudimentary Python and computational problem solving.
Reading the raw HTML file
Let's read the raw contents of the FiveThirtyEight approval ratings page (i.e., the same contents as the PDF) into a variable named raw_html.
Like the groceries problem in Notebook 2, this cell contains a bunch of code for getting the data
file you need, which you can ignore.
In [1]:
def download(url, local_file, overwrite=False):
    import os, requests
    if not os.path.exists(local_file) or overwrite:
        print("Downloading: {} ...".format(url))
        r = requests.get(url)
        with open(local_file, 'wb') as f:
            f.write(r.content)
        return True
    return False # File existed already

def get_checksum(local_file):
    import io, hashlib
    with io.open(local_file, 'rb') as f:
        body = f.read()
        body_checksum = hashlib.md5(body).hexdigest()
        return body_checksum

def download_or_load_locally(file, local_dir="", url_base=None, checksum=None):
    if url_base is None: url_base = "https://cse6040.gatech.edu/datasets/"
    local_file = "{}{}".format(local_dir, file)
    remote_url = "{}{}".format(url_base, file)
    download(remote_url, local_file)
    if checksum is not None:
        body_checksum = get_checksum(local_file)
        assert body_checksum == checksum, \
            "Downloaded file '{}' has incorrect checksum: '{}' instead of '{}'".format(local_file,
                                                                                         body_checksum,
                                                                                         checksum)
    print("'{}' is ready!".format(file))

def on_vocareum():
    import os
    return os.path.exists('.voc')

if on_vocareum():
    URL_BASE = None
    DATA_PATH = "./resource/asnlib/publicdata/538-djt-pop/"
else:
    URL_BASE = "https://cse6040.gatech.edu/datasets/538-djt-pop/"
    DATA_PATH = ""
datasets = {'2018-09-06.html': '291a7c1cbf15575a48b0be8d77b7a1d6'}

for filename, checksum in datasets.items():
    download_or_load_locally(filename, url_base=URL_BASE, local_dir=DATA_PATH, checksum=checksum)

with open('{}{}'.format(DATA_PATH, '2018-09-06.html')) as fp:
    raw_html = fp.read()

print("\n(All data appears to be ready.)")
'2018-09-06.html' is ready!
(All data appears to be ready.)
File snippets. Run the following code cell. It takes the raw_html string and prints the substring just around the start of the raw data you'll need, i.e., starting at page 50 of the PDF:
In [2]:
sample_offset, sample_len = 69950, 1500
print(raw_html[sample_offset:sample_offset+sample_len])
thPrefix="/trump-approval-ratings/";
var subgroup="All polls";
var showMoreCutoff=5;
var approval=[{"date":"2017-01-23","future":false,"subgroup":"All poll
s","approve_estimate":"45.46693","approve_hi":"50.88971","approve_l
o":"40.04416","disapprove_estimate":"41.26452","disapprove_hi":"46.687
29","disapprove_lo":"35.84175"},{"date":"2017-01-24","future":false,"s
ubgroup":"All polls","approve_estimate":"45.44264","approve_hi":"50.82
922","approve_lo":"40.05606","disapprove_estimate":"41.87849","disappr
ove_hi":"47.26508","disapprove_lo":"36.49191"},{"date":"2017-01-25","f
uture":false,"subgroup":"All polls","approve_estimate":"47.76497","app
rove_hi":"52.66397","approve_lo":"42.86596","disapprove_estimate":"42.
52911","disapprove_hi":"47.42811","disapprove_lo":"37.63011"},{"dat
e":"2017-01-26","future":false,"subgroup":"All polls","approve_estimat
e":"44.37598","approve_hi":"48.93261","approve_lo":"39.81936","disappr
ove_estimate":"41.06081","disapprove_hi":"45.61743","disapprove_lo":"3
6.50418"},{"date":"2017-01-27","future":false,"subgroup":"All poll
s","approve_estimate":"44.13586","approve_hi":"48.70494","approve_l
o":"39.56679","disapprove_estimate":"41.67268","disapprove_hi":"46.241
75","disapprove_lo":"37.1036"},{"date":"2017-01-28","future":false,"su
bgroup":"All polls","approve_estimate":"43.87527","approve_hi":"48.468
21","approve_lo":"39.28233","disapprove_estimate":"41.91362","disappro
ve_hi":"46.50656","disapprove_lo":"37.32067"},{"date":"2017-01-29","fu
ture":false,"subgroup":"All
Run the following code cell to see the end of the raw data region.
In [3]:
sample_end = 257500
print(raw_html[sample_end:sample_end+sample_len])
","approve_lo":"29.24131","disapprove_estimate":"51.94407","disapprove
_hi":"63.94288","disapprove_lo":"39.94526"},{"date":"2019-05-10","futu
re":true,"subgroup":"All polls","approve_estimate":"41.47093","approve
_hi":"53.72246","approve_lo":"29.2194","disapprove_estimate":"51.9422
5","disapprove_hi":"63.96438","disapprove_lo":"39.92012"},{"date":"201
9-05-11","future":true,"subgroup":"All polls","approve_estimate":"41.4
719","approve_hi":"53.74633","approve_lo":"29.19748","disapprove_estim
ate":"51.94044","disapprove_hi":"63.98589","disapprove_lo":"39.895"},
{"date":"2019-05-12","future":true,"subgroup":"All polls","approve_est
imate":"41.47285","approve_hi":"53.77016","approve_lo":"29.17555","dis
approve_estimate":"51.93866","disapprove_hi":"64.0074","disapprove_l
o":"39.86993"},{"date":"2019-05-13","future":true,"subgroup":"All poll
s","approve_estimate":"41.47378","approve_hi":"53.79396","approve_l
o":"29.15361","disapprove_estimate":"51.9369","disapprove_hi":"64.0289
2","disapprove_lo":"39.84487"},{"date":"2019-05-14","future":true,"sub
group":"All polls","approve_estimate":"41.47469","approve_hi":"53.8177
3","approve_lo":"29.13165","disapprove_estimate":"51.93515","disapprov
e_hi":"64.05045","disapprove_lo":"39.81984"}];
</script>
<div class="container">
<div id="footer">
<div class="notes">
<p>
When the dates of tracking polls from the same pollster overlap,
only the most recent version is shown.
</p>
</div>
<div class="additional-credits">
<p>
Please make the following observations about the file snippets shown above:
The raw data of approval ratings begins with the text, 'var approval=[' and ends with a closing square bracket, ']'. No other square brackets appear between these two.
Each "data point" or "data record" is encoded in JavaScript Object Notation (JSON), which is essentially the same as a Python dictionary. That is, it is enclosed in curly brackets, {...} and contains a number of key-value pairs. These include the date ("date":"yyyy-mm-dd"), approval and disapproval rating estimates ("approve_estimate":"45.46693" and "disapprove_estimate":"41.26452"), as well as upper and lower error bounds ("..._hi" and "..._lo"). The estimates correspond to the green (approval) and orange (disapproval) lines, and the error bounds form the shaded regions around those lines.
Each data record includes a key named "future". That's because FiveThirtyEight has projected the ratings into the future, so some records correspond to observed values ("future":false) while others correspond to extrapolated values ("future":true).
In addition, for the exercises below, you may assume the data records are encoded in the same way, e.g., the fields appear in the same order and there are no variations in punctuation or whitespace from what you see in the above snippets.
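To make the "JSON is essentially a Python dictionary" observation concrete, here is a minimal sketch that parses a single record, copied from the first snippet above, using Python's standard json module. The names record_text and record are illustrative only and are not part of the assignment. Note that the numeric fields are stored as strings, so they need an explicit conversion.

```python
import json

# First data record, copied verbatim from the snippet printed above.
record_text = ('{"date":"2017-01-23","future":false,"subgroup":"All polls",'
               '"approve_estimate":"45.46693","approve_hi":"50.88971",'
               '"approve_lo":"40.04416","disapprove_estimate":"41.26452",'
               '"disapprove_hi":"46.68729","disapprove_lo":"35.84175"}')

record = json.loads(record_text)   # JSON object -> Python dict

print(record["date"])                      # 2017-01-23
print(record["future"])                    # False (JSON false becomes Python False)
print(float(record["approve_estimate"]))   # 45.46693
```

The exercises themselves can be solved with plain string handling; this is only meant to show what each record contains.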
Your task: Extracting the approval ratings
Exercise 0 (1 point). Recall that the data begins with 'var approval=[...' and ends with a closing square bracket, ']'. Complete the function, extract_approval_raw(html), below. The input variable, html, is a string corresponding to the raw HTML file. Your function should return the substring beginning immediately after the opening square bracket and up to, but excluding, the closing square bracket. It should return exactly that substring from the file, and should not otherwise modify it.
While you don't have to use regular expressions for this problem, if you wish to, observe that the cell below imports the re module.
In [4]:
import re

def extract_approval_raw(html):
    assert isinstance(html, str), "`html` is not a string."
    ### BEGIN SOLUTION
    # Capture everything between 'var approval=[' and the first ']'.
    match = re.search(r'var\s+approval\s*=\s*\[([^\]]*)\];', html)
    if match:
        return match.group(1)
    return ''
    ### END SOLUTION

raw_data = extract_approval_raw(raw_html)
print("type(raw_data) == {} (should be a string!)\n".format(type(raw_data)))
print("=== First and last 300 characters ===\n{}\n...\n{}".format(raw_data[:300], raw_data[-300:]))
type(raw_data) == <class 'str'>
(should be a string!)
=== First and last 300 characters ===
{"date":"2017-01-23","future":false,"subgroup":"All polls","approve_es
timate":"45.46693","approve_hi":"50.88971","approve_lo":"40.04416","di
sapprove_estimate":"41.26452","disapprove_hi":"46.68729","disapprove_l
o":"35.84175"},{"date":"2017-01-24","future":false,"subgroup":"All pol
ls","approve_estimat
...
e_estimate":"51.9369","disapprove_hi":"64.02892","disapprove_lo":"39.8
4487"},{"date":"2019-05-14","future":true,"subgroup":"All polls","appr
ove_estimate":"41.47469","approve_hi":"53.81773","approve_lo":"29.1316
5","disapprove_estimate":"51.93515","disapprove_hi":"64.05045","disapp
rove_lo":"39.81984"}
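Since regular expressions are optional for this exercise, the same substring can also be located with plain string methods. The following is a minimal sketch, not the official solution: the helper name extract_between_brackets and the tiny demo_html input are hypothetical, and the code relies on the assumptions stated earlier (the marker text is exactly 'var approval=[' and no other square brackets appear before the closing one).

```python
def extract_between_brackets(html, start_marker='var approval=['):
    """Return the text between `start_marker` and the next ']' (hypothetical helper)."""
    start = html.find(start_marker)
    if start < 0:
        return ''                      # marker not found
    start += len(start_marker)         # move just past the opening bracket
    end = html.find(']', start)        # first closing bracket after the marker
    if end < 0:
        return ''                      # no closing bracket found
    return html[start:end]

# Tiny made-up input that mimics the structure of the real page.
demo_html = ('var subgroup="All polls";\n'
             'var approval=[{"date":"2017-01-23"},{"date":"2017-01-24"}];\n'
             '</script>')
print(extract_between_brackets(demo_html))
# {"date":"2017-01-23"},{"date":"2017-01-24"}
```

Unlike the regular expression in the sample solution, this version does not tolerate extra whitespace around the equals sign, which is acceptable only because the problem statement rules out such variations.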