STEP 1: Begin work within your Jupyter Notebook by importing the following modules:

    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt
    import re


 

Q1. Within your Jupyter Notebook, write the code for a Python function with the signature:

def parseWeatherByYear(year):

This function parses an HTML page containing an entire year of weather data for the city of Toronto.

HTML pages containing this weather data can be downloaded from: https://www.extremeweatherwatch.com/cities/toronto/year-2023

However, the file to parse for this lab can be downloaded here: https://matrix.senecacollege.ca/~danny.abesdris/prg550.232/labs/lab6/torontoWeather.2023.html

The HTML file contains markers indicating where to begin parsing the data to extract. Three values must be extracted for every day so far in the current year (2023): the high and low temperatures (in degrees Celsius) and the amount of precipitation (in cm).

The following lines show where to begin extracting data:

<td><div class='width-130'><a href='/cities/toronto/day/january-1'>January 1</a></div></td>
<td class='text-right temp40'>5.0</td>
<td class='text-right temp30'>2.7</td>
<td class='text-right rainsnow1'>0.15</td>
</tr>

 

Notice the marker in the lines above, which has the general form:

/cities/toronto/day/month-n

where month is the lowercase month name and n is the day of the month.

 

In the example above, the data to extract is 5.0, 2.7, and 0.15. The extraction can be done in several ways, but a carefully structured regular expression (using match.group() along with the re.S and re.M flags) is recommended for speed and simplicity. The trick is to match the text up to the point where the data begins (capturing it as groups), and then to apply another regular expression that matches the data itself (again as groups).

As always, https://regex101.com is invaluable for building and testing your regular expressions.
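A minimal sketch of the two-step approach described above, assuming every daily row in the file has the same layout as the excerpt (a /cities/toronto/day/month-n link followed by three text-right cells and a closing </tr>). The patterns, the helper name extract_rows, and the sample string are illustrative assumptions, not the lab's reference solution; check them against the real file on regex101 first. Values are kept as strings so they can be written to the CSV unchanged.

```python
import re

# Sample taken from the excerpt above; the real file is assumed to hold one such row per day.
SAMPLE_ROW = """<td><div class='width-130'><a href='/cities/toronto/day/january-1'>January 1</a></div></td>
<td class='text-right temp40'>5.0</td>
<td class='text-right temp30'>2.7</td>
<td class='text-right rainsnow1'>0.15</td>
</tr>"""

# Step 1: match from the day marker to the end of the row.
# Groups: (1) month name, (2) day of month, (3) remainder of the row holding the data.
# re.S lets .*? cross line breaks; re.M only changes ^ and $, which this pattern does not use.
ROW_RE = re.compile(r"/cities/toronto/day/([a-z]+)-(\d+)'>.*?</a>(.*?)</tr>", re.S | re.M)

# Step 2: capture the text inside each right-aligned data cell of that row.
CELL_RE = re.compile(r"<td class='text-right[^']*'>([^<]*)</td>")

def extract_rows(html):
    """Yield (month, day, high, low, precipitation) with the values kept as strings."""
    for row in ROW_RE.finditer(html):
        cells = CELL_RE.findall(row.group(3))
        if len(cells) >= 3:  # skip any matched link that is not a full data row
            yield row.group(1), int(row.group(2)), cells[0], cells[1], cells[2]

print(list(extract_rows(SAMPLE_ROW)))  # [('january', 1, '5.0', '2.7', '0.15')]
```

In the lab, the argument would be the full file contents. Any other link to /cities/toronto/day/... elsewhere on the page would also be matched, so it is worth confirming that exactly 75 rows come back.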

The extracted data must range from January 1, 2023 to this file's cutoff date of March 16, 2023.
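As a quick sanity check, this date range accounts for exactly the 75 records mentioned later: Pandas' date_range() produces one timestamp per day, and both endpoints are included by default. This is only a sketch for verifying the count, not part of the required function.

```python
import pandas as pd

# One timestamp per calendar day; both endpoints are included by default.
days = pd.date_range(start="2023-01-01", end="2023-03-16", freq="D")

print(len(days))           # 75  (31 + 28 + 16)
print(days[-1].dayofyear)  # 75
```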

It would be helpful to create a NumPy array of the number of days in each month of the year, and then to investigate the Pandas date_range() function and the Series.dt.month_name() method, which let you capture the month names programmatically.

https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.month_name.html
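One possible way to combine the days-per-month array with date_range() and Series.dt.month_name(), sketched under the assumption that 2023 is treated as a non-leap year. The helper month_and_day is hypothetical: it maps a running day-of-year counter back to a month name and day of month using the cumulative day counts.

```python
import numpy as np
import pandas as pd

# Days in each month of 2023 (not a leap year), January through December.
days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# One timestamp per month start ("MS" frequency); .dt.month_name() returns full English names.
month_starts = pd.Series(pd.date_range(start="2023-01-01", periods=12, freq="MS"))
month_names = month_starts.dt.month_name().str.lower()  # lowercase, as in the CSV

# Running totals: the day of year on which each month ends (31, 59, 90, ...).
month_end_doy = np.cumsum(days_in_month)

def month_and_day(day_of_year):
    """Hypothetical helper: map a 1-based day of year to (month name, day of month)."""
    idx = int(np.searchsorted(month_end_doy, day_of_year))  # first month ending on/after it
    start = 0 if idx == 0 else int(month_end_doy[idx - 1])
    return month_names[idx], day_of_year - start

print(month_names.tolist()[:3])  # ['january', 'february', 'march']
print(month_and_day(75))         # ('march', 16)
```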

The HTML file must be opened and its entire contents read into a single string.
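A minimal sketch of reading the whole file into one string, assuming the file has been saved locally and is UTF-8 encoded (the encoding is an assumption; adjust it if the downloaded file uses another). read_file_text is a hypothetical helper name.

```python
def read_file_text(path):
    """Return the entire contents of the file at `path` as one string."""
    # Hypothetical helper; UTF-8 is an assumed encoding for the downloaded page.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# Example (assumes the lab file was saved next to the notebook under this name):
# html = read_file_text("torontoWeather.2023.html")
```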

 

As data is extracted from the HTML file, your function must also write it to a CSV (comma-separated values) file whose first line is the following header:

 

City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation

 

Separate the fields with commas (,) and end each record with a newline.
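A sketch of the CSV-writing step, assuming the extracted rows arrive in calendar order starting on January 1 (so the day of year can be a simple counter) and that the values are still the strings taken from the HTML. The function name write_weather_csv and the output filename are hypothetical.

```python
HEADER = "City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation"

def write_weather_csv(path, rows, year=2023, city="Toronto"):
    """Write the header plus one line per (month, day, high, low, precip) row.

    Values are written exactly as extracted (strings), so '0.00' stays '0.00'.
    """
    with open(path, "w", encoding="utf-8") as out:
        out.write(HEADER + "\n")
        # enumerate supplies dayOfyear, assuming rows are consecutive days from January 1.
        for day_of_year, (month, day, high, low, precip) in enumerate(rows, start=1):
            fields = [city, day_of_year, month, day, year, high, low, precip]
            out.write(",".join(str(f) for f in fields) + "\n")

rows = [("january", 1, "5.0", "2.7", "0.15"), ("january", 2, "5.6", "3.5", "0.00")]
write_weather_csv("torontoWeather.2023.csv", rows)  # hypothetical output filename
```

Writing the strings directly keeps the original formatting: converting "0.00" to a float first would write 0.0 and no longer match the expected records. Python's csv.writer could be used instead, but it ends lines with \r\n by default.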

 

The first 10 lines of the resulting file (the header followed by the first nine records) should be exactly as listed below:

 

City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation
Toronto,1,january,1,2023,5.0,2.7,0.15
Toronto,2,january,2,2023,5.6,3.5,0.00
Toronto,3,january,3,2023,4.4,2.8,0.33
Toronto,4,january,4,2023,4.4,2.5,2.11
Toronto,5,january,5,2023,4.8,3.2,0.02
Toronto,6,january,6,2023,5.1,2.9,0.00
Toronto,7,january,7,2023,3.2,-4.1,0.00
Toronto,8,january,8,2023,-1.5,-4.8,0.00
Toronto,9,january,9,2023,2.2,-1.7,0.01

 

The HTML file contains exactly 75 records to extract, so exactly 75 records must be written to the CSV file.

 

Once the file has been created and all records written, your function must load the CSV file into a Pandas DataFrame and display ALL of its records using the following calls:

 

df = pd.read_csv(csvFile)                  # read the CSV file into a DataFrame
pd.set_option('display.max_rows', None)    # display all rows in the output

 

The DataFrame's shape attribute and the output of its describe() method must also be displayed.
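A sketch of the loading and display step. To keep it self-contained, it reads a three-record stand-in for the generated file from an in-memory string via io.StringIO; in the lab, the CSV file's path would be passed to read_csv() instead.

```python
import io
import pandas as pd

# Stand-in for the generated CSV file: the header plus the first three records from above.
csv_text = """City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation
Toronto,1,january,1,2023,5.0,2.7,0.15
Toronto,2,january,2,2023,5.6,3.5,0.00
Toronto,3,january,3,2023,4.4,2.8,0.33
"""

pd.set_option("display.max_rows", None)  # print every row instead of truncating
df = pd.read_csv(io.StringIO(csv_text))  # in the lab, pass the CSV file's path instead

print(df)
print(df.shape)       # (rows, columns): (3, 8) for this sample
print(df.describe())  # summary statistics for the numeric columns only
```

By default describe() summarizes only the numeric columns, so City and month do not appear in its output.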

 

The exact output should be as listed below:

 

    City     dayOfyear     month  dayOfMonth  Year  highTemp  lowTemp  precipitation
0   Toronto          1   january           1  2023       5.0      2.7           0.15
1   Toronto          2   january           2  2023       5.6      3.5           0.00
2   Toronto          3   january           3  2023       4.4      2.8           0.33
3   Toronto          4   january           4  2023       4.4      2.5           2.11
4   Toronto          5   january           5  2023       4.8      3.2           0.02
5   Toronto          6   january           6  2023       5.1      2.9           0.00
... (continuing through all 75 records)

 

 
