McGill University
COMP 370, Fall 2023
COMP 370 Homework 5 – 311 Question Formulation
Assigned Oct 2, 2023
Due Oct 11, 2023 @ 11:59 PM
In this homework, we continue our engagement with the New York City data division. We're working on refining questions. You'll be using the same derivative of the following dataset:
- https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9
As before, for the purpose of this assignment:
1. Download nyc_311.csv.tgz from MyCourses.
2. Trim it down to only include the incidents that occurred in 2020 (for an added challenge, see if you can trim the dataset down using exactly one call to the grep command-line tool).
For the remainder of this assignment, you should only work with the trimmed-down dataset.
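The trimming step can be sketched in Python as a sanity check on whatever one-liner you end up with. This is only a sketch under assumptions: it supposes the incident date is the second column (index 1) and is formatted like MM/DD/YYYY HH:MM:SS AM, as in the public 311 export. The function name rows_from_year and the sample rows are hypothetical; check the actual layout of nyc_311.csv before relying on it.

```python
import csv
import io


def rows_from_year(lines, year, date_col=1):
    """Yield CSV rows whose date field falls in the given year.

    Assumes dates look like 'MM/DD/YYYY HH:MM:SS AM' and sit in column
    `date_col`; both are assumptions about the file layout.
    """
    for row in csv.reader(lines):
        if len(row) <= date_col:
            continue  # skip short or malformed rows
        date_part = row[date_col].split(" ")[0]
        parts = date_part.split("/")
        if len(parts) == 3 and parts[2] == str(year):
            yield row


# Hypothetical sample rows: id, created date, complaint type.
sample = io.StringIO(
    "1,01/15/2020 10:00:00 AM,Noise - Residential\n"
    "2,12/31/2019 11:59:00 PM,Rodent\n"
    "3,07/04/2020 09:30:00 PM,Noise - Street/Sidewalk\n"
)
kept = list(rows_from_year(sample, 2020))
print(len(kept))
```

Filtering on one specific column matters because several date columns may contain 2020 (for example, an incident opened in 2019 and closed in 2020); a pattern that matches the year anywhere in the line would keep those rows too.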
For each task below, you have two objectives.
Objective 1: Formalize the question in a way that you can actually answer. When you do this, follow the example given in class (online political violence), in which the question was iteratively refined. For each question, show at least four versions of the question.
Version 1: the question as posed by the stakeholder.
Version 2: one refinement of the question that is more measurable and maps better onto the available data.
Version 3: a second refinement that maps even better onto the available data.
Version 4: the final version of the question that is very quantifiable and maps onto the available data.
Objective 2: Use Python (Jupyter notebooks, CLIs, or any other Pythonic approach) to build a visualization that credibly answers the question.
Task 1: Noise
The mayor wants to know if noise issues tend to stem from different causes across the year.
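One possible way to turn a refined version of this question into numbers is to compute, for each month, the share of noise complaints belonging to each complaint type. The sketch below is illustrative only: it assumes each record carries a creation date formatted MM/DD/YYYY and a complaint type whose name starts with "Noise", and the function name monthly_type_shares and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_type_shares(records):
    """Return {month: {complaint_type: share}} for noise complaints.

    `records` is an iterable of (created_date, complaint_type) pairs,
    with dates formatted 'MM/DD/YYYY ...' (an assumed layout).
    """
    counts = defaultdict(Counter)
    for created, ctype in records:
        if not ctype.lower().startswith("noise"):
            continue  # keep only noise-related complaint types
        month = int(created[:2])
        counts[month][ctype] += 1
    shares = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        shares[month] = {t: n / total for t, n in counter.items()}
    return shares


# Hypothetical sample records.
records = [
    ("01/05/2020 01:00:00 AM", "Noise - Residential"),
    ("01/09/2020 11:00:00 PM", "Noise - Residential"),
    ("01/20/2020 08:00:00 PM", "Noise - Vehicle"),
    ("01/21/2020 08:00:00 PM", "Rodent"),
    ("07/04/2020 10:00:00 PM", "Noise - Street/Sidewalk"),
]
shares = monthly_type_shares(records)
```

Shares rather than raw counts make months with very different complaint volumes comparable; the resulting table lends itself to a stacked bar chart or one line per complaint type.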
Task 2: Urban Rodents
The Departments of Sanitation and Health would like to know where in the city rats and mice are most likely to create sanitation issues. In discussion with them, you determine that they aren't thinking in terms of geography, but rather in terms of the kinds of buildings, properties, and structures found around a city.
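Since the stakeholders care about kinds of structures rather than geography, one hedged reading of the final question is "which location types account for the most rodent complaints in 2020?". The sketch below counts complaints per location type. It assumes each record exposes a complaint type (with rodent issues labeled "Rodent") and a location type field; the function name rodent_counts_by_location and the sample values are hypothetical and should be checked against the real columns.

```python
from collections import Counter


def rodent_counts_by_location(records, top_n=10):
    """Return the top_n (location_type, count) pairs for rodent complaints.

    `records` is an iterable of (complaint_type, location_type) pairs;
    blank location types are grouped under 'Unspecified'.
    """
    counts = Counter()
    for ctype, location_type in records:
        if ctype.strip().lower() != "rodent":
            continue  # ignore non-rodent complaints
        counts[location_type.strip() or "Unspecified"] += 1
    return counts.most_common(top_n)


# Hypothetical sample records.
records = [
    ("Rodent", "3+ Family Apt. Building"),
    ("Rodent", "3+ Family Apt. Building"),
    ("Rodent", "Vacant Lot"),
    ("Rodent", ""),
    ("Noise - Residential", "Residential Building/House"),
]
top = rodent_counts_by_location(records)
```

The ranked pairs map directly onto a horizontal bar chart, with one bar per location type.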
Submission Instructions
- question_formalizations.md – a document with two sections, one for each task. Each section contains an itemized list of the versions of the question, ending with the fully refined question.
- task1_plot.png/jpg – the plot generated for Task 1 that addresses the question in Task 1.
- task2_plot.png/jpg – the plot generated for Task 2 that addresses the question in Task 2.