HW1.pdf
School: Northeastern University
Course: 5645
Subject: Computer Science
Date: Dec 6, 2023
Type: pdf
Pages: 8

EECE 5645 Assignment 1: Text Analyzer

Follow the Discovery Cluster Rules:
  • Never run jobs on the gateways.
  • Do not reserve more than one node from the courses partition in the last 24 hours before a homework deadline.
  • Start working on homework assignments early.
Preparation

Follow the “Discovery Cluster Checklist” under “Programming Resources” on Canvas to copy the latest .bashrc file to your Discovery cluster home directory. Familiarize yourself with the ld5645 command, also described there.

Make sure that a folder named after your username exists on Discovery under the directory /scratch. You can confirm this by logging into the cluster and typing:

    ls /scratch/ | grep $USER

You should see a directory named after your username. Copy the directory /courses/EECE5645.202410/data/HW1/Files to the folder you just checked, renamed as HW1. You can do so by typing:

    cp -r /courses/EECE5645.202410/data/HW1/Files /scratch/$USER/HW1

Make the contents of this directory private by typing:

    chmod -R go-rx /scratch/$USER/HW1

After you do this, your scratch HW1 folder should contain two Python files, called TextAnalyzer.py and helpers.py.

Data

The directory /courses/EECE5645.202410/data/HW1/Data contains books from Project Gutenberg [1], other documents from the American National Corpus [2], and the files DaleChallEasyWordList.txt and fireandice.txt, which will be used throughout this assignment.

Deliverables

In this assignment, you are asked to modify the provided code and use it to analyze this dataset. You must:
1. Provide a report, in pdf format, outlining the answers to the questions below. The report should be type-written in a word processor of your choice (e.g., MS Word, LaTeX, etc.).
2. Provide the final files TextAnalyzer.py and helpers.py you wrote.

The report, along with your final code, should be uploaded on Canvas. Upload files separately. DO NOT UPLOAD .zip FILES.

[1] https://www.gutenberg.org
[2] http://anc.org/

© 2022, Stratis Ioannidis
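The outcome of the preparation steps can be double-checked with a short script. The following is only a sketch and not part of the provided course files: the function name check_hw1_folder and the wording of its messages are hypothetical, and it assumes a POSIX file system such as the one on the cluster. It looks for the two Python files named in the handout and reports anything that group or other users can still read or execute, which is what chmod -R go-rx is meant to prevent.

```python
import os
import stat
from pathlib import Path

# File names taken from the handout.
EXPECTED_FILES = ("TextAnalyzer.py", "helpers.py")


def check_hw1_folder(folder):
    """Return a list of problems found in `folder` (an empty list if none).

    Hypothetical helper for illustration; not part of the assignment code.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return [f"{folder} does not exist or is not a directory"]

    problems = []
    for name in EXPECTED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing file: {name}")

    # `chmod -R go-rx` clears the read and execute bits for group and others.
    forbidden = stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH
    for path in [folder, *folder.rglob("*")]:
        if path.is_symlink():
            continue  # a symlink's own mode bits are not meaningful here
        if path.stat().st_mode & forbidden:
            problems.append(f"group/other can still read or execute: {path}")
    return problems


if __name__ == "__main__":
    # Same location as in the handout: /scratch/$USER/HW1
    user = os.environ.get("USER", "")
    for problem in check_hw1_folder(Path("/scratch") / user / "HW1"):
        print(problem)
```

Run on the cluster (not on a gateway), the script prints nothing when the folder is set up as described, and one line per problem otherwise.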
Question 0: Go to the directory that contains TextAnalyzer.py and run the following from the command prompt:

    python TextAnalyzer.py --help

What does this print? What portion of the code causes this to be printed? Find the documentation of the module that offers this functionality. Use this to describe what happens at each line of code that uses a method or object defined in this module.

Question 1: Implement the missing functions in file helpers.py. In particular:

1(a) Implement strip_non_alpha, as indicated in the docstring of the function. Modify the main body of helpers.py so that, when you run the program via

    python helpers.py

the main body of the program runs unit tests (e.g., via the assert statement) with several different inputs to confirm that the correct output was produced. Make sure that you include tests in which either the input or the output of the function is the empty string. Include in your report (i) your definition of strip_non_alpha, and (ii) the tests you implemented.

1(b) Similarly, implement is_inflection_of, same, and find_match, as indicated by the corresponding docstrings. Again, test these extensively by modifying the main body of helpers.py, making sure these functions correctly handle empty strings. Include again (i)-(ii) as above (your code and unit tests) in your report.

Hint: If you have python3 installed on your computer, you can implement and test these functions on your own machine without connecting to the cluster.
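The testing pattern that Question 1 asks for, assert-based checks that run when the file is executed directly, can be sketched as follows. The function collapse_whitespace is a hypothetical stand-in chosen only to show the pattern; it is not one of the functions in helpers.py, whose required behavior is defined by their docstrings. The test cases include one with empty input and one whose output is empty.

```python
def collapse_whitespace(s):
    """Hypothetical stand-in, not part of the assignment: trim `s` and
    squeeze every run of whitespace into a single space."""
    return " ".join(s.split())


def run_tests():
    """Run assert-based checks; a failure names the offending input."""
    cases = [
        ("fire  and   ice", "fire and ice"),
        ("  fire and ice\n", "fire and ice"),
        ("ice", "ice"),
        ("", ""),        # empty input
        (" \t\n", ""),   # non-empty input, empty output
    ]
    for given, expected in cases:
        actual = collapse_whitespace(given)
        assert actual == expected, (
            f"{given!r}: expected {expected!r}, got {actual!r}"
        )
    return len(cases)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(f"{run_tests()} tests passed")
```

Keeping the cases in a list of (input, expected) pairs makes it easy to add more inputs, and the message attached to each assert shows which case failed.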