BIO257_PS2.txt

School: University of Rochester
Course: 257W
Subject: Biology
Date: Jan 9, 2024
Type: txt
Pages: 3
Uploaded by: MagistrateViper3914

Problem Set 2
BIO 257
Due 9/19/2023

You may discuss this problem set with your classmates, but the work must be done independently. Please refer to the Academic honesty statement on Blackboard for additional details.

Be sure to include your modified versions of the Python scripts for each question, and their output, in the tar file that you submit. Please annotate your code with comments using '#'. Follow the example comments found in each Python script in the PS2 folder.

For all questions (1-5), please paste the commands that you used on the command line into this text file (BIO257_PS2.2023.txt) below each question, and be sure to turn it in with the rest of the assignment. Do not enter Python code or output that is contained in a file into BIO257_PS2.2023.txt; enter only what you typed in or what was printed out to the terminal.

Hint: refer to the Lab worksheets for example code that you can modify!

Total points: 10

1) Make a "user_id.PS2" directory inside your '/scratch/bio257_2023/Users/user_id' directory, and upload the PS2 files from Blackboard (BIO257_PS2.2023.txt, change_case.py, GC_content.py, mixed_case.fasta, and genome.assembly.key.txt) into that folder. If the PS2 folder exists on Bluehive, you may instead copy that folder to your '/scratch/bio257_2023/Users/user_id' directory and rename the new copy 'user_id.PS2'. (Note: replace 'user_id' with your user ID.) Change permissions on your 'user_id.PS2' folder itself (not recursively) so that the owner (you) can read, write, and execute, but all others cannot read, write, or execute. Do all of the following problems inside that directory, and be sure that all of your code and output are in this directory before you do #5.

cp -r /scratch/bio257_2023/Problem_Sets/PS2/. /scratch/bio257_2023/Users/xlu36
mkdir xlu36.PS2
mv BIO257_PS2.2023.txt genome.assembly.key.txt change_case.py mixed_case.fasta GC_content.py xlu36.PS2
chmod 700 xlu36.PS2
cd ..
chmod 700 xlu36.PS2
cd ..
ls -l xlu36

Output:
total 4
drwxrws---+ 2 xlu36 bio257_2023 4096 Sep 5 12:54 Module1
drwxrws---+ 2 xlu36 bio257_2023 4096 Sep 21 13:13 Module4
-rwxr-x---+ 1 xlu36 bio257_2023 1177 Sep 21 09:25 module6mo
drwxrws---+ 2 xlu36 bio257_2023 4096 Sep 21 13:31 Module6
-rw-rwsr--+ 1 xlu36 bio257_2023 92 Sep 5 13:09 xlu36.info.txt
drwxrws---+ 2 xlu36 bio257_2023 4096 Sep 8 11:32 xlu36.PS1
drwx--S---+ 2 xlu36 bio257_2023 4096 Sep 22 15:34 xlu36.PS2

2) Modify the 'change_case.py' script to read in a file, convert each line to all lowercase, and print the result to the output file 'lower_case.fasta'. Run your code using 'mixed_case.fasta' as the input file.

python3 change_case.py mixed_case.fasta lower_case.fasta
more -10 lower_case.fasta

3) Currently, the 'GC_content.py' script only calculates G%. Modify it so that it calculates G%, C%, and GC% (that is, the percentage of bases that are G or C), and use the following string as input:

CATGCATTATTGTCTCAGTGCAGTTGTCAGTTGCAGTTCAGCAGACGGGCTAACGAGTACTTGCATCTCTTCAAATTTACTTA
ATTGATCAAGTAAGTAGCAAAAGGGCACACAATTGAAGGAAATTCTTGTTTAATTGAATTTATTATGCAAGTGCGGAAATAAA
ATGACAGTATTAAATAGTAAATATTTTGTAAAATCATATATAATCAAATTTATTCAATCAGAACTAATTCAAGC

Paste the command line that you used to run your script and the output that was printed to your screen below.

python3 GC_content.py CATGCATTATTGTCTCAGTGCAGTTGTCAGTTGCAGTTCAGCAGACGGGCTAACGAGTACTTGCATCTCTTCAAATTTACTTA ATTGATCAAGTAAGTAGCAAAAGGGCACACAATTGAAGGAAATTCTTGTTTAATTGAATTTATTATGCAAGTGCGGAAATAAA ATGACAGTATTAAATAGTAAATATTTTGTAAAATCATATA

Output:
The G% of your sequence is: 18.446601941747574
The C% of your sequence is: 13.592233009708737
The GC% of your sequence is: 32.038834951456316

4) You've made a genome assembly that has 4 scaffolds and 17 contigs. Your 'genome.assembly.key.txt' file contains a list of contigs and the scaffolds they belong to in your assembly, where contig and scaffold names are delimited by a colon. Write a Python script named 'summarize_contigs.py' to:
a. read in the file 'genome.assembly.key.txt',
b. use a 'for' loop to go through every line in this file,
c. output the names of the contigs belonging to 'Scaffold_3' to an output file 'Scaffold3.contig.list.txt'. Be sure to print only the names of the contigs (not the scaffolds), one name on each line of your output file.

python3 summarize_contigs.py genome.assembly.key.txt Scaffold3.contig.list.txt
ls

Output:
BIO257_PS2.2023.txt  genome.assembly.key.txt  Scaffold3.contig.list.txt
change_case.py       lower_case.fasta         summarize_contigs.py
GC_content.py        mixed_case.fasta

5) Make a tarfile of your directory whose name includes your user_id, and *move* (not copy) this tarfile to the following directory: /scratch/bio257_2023/Assignment_dump/PS2.
(Be sure to paste your commands here.)

tar -cf xlu36.PS2.tar xlu36.PS2
mv xlu36.PS2.tar /scratch/bio257_2023/Assignment_dump/PS2

Hint: you can use the command 'tar -tvf mytarfile.tar' to list the contents of your tar file after archiving your folder. Use this command to check that your tar file has all of its contents before you move it to the Assignment_dump/PS2 directory.

I have worked with the following students on this assignment:
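The summarize_contigs.py script from question 4 is submitted separately rather than pasted into this file. As an illustration only, here is a minimal sketch of how it could follow parts a to c, ASSUMING each line of 'genome.assembly.key.txt' has the form contig_name:scaffold_name (contig first); if the key file lists the scaffold first, the two fields would need to be swapped. The helper names contigs_for_scaffold and main are hypothetical, not taken from the course materials.

```python
import sys


def contigs_for_scaffold(lines, scaffold="Scaffold_3"):
    """Return the contig names assigned to `scaffold`, in file order.

    ASSUMPTION: each line reads 'contig_name:scaffold_name' (contig first).
    If the key file puts the scaffold first, swap the two names below.
    """
    contigs = []
    for line in lines:  # part b: a 'for' loop over every line
        line = line.strip()  # drop the trailing newline and stray spaces
        if ":" not in line:
            continue  # skip blank or malformed lines
        contig, scaffold_name = line.split(":", 1)  # split on the first colon only
        if scaffold_name.strip() == scaffold:  # exact match, so 'Scaffold_30' is not picked up
            contigs.append(contig.strip())
    return contigs


def main(argv):
    # Expected usage (same command as under question 4):
    #   python3 summarize_contigs.py genome.assembly.key.txt Scaffold3.contig.list.txt
    if len(argv) != 3:
        print("usage: python3 summarize_contigs.py KEY_FILE OUTPUT_FILE", file=sys.stderr)
        return 1
    key_path, out_path = argv[1], argv[2]
    with open(key_path) as infile:  # part a: read in the key file
        contigs = contigs_for_scaffold(infile)
    with open(out_path, "w") as outfile:  # part c: one contig name per line
        for name in contigs:
            outfile.write(name + "\n")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Comparing the scaffold field for exact equality, rather than testing whether 'Scaffold_3' appears somewhere in the line, avoids accidentally matching a scaffold with a longer name that starts the same way.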