
Question

Background
In Lectures 5 and 6, we learned the Python programming skills needed to process biological
sequences and search for patterns. In this assignment, you are required to write Python programs
to practice biological sequence processing and pattern searching.
Question 1 (3%)
Write a function that checks whether an input sequence is a valid protein sequence. The function
template is given below. The function should return True if the input sequence is a
valid protein sequence, and False otherwise. The following is the list of valid amino acid
symbols:
"A", "C", "D", "E", "F", "G", "H", "I", "L", "M", "N", "P", "Q", "R", "S",
"T", "V", "W", "Y"
def validate_protein(protein_seq):
    """Checks if protein sequence is valid. Returns True if sequence is
    valid, or False otherwise."""
    # To be completed...
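One possible way to fill in this template is sketched below. It is not the official solution. The constant name VALID_AMINO_ACIDS, the conversion to uppercase, and the choice to treat an empty string as invalid are all assumptions that the question does not specify. The symbol set is copied from the list in the question, which has 19 entries and does not include "K", so the sketch rejects "K" as well.

```python
# Symbols copied from the list given in the question (19 symbols, no "K").
VALID_AMINO_ACIDS = set("ACDEFGHILMNPQRSTVWY")


def validate_protein(protein_seq):
    """Checks if protein sequence is valid. Returns True if sequence is
    valid, or False otherwise."""
    # Assumption: lowercase input is accepted by normalising to uppercase.
    seq = protein_seq.upper()
    # Assumption: an empty sequence is not a valid protein sequence.
    if len(seq) == 0:
        return False
    # Valid only if every symbol belongs to the allowed set.
    return all(symbol in VALID_AMINO_ACIDS for symbol in seq)
```

Using a set keeps each membership check constant-time, so the whole check is linear in the length of the sequence.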
Question 2 (5%)
Write a function that, given a sequence as its first argument, detects repeated
sub-sequences of size k (the second argument of the function). The result should be a dictionary
where keys are sub-sequences and values are the number of times they occur (at least 2). The
function template is given below. (Hint: you can make use of the function
"search_all_occurrences" shown in Lecture 6.)
def number_of_repeated_subseq(seq, k):
    """Return a dictionary where keys are sub-sequences of size k and
    values are number of times they occur (at least 2)"""
    dic = {}
    # To be completed...
    return dic
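A minimal sketch of one way to complete this template follows. The Lecture 6 code for "search_all_occurrences" is not reproduced in this question, so the version here is a hypothetical stand-in that returns the start index of every occurrence, including overlapping ones. Counting overlapping occurrences and returning an empty dictionary for k <= 0 are also assumptions.

```python
def search_all_occurrences(seq, pattern):
    """Hypothetical stand-in for the Lecture 6 helper: returns the start
    index of every (possibly overlapping) occurrence of pattern in seq."""
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        # Move one position forward so overlapping matches are found too.
        start = seq.find(pattern, start + 1)
    return positions


def number_of_repeated_subseq(seq, k):
    """Return a dictionary where keys are sub-sequences of size k and
    values are number of times they occur (at least 2)"""
    dic = {}
    if k <= 0:
        return dic  # Assumption: a non-positive k yields no sub-sequences.
    # Slide a window of size k over the sequence.
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub not in dic:
            count = len(search_all_occurrences(seq, sub))
            # Keep only sub-sequences that occur at least twice.
            if count >= 2:
                dic[sub] = count
    return dic
```

For instance, number_of_repeated_subseq("ATATAT", 2) would give {'AT': 3, 'TA': 2} under these assumptions.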
Question 3 (7%)
Write a function that performs a text search in a text file. The function takes a filename and
a list of strings as inputs. The strings in the list are the patterns to be searched for in the text file. A
"*" character within a string is a wildcard, which stands for a run of unknown characters
of any length greater than zero. The function returns a dictionary where keys are the patterns in the list
of strings being searched, and values are the number of times they occur. Your function should
have the following template:
def text_search(filename, patterns):
    """It searches the file filename and returns a dictionary of search
    result, showing patterns with number of occurrences"""
    dic = {}
    # To be completed...
    return dic
For example, suppose a file named "File 1.txt" contains the following text:
Mary is a girl.
They are girls.
Fish has gills.
Calling the following lines of code:
strings = ["is", "gi*l"]
print(text_search("File 1.txt", strings))
produces the following output:
{'is': 2, 'girl': 2, 'gill': 1}
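The sketch below is one possible reading of this specification, not a definitive solution. The example output lists the matched strings ('girl', 'gill') as keys for a wildcard pattern, so the sketch does the same. It rests on several assumptions: the wildcard matches one or more word characters and never crosses whitespace, it matches as few characters as possible, plain patterns are counted as non-overlapping substrings, and the file is read as UTF-8.

```python
import re


def text_search(filename, patterns):
    """It searches the file filename and returns a dictionary of search
    result, showing patterns with number of occurrences"""
    dic = {}
    # Assumption: the file is small enough to read at once and is UTF-8.
    with open(filename, encoding="utf-8") as handle:
        text = handle.read()
    for pattern in patterns:
        if "*" not in pattern:
            # Plain pattern: count non-overlapping substring occurrences.
            dic[pattern] = text.count(pattern)
            continue
        # Assumption: "*" means one or more word characters, matched as
        # briefly as possible; literal parts are escaped for the regex.
        regex = r"\w+?".join(re.escape(part) for part in pattern.split("*"))
        # Following the example output, each matched string becomes a key.
        for match in re.findall(regex, text):
            dic[match] = dic.get(match, 0) + 1
    return dic
```

Because the wildcard is matched non-greedily, a pattern that ends in "*" would capture only a single character after its literal prefix. A different interpretation of the wildcard would need a different regular expression.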