Eukaryotic genomes are replete with repetitive sequences that make genome assembly from sequence reads difficult. For example, sequences such as CTCTCTCTCT (tandem repeats of the dinucleotide sequence CT) are found at many chromosomal locations, with variable numbers (n) of the CT repeating unit at each location. Scientists can assemble genomes despite these difficulties by using the paired-end sequencing strategy diagrammed in Fig. 9.9: they make libraries with genomic inserts of defined size and then sequence both ends of individual clones.
Following are 12 DNA sequence reads from six cloned fragments analyzed in a genome project. 1A and 1B represent the two end reads from clone 1, 2A and 2B the two end reads from clone 2, etc. Clones 1–4 were obtained from a library in which the genomic inserts are about 2 kb long, while the inserts in clones 5 and 6 are about 4 kb long. All of these sequences have their 5′ ends at the left and their 3′ ends at the right. To simplify your analysis, assume that these sequences together represent two genomic locations (loci; singular locus), each of which contains a (CT)n repeat, and that each of the 12 sequences overlaps with one and only one other sequence.
1A: CCGGGAACTCCTAGTGCCTGTGGCACGATCCTATCAAC
1B: AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
2A: GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG
2B: ACGTAGCTAGCTAACCGGTTAAGCGCGCATTACTTCAA
3A: CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT
3B: TAGTGATAGGTAACCCAGGTACTGCACCACCAGAAGTC
4A: GGCCGGCCGTTGTTGACGCAATCATGAATTTAATGCCG
4B: TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
5A: TAGTGCCTGTGGCACGATCCTATCAACTAACGACTGCT
5B: AAGGAAAGGCCGGCCGTTGTTGACGCAATCATGAATTT
6A: CAGCAGCTAGTGATAGGTAACCCAGGTACTGCACCACC
6B: GGACTATACGTAGCTAGCTAACCGGTTAAGCGCGCATT
a. Diagram the two loci, showing the locations of the repetitive DNA and the relative positions and orientations of the 12 DNA sequence reads.
b. If possible, indicate how many copies of the CT repeating unit reside at either locus.
c. Are the data compatible with the alternative hypothesis that these clones actually represent two alleles of a single locus that differ in the number of CT repeating units?
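As a starting point for part (a), the search for overlapping reads can be sketched in Python. This is a minimal sketch, not the textbook's method: the helper names (`revcomp`, `suffix_prefix_overlap`, `find_overlaps`) and the 25-base minimum overlap are assumptions chosen for illustration. Each pair of reads is tested on the same strand and against the reverse complement, because the two end reads of a clone come from opposite strands.

```python
from itertools import combinations

# The 12 end reads exactly as given in the problem, all written 5' to 3'.
READS = {
    "1A": "CCGGGAACTCCTAGTGCCTGTGGCACGATCCTATCAAC",
    "1B": "AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT",
    "2A": "GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG",
    "2B": "ACGTAGCTAGCTAACCGGTTAAGCGCGCATTACTTCAA",
    "3A": "CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT",
    "3B": "TAGTGATAGGTAACCCAGGTACTGCACCACCAGAAGTC",
    "4A": "GGCCGGCCGTTGTTGACGCAATCATGAATTTAATGCCG",
    "4B": "TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA",
    "5A": "TAGTGCCTGTGGCACGATCCTATCAACTAACGACTGCT",
    "5B": "AAGGAAAGGCCGGCCGTTGTTGACGCAATCATGAATTT",
    "6A": "CAGCAGCTAGTGATAGGTAACCCAGGTACTGCACCACC",
    "6B": "GGACTATACGTAGCTAGCTAACCGGTTAAGCGCGCATT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, so a read can be compared on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def suffix_prefix_overlap(left, right, min_len):
    """Longest suffix of `left` equal to a prefix of `right`; 0 if below min_len."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def find_overlaps(reads, min_len=25):
    """Return (left_name, right_name, length) for each overlap of >= min_len bases.

    A trailing "'" on a name marks the reverse complement of that read.
    min_len=25 is an assumed threshold, not a value from the problem.
    """
    hits = []
    for x, y in combinations(sorted(reads), 2):
        rc_y = revcomp(reads[y])
        candidates = [
            (x, reads[x], y, reads[y]),    # x followed by y, same strand
            (y, reads[y], x, reads[x]),    # y followed by x, same strand
            (x, reads[x], y + "'", rc_y),  # x followed by reverse complement of y
            (y + "'", rc_y, x, reads[x]),  # reverse complement of y followed by x
        ]
        for left_name, left, right_name, right in candidates:
            k = suffix_prefix_overlap(left, right, min_len)
            if k:
                hits.append((left_name, right_name, k))
    return hits


if __name__ == "__main__":
    for left_name, right_name, k in find_overlaps(READS):
        print(f"{left_name} -> {right_name}: {k} bases")
```

For the non-repetitive ends the result is unambiguous; for example, the end of 1A matches the start of 5A over 27 bases. Overlaps that fall inside a CT run can match at several offsets, so the length reported for those pairs says little about the true repeat count. That ambiguity is why the known insert sizes of the paired-end clones are needed to place the reads.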
Chapter 9 Solutions
Genetics: From Genes To Genomes (6th International Edition)
- You were going to sequence a rice DNA fragment whose sequence was known at only one end, as shown below. 5′ AAACGATCGAGTCGCATCCAAAATCGATACCC-unknown region 3′ TTTGCTAGCTCTGCGTAGGTTTTAGCTATGGG-unknown region. After several tries, you obtained a beautiful sequencing image. The sequencing worked out well partly because you had designed a primer for the unknown region according to the following guidelines. Tm is 55–60 °C, which ensures that the primer has an appropriate melting temperature for PCR and sequencing. The GC content of the primer is the same as that of the genome/template (rice = 60%, human/Drosophila = 45–50%). The same nucleotide does not occur more than twice in a row (e.g., no CCC, GGGGG, or AAA). The primer has no secondary structure, or only weak secondary structure. There are no primer dimers (the primer annealing to itself). The 3′ end is the most important: it should not end in A and preferably ends in GG, GC, CG, or CC. This website can help you design the primer: http://www.oligoevaluator.com/OligoCalcServlet…
- In a genome project, the following genomic DNA sequences were obtained. Assemble the sequences into a contig. Using the assembled sequence, perform a BLASTn search. Does the search produce sequences similar to your assembled sequence? 5′ TCGGGGTCCTGGGATCTCATCACTGCAGCGC 3′; 5′ ACTGCAGCGCTTTCCCAGCGGGCGGTGGTAC 3′; 5′ GGGCGGTGGTACTCGGGAAGTCAGGAGTGTT 3′; 5′ AGGAGTGTTTAAAACCTGGGGACTGGTTTTG 3′; 5′ TGGTTTTGGGGGCGCTGAAGGCAGCGCAGGA 3′
- Assume 2 × 10^8 reads, each 75 bp long, are obtained from a next-generation sequencing experiment to sequence a human genome. Suppose the length of the human genome is 3 × 10^9 bp. What is the depth (i.e., coverage) of the sequencing?
- The following DNA sequences were used to generate a contig from a genome sequencing project: ttcagattttccccg, gctaaagctccgaa, gccattaacgcc, tttagcatactacggcgtta, aaaaccggggaaaat, tccgaatcggtcattcaga. How long is the fully assembled contig?
- The human genome contains approximately 10^6 copies of an Alu sequence, one of the best-studied classes of short interspersed elements (SINEs), per haploid genome. Individual Alu units share a 282-nucleotide consensus sequence followed by a 3′ adenine-rich tail region [Schmid (1998)]. Given that there are approximately 3 × 10^9 base pairs per human haploid genome, about how many base pairs are spaced between each Alu sequence?
- With a few exceptions, interspersed repetitive DNA in the human genome has no known biological function. Explain in a few sentences what interspersed means. Name and describe one interspersed repetitive element. Provide information on about how much of the human genome consists of this one repetitive element (copy number and/or percent of genome).
- Arabidopsis thaliana has among the smallest genomes in higher plants, with a haploid genome size of about 100 Mb. If this genome is digested with BbvCI, a restriction enzyme that cuts at the sequence CCTCAGC GGAGTCG: 1. Approximately how many DNA fragments would be produced? Assume the DNA has a random sequence with equal amounts of each base.
- In order to target a specific region of genomic DNA with CRISPR, researchers must include a guide RNA containing a 20-base-pair-long spacer sequence that matches the DNA sequence at the target site. (i) How many possible guide RNA spacer sequences are there? (ii) One of the possible risks of genetic engineering methods is "off-target" editing, where a modification of the genome occurs in a part of the genome other than the target site. Imagine you design a 20-base-pair guide RNA spacer sequence to target a specific portion of the zebrafish genome, which is 1.7 billion nucleotides long. Assuming all nucleotides are equally common, estimate the probability that your spacer sequence occurs in at least one other position in the zebrafish genome.
- About 60% of the base pairs in the human genome are AT. If the human genome has 3.2 billion base pairs of DNA, about how many times will the following restriction sites be present? a. BamHI (recognition sequence is 5′–GGATCC–3′) b. EcoRI (recognition sequence is 5′–GAATTC–3′) c. HaeIII (recognition sequence is 5′–GGCC–3′)
- The restriction endonuclease NciI recognizes and cuts the five-base-pair sequence 5′-CC(G/C)GG-3′ [where (G/C) means either G or C will work at that position]. (1) How often, on average, would this sequence occur in random DNA? Assume the DNA contains 25% each of A, G, T, and C. (2) After digestion, NciI leaves a one-base 5′ overhang. Write/draw the cut site and the digested products.
- When the cDNA was sequenced by the Sanger method utilizing ddCTP, the following products were obtained: tetranucleotide, hexanucleotide, nonanucleotide, decanucleotide, dodecanucleotide, octadecanucleotide, nonadecanucleotide, 21-nucleotide. 6c. What is the sequence of the bases in the mRNA coding for the peptide above? The…
- The human genome (3.4 Gb) would be 2.3 metres long if stretched linearly. In not more than 200 words, explain how a genome of this size is fit into a cell of minuscule proportions.
- Biology: The Dynamic Science (MindTap Course List). Subject: Biology. ISBN: 9781305389892. Authors: Peter J. Russell, Paul E. Hertz, Beverly McMillan. Publisher: Cengage Learning.