Eukaryotic genomes are replete with repetitive sequences that make genome assembly from sequence reads difficult. For example, sequences such as CTCTCTCTCT (tandem repeats of the dinucleotide sequence CT) are found at many chromosomal locations, with variable numbers (n) of the CT repeating unit at each location. Scientists can assemble genomes despite these difficulties by using the paired-end sequencing strategy diagrammed in Fig. 9.9. In other words, they can make libraries with genomic inserts of defined size, and then sequence both ends of individual clones.
Following are 12 DNA sequence reads from six cloned fragments analyzed in a genome project. 1A and 1B represent the two end reads from clone 1, 2A and 2B the two end reads from clone 2, etc. Clones 1–4 were obtained from a library in which the genomic inserts are about 2 kb long, while the inserts in clones 5 and 6 are about 4 kb long. All of these sequences have their 5′ ends at the left and their 3′ ends at the right. To simplify your analysis, assume that these sequences together represent two genomic locations (loci; singular locus), each of which contains a (CT)n repeat, and that each of the 12 sequences overlaps with one and only one other sequence.
1A: CCGGGAACTCCTAGTGCCTGTGGCACGATCCTATCAAC
1B: AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
2A: GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG
2B: ACGTAGCTAGCTAACCGGTTAAGCGCGCATTACTTCAA
3A: CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT
3B: TAGTGATAGGTAACCCAGGTACTGCACCACCAGAAGTC
4A: GGCCGGCCGTTGTTGACGCAATCATGAATTTAATGCCG
4B: TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
5A: TAGTGCCTGTGGCACGATCCTATCAACTAACGACTGCT
5B: AAGGAAAGGCCGGCCGTTGTTGACGCAATCATGAATTT
6A: CAGCAGCTAGTGATAGGTAACCCAGGTACTGCACCACC
6B: GGACTATACGTAGCTAGCTAACCGGTTAAGCGCGCATT
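One way to start on the overlaps is a brute-force suffix/prefix comparison that also tries the reverse complement of each read, since the two ends of a clone may be read from opposite strands. The Python sketch below is only an illustration under stated assumptions: the function names (`revcomp`, `suffix_prefix_overlap`, `best_overlap`) and the 20-base minimum overlap are hypothetical choices, not part of the problem, and the approach is a plain exhaustive search rather than any particular assembler's method.

```python
from itertools import combinations

# The 12 end reads exactly as listed in the problem (5' to 3').
READS = {
    "1A": "CCGGGAACTCCTAGTGCCTGTGGCACGATCCTATCAAC",
    "1B": "AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT",
    "2A": "GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG",
    "2B": "ACGTAGCTAGCTAACCGGTTAAGCGCGCATTACTTCAA",
    "3A": "CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT",
    "3B": "TAGTGATAGGTAACCCAGGTACTGCACCACCAGAAGTC",
    "4A": "GGCCGGCCGTTGTTGACGCAATCATGAATTTAATGCCG",
    "4B": "TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA",
    "5A": "TAGTGCCTGTGGCACGATCCTATCAACTAACGACTGCT",
    "5B": "AAGGAAAGGCCGGCCGTTGTTGACGCAATCATGAATTT",
    "6A": "CAGCAGCTAGTGATAGGTAACCCAGGTACTGCACCACC",
    "6B": "GGACTATACGTAGCTAGCTAACCGGTTAAGCGCGCATT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b, min_len=20):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    `min_len` (assumed to be >= 1) is an arbitrary threshold for this sketch.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def best_overlap(a, b, min_len=20):
    """Longest overlap between reads a and b, trying both orders and both
    strands of b. Returns (length, description)."""
    rb = revcomp(b)
    candidates = [
        (suffix_prefix_overlap(a, b, min_len), "a then b, same strand"),
        (suffix_prefix_overlap(b, a, min_len), "b then a, same strand"),
        (suffix_prefix_overlap(a, rb, min_len), "a then revcomp(b)"),
        (suffix_prefix_overlap(rb, a, min_len), "revcomp(b) then a"),
    ]
    return max(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    # Report every pair of reads with an overlap of at least min_len bases.
    for (name_a, a), (name_b, b) in combinations(READS.items(), 2):
        length, how = best_overlap(a, b)
        if length:
            print(f"{name_a} / {name_b}: {length} bases ({how})")
```

Reads that lie largely inside a (CT)n tract can match one another at many offsets, so any overlap this search reports between repeat-containing reads is not unique. Placing those reads still depends on the paired-end information: which clone each read came from and the approximate insert size of that clone's library.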
a. Diagram the two loci, showing the locations of the repetitive DNA and the relative positions and orientations of the 12 DNA sequence reads.
b. If possible, indicate how many copies of the CT repeating unit reside at either locus.
c. Are the data compatible with the alternative hypothesis that these clones actually represent two alleles of a single locus that differ in the number of CT repeating units?
Chapter 9 Solutions
Genetics: From Genes to Genomes
- You were going to sequence a rice DNA fragment whose sequence was only known at one end, as shown below. 5’ AAACGATCGAGTCGCATCCAAAATCGATACCC-(unknown region) 3’ TTTGCTAGCTCTGCGTAGGTTTTAGCTATGGG-(unknown region) After several tries, you obtained a beautiful sequencing image as shown here: This worked out well partly because you had designed a primer for sequencing the unknown region according to the following guidelines: Tm is 55 – 60°C, which ensures the primer has an appropriate melting temperature for PCR and sequencing. The GC content of the primer is the same as that of the genome/template (rice = 60%, human/Drosophila = 45-50%). The same nucleotide should not appear more than twice in a row (avoid runs such as CCC, GGGGG, AAA). The secondary structure of the primer must be absent or weak. No primer dimers (the primer anneals to itself). The 3’ end is the most important: it should not end in A, and preferably ends in GG, GC, CG or CC. This website can help you design the primer: http://www.oligoevaluator.com/OligoCalcServlet…
- Human genomic libraries used for DNA sequencing are often made from fragments obtained by cleaving human DNA with HaeIII in such a way that the DNA is only partially digested; that is, not all the possible HaeIII sites have been cleaved. What is a possible reason for doing this?
- In addition to the standard base-paired helical structures, DNA can form X-shaped hairpin structures called cruciforms in which most bases are involved in Watson–Crick pairs. Such structures tend to occur at sequences with inverted repeats. Draw the cruciform structure formed by the DNA sequence TCAAGTCCACGGTGGACTTGC.
- Assume 2 × 10^8 reads, each 75 bp long, are obtained from a next-generation sequencing experiment to sequence a human genome. Suppose the length of the human genome is 3 × 10^9 bp. What is the depth (i.e., coverage) of the sequencing?
- In a genome project, the following genomic DNA sequences were obtained. Assemble the sequences into a contig. Using the assembled sequence, perform a BLASTn search. Does the search produce sequences similar to your assembled sequence? 5’ TCGGGGTCCTGGGATCTCATCACTGCAGCGC 3’ 5’ ACTGCAGCGCTTTCCCAGCGGGCGGTGGTAC 3’ 5’ GGGCGGTGGTACTCGGGAAGTCAGGAGTGTT 3’ 5’ AGGAGTGTTTAAAACCTGGGGACTGGTTTTG 3’ 5’ TGGTTTTGGGGGCGCTGAAGGCAGCGCAGGA 3’
- Why do geneticists studying eukaryotic organisms often construct cDNA libraries, whereas geneticists studying bacteria almost never do? Why might bacterial geneticists have difficulties constructing cDNA libraries even if they wanted to?
- You know from your calculations that only a small proportion of the human genome is represented, even when the entire class results are considered. Therefore, the chance of finding a particular single-copy gene in your library is very small. Outline a strategy for constructing a genomic DNA library more representative of the entire human genome. You will need to consider alternative vectors and the efficiency of transformation of the bacterial cells.
- The following figure shows a screen shot from the UCSC Genome Browser, focusing on a region of the human genome encoding a gene called MFAP3L. (Note hg38 refers to version 38 of the human genome RefSeq.) a. Describe in approximate terms the genomic location of MFAP3L. b. Is the gene transcribed in the direction from the centromere-to-telomere or from the telomere-to-centromere? c. How many alternative splice forms of MFAP3L mRNA are indicated by the data? d. How many different promoters for MFAP3L are suggested by the data?
- The following figure shows a screen shot from the UCSC Genome Browser, focusing on a region of the human genome encoding a gene called MFAP3L. (Note hg38 refers to version 38 of the human genome RefSeq.) a. Describe in approximate terms the genomic location of MFAP3L. b. Is the gene transcribed in the direction from the centromere-to-telomere or from the telomere-to-centromere? c. How many alternative splice forms of MFAP3L mRNA are indicated by the data? d. How many different promoters for MFAP3L are suggested by the data? (Please do not copy and paste the answer from below; I don't think it is correct.) a. MFAP3L is mostly found in the nucleus in the genome. It is found on chromosome 4 reverse strand. The protein produced by the gene is found in the cell membrane, and it is positioned on the membrane with the carboxyl side of the protein facing the cytosol. b. The MFAP3L gene is transcribed from the telomere to the centromere. c. According to the data, there are 11 different splice forms…
- Shown below are several next-generation sequencing reads from a sample you have. Which of the following is the most likely candidate for the original linear piece of DNA present in the sample that created the sequence reads shown below? 5' GGGCATTA 3' 5' TACGAACA 3' 5' ATACCGGGC 3' 5' ACCGTACG 3' 5' AACATACC 3' Options: 5' ATTAACCGTACGAACATACCGGGC 3' 5' GGGCATTATACGAACAATACCGGGC 3' 5' ACCGGGCATTAACCGTACGAACAT 3' 5' AACATACCGGGCATTAACCGTACG 3' 5' GGCATTAACCGTACGAACATACCG 3' 5' ACCGTACGAACATACCGGGCATTA 3'
- If you compare the frequency of the sixteen possible dinucleotide sequences in the E. coli and human genomes, there are no striking differences except for one dinucleotide, 5ʹ-CG-3ʹ. The frequency of CG dinucleotides in the human genome is significantly lower than in E. coli and significantly lower than expected by chance. Why do you suppose that CG dinucleotides are underrepresented in the human genome? (Hint: the C in the CG pair is often methylated.) Explain how this observation has an impact on the cell's immune response.
- Below is a sequence of DNA. 5'-ttaccgataattctctctcccctcttccatgattctgattaaagaaggcgagaacgaaactatttgttaatacc-3' How many "reading frames" can be identified for this sequence? How many "open reading frames" can be identified for this sequence? What is the frame of the longest ORF?
- Biology: The Dynamic Science (MindTap Course List). Subject: Biology. ISBN: 9781305389892. Authors: Peter J. Russell, Paul E. Hertz, Beverly McMillan. Publisher: Cengage Learning.