Hands-on Session 3: bcftools
Xian Mallory

bcftools

A program for variant calling and for manipulating files in the Variant Call Format (VCF) and its binary counterpart, BCF. All commands work transparently with both VCFs and BCFs, whether uncompressed or BGZF-compressed.

Manual for bcftools: https://samtools.github.io/bcftools/bcftools.html
Examples of how to use bcftools: http://samtools.github.io/bcftools/howtos/index.html

Installation

This webpage shows how to install the software: http://samtools.github.io/bcftools/howtos/install.html

bcftools is already installed on RCC. If `which bcftools` does not locate it, run `module load gnu` to load it.

Usage of bcftools

Extracting information from a VCF file (https://samtools.github.io/bcftools/howtos/query.html)
    bcftools query
Filtering (https://samtools.github.io/bcftools/howtos/filtering.html)
    bcftools query -e 'FILTER="."'
Viewing a VCF file (https://samtools.github.io/bcftools/bcftools.html#view)
    bcftools view
Intersection of two VCF files (https://samtools.github.io/bcftools/bcftools.html#isec)
    bcftools isec
Merging two or more VCF files (https://samtools.github.io/bcftools/bcftools.html#merge)
    bcftools merge

Data preparation (1)

Log in to RCC and, from your home directory, run the following command to copy the data.
    cp -r /gpfs/research/scratch/xfan2/bcftoolsSession ./
In your folder (~/bcftoolsSession/), you will find a file named “chr22.vcf.gz”, a VCF file containing variants on chr22 from hundreds of samples (all healthy genomes). It has a corresponding index file named “chr22.vcf.gz.csi”. In the next few slides, we will query and view “chr22.vcf.gz” from a variety of perspectives.

bcftools query for extracting information

Extracting the samples (-l)
    bcftools query -l chr22.vcf.gz | less
How many samples are in this file?

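Since `bcftools query -l` prints one sample ID per line, one way to get the count is to count those lines. The sketch below wraps that in a helper and adds a cross-check that reads the `#CHROM` header line directly. The function names are illustrative only (they are not part of bcftools), and the header check assumes the file has a FORMAT column and at least one sample.

```shell
# Count samples: `bcftools query -l` prints one sample ID per line.
count_samples() {
    bcftools query -l "$1" | wc -l | tr -d ' '
}

# Cross-check without bcftools: the #CHROM header line has 9 fixed
# columns (CHROM through FORMAT) followed by one column per sample.
count_samples_from_header() {
    gzip -cd "$1" | awk -F'\t' '/^#CHROM/ { print NF - 9; exit }'
}

# Run on the session file only if it and bcftools are available.
if [ -f chr22.vcf.gz ] && command -v bcftools >/dev/null 2>&1; then
    count_samples chr22.vcf.gz
    count_samples_from_header chr22.vcf.gz
fi
```

Both numbers should agree; if they do not, the file is probably not the one you expected.
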
bcftools query for extracting information (cont’d)

Extracting the positions (-f: specifying format)
    bcftools query -f '%POS\n' chr22.vcf.gz | less
Extracting the positions with a certain format
    bcftools query -f 'pos=%POS\n' chr22.vcf.gz | less
Extracting the chromosome, position, reference and alternative base(s)
    bcftools query -f '%CHROM %POS %REF %ALT\n' chr22.vcf.gz | less

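The format string can also carry per-sample fields: anything inside square brackets is expanded once for every sample. The sketch below prints tab-separated site columns followed by each sample's genotype, and shows which plain VCF columns the site tags correspond to. The function names are made up for illustration, and `%GT` assumes the file carries a GT field in its FORMAT column.

```shell
# Tab-separated site columns, then one genotype per sample.
# Fields inside [...] are repeated once for every sample in the file.
site_and_genotypes() {
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' "$1"
}

# %CHROM, %POS, %REF and %ALT map to VCF columns 1, 2, 4 and 5, so the
# same four values can be pulled from the text as a sanity check.
site_columns_plain() {
    gzip -cd "$1" | awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2, $4, $5 }'
}

# Show the first few records of the session file, if present.
if [ -f chr22.vcf.gz ] && command -v bcftools >/dev/null 2>&1; then
    site_and_genotypes chr22.vcf.gz | head -n 5
fi
```
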
bcftools query for filtering

Only output those lines whose reference value is not “A” (-e, --exclude)
    bcftools query -e 'REF="A"' -f '%CHROM %POS %REF\n' chr22.vcf.gz | less

bcftools query for filtering (cont’d)

Only output those lines whose QUAL > 20 (-i, --include)
    bcftools query -i 'QUAL>20' -f '%CHROM %POS %QUAL\n' chr22.vcf.gz | less

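Filtering conditions can be combined with `&&` (and) or `||` (or) inside a single expression. As a minimal sketch, the helper below counts the records that satisfy an expression; `count_matching` is a hypothetical name, and the example expression simply joins the two conditions from the previous slides.

```shell
# Count records that satisfy a filtering expression.
# $1 = VCF/BCF file, $2 = bcftools expression (as accepted by -i)
count_matching() {
    bcftools query -i "$2" -f '%POS\n' "$1" | wc -l | tr -d ' '
}

if [ -f chr22.vcf.gz ] && command -v bcftools >/dev/null 2>&1; then
    # Records with QUAL above 20 whose reference base is not "A".
    count_matching chr22.vcf.gz 'QUAL>20 && REF!="A"'
fi
```

Single quotes around the whole expression keep the shell from interpreting `>`, `&&` and the inner double quotes.
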
bcftools view for outputting a certain region

To select a region with bcftools view, the VCF file must be indexed first:
    bcftools index chr22.vcf.gz
Your folder already contains the index file (.csi), because this command takes time to run. An alternative way to create an index is “tabix chr22.vcf.gz”, which creates a .tbi file.
Like “samtools view”, “bcftools view” can select a certain region:
    bcftools view -H chr22.vcf.gz 22:16000000-17000000 | less
-H here means no header. Use -S with “less” (`less -S`) to keep each record on a single row instead of wrapping long lines.

bcftools view for viewing header only (-h)

    bcftools view -h chr22.vcf.gz | less
If neither -H nor -h is given, the header is printed together with the data section.

bcftools view for viewing selected sample(s) only (-s)

    bcftools view -H -s NA12878 chr22.vcf.gz 22:16000000-17000000 | less

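Region and sample selection can also be combined with `bcftools query`, which accepts `-r` (region) and `-s` (samples) and prints only the requested fields instead of full VCF lines. This is a rough sketch, not a required step of the session; `genotypes_in_region` is an illustrative name, and `-r` assumes the file is indexed.

```shell
# Genotypes of selected samples within a region, one site per line.
# $1 = indexed VCF/BCF, $2 = region, $3 = comma-separated sample list
genotypes_in_region() {
    bcftools query -r "$2" -s "$3" -f '%CHROM\t%POS[\t%GT]\n' "$1"
}

if [ -f chr22.vcf.gz ] && command -v bcftools >/dev/null 2>&1; then
    genotypes_in_region chr22.vcf.gz 22:16000000-17000000 NA12878 | head -n 5
fi
```

To select several samples, pass a comma-separated list of IDs taken from the output of `bcftools query -l`.
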
Now let’s switch to a different data set for testing `bcftools isec` and `bcftools merge`.

Data preparation (2)

In your folder (~/bcftoolsSession/), you will find the VCF files for a family: father (HG003), mother (HG004) and son (HG002). They contain only the variants on chr22. This is a publicly available dataset from an Ashkenazi Jewish trio. For each sample, you will have *.chr22.vcf.gz (compressed VCF file) and *.chr22.vcf.gz.csi (index file), in which * is father, mother or son.
Let’s first compare father and son.

bcftools isec for comparing two VCF files

Given two VCF files, the following command creates the intersection and the complements of the variants in the two files.
    bcftools isec -p comp_ashtrio_father_son son.chr22.vcf.gz father.chr22.vcf.gz
Go into the folder “comp_ashtrio_father_son” and examine the four VCF files just created.

bcftools isec for comparing two VCF files (cont’d)

The folder “comp_ashtrio_father_son” contains four VCF files:
    0000.vcf    variants private to son
    0001.vcf    variants private to father
    0002.vcf    records from son’s file for variants shared by son and father
    0003.vcf    records from father’s file for variants shared by son and father
In 0002.vcf, the column after FORMAT shows the data from son only. In 0003.vcf, the column after FORMAT shows the data from father only.
Note: the entries from father and son are not merged yet in the files created by `bcftools isec`.

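A quick way to compare the four files is to count the variant records (non-header lines) in each. The helper below is a small sketch for that purpose; `summarize_isec` is an illustrative name, and it assumes the uncompressed 0000.vcf to 0003.vcf layout described above.

```shell
# Print "file<TAB>record count" for each VCF that
# `bcftools isec -p DIR` wrote into DIR.
summarize_isec() {
    for f in "$1"/000[0-3].vcf; do
        [ -f "$f" ] || continue
        # Header lines start with '#'; everything else is a record.
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -vc '^#' "$f")"
    done
}

if [ -d comp_ashtrio_father_son ]; then
    summarize_isec comp_ashtrio_father_son
fi
```

Because 0002.vcf and 0003.vcf both list the shared sites, their counts would normally be expected to match.
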
bcftools merge for merging two VCF files

Now we do the merge.
    bcftools merge -O z -o father.son.chr22.vcf.gz father.chr22.vcf.gz son.chr22.vcf.gz
Here `-o father.son.chr22.vcf.gz` names the output file, and `-O z` specifies that the output should be written as a compressed VCF.
In the merged file, check the genotypes at chr22:15271151, a site that `bcftools isec` reported in both 0002.vcf and 0003.vcf (shared by father and son). Also check this position in father.chr22.vcf.gz and son.chr22.vcf.gz individually.

bcftools merge for merging multiple VCF files (cont’d)

Merge the VCF files from father, mother and son, so that there are three sample columns after the FORMAT column.
    bcftools merge -O z -o trio.chr22.vcf.gz father.chr22.vcf.gz mother.chr22.vcf.gz son.chr22.vcf.gz
Now check the genotypes at chr22:15271151 again in the merged file. The second column after the FORMAT column (mother’s) does not report a genotype. Is this situation biologically possible? What does this indicate?

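To look at a single site without paging through the whole file, `bcftools query` can print each sample's genotype at that position. The sketch below uses `-t` (targets), which streams through the file and therefore does not need an index; `-r` would be faster but requires indexing the merged file first. `genotypes_at` is a hypothetical helper name.

```shell
# Print one "sample<TAB>genotype" line per sample at a single site.
# $1 = VCF/BCF file, $2 = position such as chr22:15271151
genotypes_at() {
    bcftools query -t "$2" -f '[%SAMPLE\t%GT\n]' "$1"
}

# Inspect the site in both merged files, if they have been created.
for f in father.son.chr22.vcf.gz trio.chr22.vcf.gz; do
    if [ -f "$f" ] && command -v bcftools >/dev/null 2>&1; then
        echo "== $f"
        genotypes_at "$f" chr22:15271151
    fi
done
```

A sample with no call at the site typically shows up with a missing genotype such as `./.` rather than being left out of the output.
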