
Bioinformatics/Genomics Pipeline Workflows: Step I Analysis: Mapping and Assembly with Qualities (MAQ), SAMtools, Bowtie, CNVer Workflows

Overview

This page describes the first step of a genomics data analysis protocol designed and implemented by Federica Torri, Fabio Macciardi and Ivo Dinov to process large volumes of sequence data produced by the Illumina sequencing pipeline. Step II of the analysis (GATK/QC/Cleaning) is documented separately.

This protocol is implemented using the LONI Pipeline environment and includes the following types of computational resources:

Sequence Analysis Protocol Outline

Alignment and Assembly

  • Conversion of Solexa FASTQ to Sanger FASTQ format
    • Input: FASTQ read files produced by the Illumina sequencing pipeline (sequence.txt files)
    • Tool: Mapping and Assembly with Qualities (MAQ) (sol2sanger option)
    • LONI Installation: /ifs/ccb/CCB_SW_Tools/BioinformaticsGenetics/MAQ_BWA_2010
    • Output: sequence.fastq file
  • Conversion of FASTQ to a binary FASTQ file (bfq)
    • Input: sequence.fastq file
    • Tool: MAQ (fastq2bfq option)
    • Output: sequence.bfq file
  • Conversion of the reference genome (FASTA format) to binary FASTA (bfa)
    • Input: reference.fasta file (to perform the alignment)
    • Tool: MAQ (fasta2bfa option)
    • Output: reference.bfa file
  • Alignment to a reference genome
    • Input: sequence.bfq, reference.bfa
    • Tool: MAQ (map option)
    • Output: alignment.map file
  • Conversion of the map file to a BAM file
    • Index the reference genome
      • Input: reference.fa
      • LONI Installation: /ifs/ccb/CCB_SW_Tools/BioinformaticsGenetics/samtools/samtools-0.1.10/samtools
      • Tool: samtools (faidx option)
      • Output: reference.fa.fai (samtools faidx appends .fai to the reference file name)
    • MAQ2SAM
      • Input: alignment.map file
      • Tool: maq2sam-long (conversion utility distributed with samtools)
      • Output: alignment.sam file
    • SAM to full BAM
      • Input: alignment.sam, reference.fa.fai file
      • Tool: samtools (view -bt option)
      • Output: alignment.bam file
    • Remove duplicated reads
      • Input: alignment.bam file
      • Tool: samtools (rmdup)
      • Output: alignment.NODUPS.bam file
    • Sort .bam
      • Input: alignment.NODUPS.bam file
      • Tool: samtools (sort option)
      • Output: alignment.NODUPS.sorted.bam file
    • MD tag
      • Input: alignment.NODUPS.sorted.bam file
      • Tool: samtools (calmd option)
      • Output: alignment.NODUPS.sorted.calmd.bam file
    • Indexing the .bam file
      • Input: alignment.NODUPS.sorted.calmd.bam file
      • Tool: samtools (index option)
      • Output: alignment.NODUPS.sorted.calmd.bam.bai file
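
The alignment and assembly steps above can be sketched as an ordered list of command lines. This is a minimal illustration, not the LONI Pipeline workflow itself: the function names `alignment_steps` and `run_steps` are hypothetical, the executables are assumed to be on the PATH, and the outline's reference.fasta and reference.fa are assumed to be the same file. The samtools invocations follow the legacy 0.1.x syntax matching the installation listed above (for example, `sort` takes an output prefix rather than `-o`).

```python
import subprocess


def alignment_steps(reads="sequence.txt", reference="reference.fasta"):
    """Return (argv, stdout_file) pairs in protocol order.

    stdout_file is None when the tool writes its own output file, otherwise
    it names the file that should capture the tool's standard output.
    File names follow the outline; the function itself is illustrative.
    """
    fai = reference + ".fai"  # samtools faidx writes <reference>.fai
    final_bam = "alignment.NODUPS.sorted.calmd.bam"
    return [
        # MAQ: format conversions and alignment
        (["maq", "sol2sanger", reads, "sequence.fastq"], None),
        (["maq", "fastq2bfq", "sequence.fastq", "sequence.bfq"], None),
        (["maq", "fasta2bfa", reference, "reference.bfa"], None),
        (["maq", "map", "alignment.map", "reference.bfa", "sequence.bfq"], None),
        # samtools: index the reference, then convert map -> SAM -> BAM
        (["samtools", "faidx", reference], None),
        (["maq2sam-long", "alignment.map"], "alignment.sam"),
        (["samtools", "view", "-bt", fai, "-o", "alignment.bam",
          "alignment.sam"], None),
        # Post-processing: duplicates, sorting, MD tag, BAM index
        (["samtools", "rmdup", "alignment.bam", "alignment.NODUPS.bam"], None),
        # Legacy sort syntax: second argument is an output prefix
        (["samtools", "sort", "alignment.NODUPS.bam",
          "alignment.NODUPS.sorted"], None),
        (["samtools", "calmd", "-b", "alignment.NODUPS.sorted.bam",
          reference], final_bam),
        (["samtools", "index", final_bam], None),
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Print the commands instead of running them (dry run).
    for argv, stdout_file in alignment_steps():
        redirect = f" > {stdout_file}" if stdout_file else ""
        print(" ".join(argv) + redirect)
```

Keeping the commands as data makes it easy to review the chain of intermediate files before anything is executed; calling `run_steps(alignment_steps())` would then run the tools for real.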

Copy Number Variant (CNV) Calling: three different paths

  • ERDS/SVA path
    • Input: alignment.NODUPS.sorted.calmd.bam (output of the MD tag step in Alignment and Assembly)
    • Tool: samtools and erds combined
    • Output: .gsap file (visualization of CNVs in Sequence Variant Analyzer)
    • Open question: how should the ERDS installation at /ifs/ccb/CCB_SW_Tools/others/Bioinformatics/ERDS_2010/erds1.01 be used to generate the *.gsap file?
  • BOWTIE/CNVer/SAVANT path
    • BOWTIE alignment
      • LONI Installation (usage still needs clarification): /ifs/ccb/CCB_SW_Tools/others/Bioinformatics/Bowtie_CrossBow_2010/bowtie-0.12.7
      • Input: sequence.fastq file (output of the Solexa-to-Sanger conversion step)
      • Tool: bowtie
      • Output: alignment.bowtie file
    • CNVer CNV call
      • LONI Installation: /ifs/ccb/CCB_SW_Tools/others/Bioinformatics/CNVer_2010
      • Input: alignment.bowtie file
      • Tool: CNVer
      • Output: .cnv file
    • Visualization: Is visualization necessary? What are the computational/processing steps here?
    • Coverage track production
      • Input: alignment.NODUPS.sorted.calmd.bam (output of the MD tag step in Alignment and Assembly)
      • Tool: SAVANT genome browser
      • Output: alignment.genome.cov.bam file
    • Formatting the .cnv file as .cnv.bed (CNV visualization)
      • Input: .cnv file
      • Tool: SAVANT genome browser
      • Output: .cnv.bed file
    • Visualization
      • Input: alignment.NODUPS.sorted.calmd.bam, alignment.genome.cov.bam, .cnv.bed files
      • Tool: SAVANT genome browser
      • Output: .savant project
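
For the BOWTIE alignment step, a minimal sketch of the two command lines involved is shown below. It assumes the standard Bowtie 1 usage (`bowtie-build <reference> <index prefix>` followed by `bowtie <index prefix> <reads> <output>`); the function name `bowtie_steps` and the index prefix `reference_index` are hypothetical, and any additional options that CNVer may require of the Bowtie output are not shown and should be taken from the CNVer documentation.

```python
import shlex


def bowtie_steps(reference="reference.fa", reads="sequence.fastq",
                 index_prefix="reference_index"):
    """Return the argv lists for building a Bowtie index and aligning reads.

    index_prefix is an illustrative name, not part of the original protocol.
    """
    return [
        # Build the Bowtie index files from the FASTA reference
        ["bowtie-build", reference, index_prefix],
        # Align FASTQ reads; the third positional argument is the hits file
        ["bowtie", index_prefix, reads, "alignment.bowtie"],
    ]


if __name__ == "__main__":
    # Dry run: print shell-quoted commands without executing them.
    for argv in bowtie_steps():
        print(" ".join(shlex.quote(arg) for arg in argv))
```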

CNVseq path

  • STEP#1
    • Input: alignment.NODUPS.sorted.calmd.bam (output of the MD tag step in Alignment and Assembly)
    • Tool: samtools (view option)
    • Output: .hits file
  • STEP#2
    • Input: .hits file
    • Tool: CNVseq+R
    • Output: .hits.cnv file
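
The first CNVseq step turns the text output of `samtools view` into a .hits file. The sketch below assumes the .hits format is two tab-separated columns, chromosome and alignment start position (SAM columns 3 and 4), which is how CNV-seq's documentation describes it; skipping unmapped reads is an additional assumption, and the function name `sam_to_hits` is hypothetical.

```python
def sam_to_hits(sam_lines):
    """Yield 'chromosome<TAB>position' for each mapped SAM record.

    sam_lines is any iterable of SAM text lines, for example the output of
    `samtools view alignment.NODUPS.sorted.calmd.bam`.
    """
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Bit 0x4 of the FLAG field marks an unmapped read (assumed excluded)
        if flag & 0x4:
            continue
        # Column 3 is the reference name, column 4 the 1-based start position
        yield f"{fields[2]}\t{fields[3]}"


# Small made-up example records, for illustration only
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr2\t250\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
hits = list(sam_to_hits(example))
```

In practice the function would be fed the lines streamed from `samtools view`, and the yielded lines written to the .hits file that CNVseq and R consume in the second step.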

Detailed Workflow Usage & Specifications

To do: describe the complete Pipeline graphical workflow.
[Figures: MAQ_SAMtools_Bowtie_Integrated_112310.png, MAQ_SAMtools_Bowtie_Integrated_Cranium.png, MAQ_SAMtools_Bowtie_Integrated_Cranium_G.png]

Acknowledgments

  • These workflows were designed, developed, tested and validated by a number of investigators at UCLA LONI (Ivo Dinov, Alen Zamanyan, Alex Genco, LONI Pipeline Team, Arthur Toga); UCI Human Genetics (Fabio Macciardi, Federica Torri, Harry Mangalam); USC/ZNI (Andrew Clark, Jim Knowles) and USC Epigenome Center (Ben Berman, Zack Ramjan); and BIRN (Joseph Ames, John Nylander, Ravi Madduri, Carl Kesselman).
  • This work was funded in part by the National Institutes of Health through Grants U54 RR021813, P41 RR013642, R01 MH71940, U24-RR025736, U24-RR021992, U24-RR021760 and U24-RR026057.
  • Members of the Laboratory of Neuro Imaging (LONI), the Biomedical Informatics Research Network (BIRN), the National Centers for Biomedical Computing (NCBC) and Clinical and Translational Science Award (CTSA) investigators, NIH Program officials, and many general users have contributed by beta-testing the Pipeline and providing useful feedback about its state, functionality and usability.
