fq2SortedBAM: OpenOmics’ Genomics Secondary Analysis Pipeline
Overview:
The pipeline takes FASTQ files as input and produces a sorted BAM file through the following stages:
- Sequence Alignment: bwa-mem2 for short reads, mm2-fast (Accelerated Minimap2) for long reads (PacBio, ONT)
- SAMSort (Using SAMTools)
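As a rough illustration of the two stages for short paired-end reads, the sketch below assembles the equivalent standalone commands. It does not show how fq2SortedBAM invokes the tools internally. The function names, file names, and thread count are hypothetical, and only the basic `bwa-mem2 mem` and `samtools sort` invocations are used.

```python
import shlex


def build_stage_commands(ref, read1, read2, out_bam, threads=4):
    """Return (align_cmd, sort_cmd) as argument lists.

    Hypothetical helper for illustration only. Assumes the reference
    has already been indexed with `bwa-mem2 index`.
    """
    # Stage 1: align paired-end short reads; SAM is written to stdout.
    align_cmd = ["bwa-mem2", "mem", "-t", str(threads), ref, read1, read2]
    # Stage 2: sort by coordinate; the trailing "-" reads SAM from stdin.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return align_cmd, sort_cmd


def as_shell_pipeline(align_cmd, sort_cmd):
    """Join the two stages into a single shell pipeline string."""
    return shlex.join(align_cmd) + " | " + shlex.join(sort_cmd)


if __name__ == "__main__":
    align, sort = build_stage_commands(
        "ref.fa", "r1.fq.gz", "r2.fq.gz", "out.sorted.bam"
    )
    print(as_shell_pipeline(align, sort))
```

The helper only builds the command strings, so they can be inspected before anything is run.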
Modes:
fq2SortedBAM supports 4 different modes:
- sortedbam: Takes FASTQ read files and a reference genome as input and outputs a sorted BAM file
- flatmode: Takes FASTQ read files and a reference genome as input and outputs multiple unsorted SAM files (one per rank created)
- fqprocessonly: Custom mode, not for general use
- multifq: Custom mode, not for general use
Use Docker
Docker build:
wget https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/releases/download/3.0/Source_code_with_submodules.tar.gz
tar -xzf Source_code_with_submodules.tar.gz
cp Open-Omics-Acceleration-Framework/pipelines/fq2sortedbam/Dockerfile .
cp Open-Omics-Acceleration-Framework/pipelines/fq2sortedbam/config.yaml <inputdir>
docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t fq2bam .
docker save fq2bam:latest > fq2bam.tar ## optional: saves the built image to a tar file
Docker run:
docker load -i fq2bam.tar ## optional; skip if the image was built on this machine or is already loaded
docker run -v <readsdir>:/readsdir -v <outdir>:/outdir -v <refdir>:/refdir fq2bam:latest python run_fq2sortedbam.py --ref /refdir/<reference> --reads /readsdir/<read1> /readsdir/<read2> --output /outdir/<outBAMfile>
Note:
<readsdir>: Location of the local directory containing read files read1 & read2
<refdir>: Location of the local directory containing reference sequence file ref
<outdir>: Location of the local directory for output files SAM/BAM
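The mapping between local directories and container paths in the command above can be sketched as a small helper. This is a minimal sketch, not part of the pipeline: the function name and its arguments are hypothetical, and it only reproduces the `docker run` flags and script options shown above.

```python
import os


def build_docker_run(readsdir, refdir, outdir, reference, read1, read2,
                     out_bam, image="fq2bam:latest"):
    """Build the `docker run` argument list for fq2SortedBAM.

    Hypothetical helper for illustration only. The container-side mount
    points (/readsdir, /outdir, /refdir) follow the command above.
    """
    mounts = [(readsdir, "/readsdir"), (outdir, "/outdir"), (refdir, "/refdir")]
    cmd = ["docker", "run"]
    for host_dir, container_dir in mounts:
        # Bind mounts need an absolute host path.
        cmd += ["-v", f"{os.path.abspath(host_dir)}:{container_dir}"]
    cmd += [
        image, "python", "run_fq2sortedbam.py",
        "--ref", f"/refdir/{reference}",
        "--reads", f"/readsdir/{read1}", f"/readsdir/{read2}",
        "--output", f"/outdir/{out_bam}",
    ]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_docker_run(
        "/data/reads", "/data/ref", "/data/out",
        "ref.fa", "r1.fq.gz", "r2.fq.gz", "out.sorted.bam",
    )))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. File names are given relative to their mounted directory, so the script inside the container sees them under /refdir, /readsdir, and /outdir.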
Use Source Code
Installation:
wget https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/releases/download/3.0/Source_code_with_submodules.tar.gz
tar -xzf Source_code_with_submodules.tar.gz
cd Open-Omics-Acceleration-Framework/pipelines/fq2sortedbam/
bash install.sh <onprem/cloud> ## onprem mode: manually install the dependencies listed in basic_setup_ubuntu.sh, as they need sudo access
Run:
python run_fq2sortedbam.py --ref <reference> --reads <read1> <read2> --output <outBAMfile>
General Notes:
- Individual pipeline tools are present in the applications folder
- To understand the various parameters of these tools, see their man pages
- You can set the parameters of these tools using the params command-line option
- To see all the parameters of fq2sortedbam, use the “-h” option
- fq2sortedbam supports the following aligners:
a. DNA short read alignment using bwa-mem2
b. DNA long read alignment using mm2-fast (minimap2)
c. RNA short read alignment using STAR aligner
d. bwa-meth based alignment