fq2vcf: OpenOmics DeepVariant-based Variant Calling Pipeline
Overview:
OpenOmics’ fq2vcf is a highly optimized, distributed, deep learning-based short-read germline variant calling pipeline for x86 CPUs.
The pipeline comprises:
- bwa-mem2 (a highly optimized version of bwa-mem) for sequence mapping
- SortSAM using samtools
- An optimized version of the DeepVariant tool for variant calling
The following figure illustrates the pipeline:
<img src="https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/blob/main/images/deepvariant-fq2vcf.jpg"/><br/>
Using Dockerfile (Single Node)
1. Download the code:
git clone --recursive https://github.com/IntelLabs/Open-Omics-Acceleration-Framework.git
cd Open-Omics-Acceleration-Framework/pipelines/deepvariant-based-germline-variant-calling-fq2vcf/
2. Build the Docker Images
Part I: fq2bams
docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t fq2bams -f Dockerfile_fq2bams .
Part II: bams2vcf
docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t bams2vcf -f Dockerfile_bamsvcf .
3. Run the Docker containers
Notes:
The reference directory mounted at /refdir is expected to contain the bwa-mem2 index. To build the index during the run instead, pass "--rindex" on the fq2bams command line.
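Whether the index is already present can be checked on the host before starting the container. The following is a minimal sketch, not part of the pipeline: `check_bwa_mem2_index` is a hypothetical helper, and it assumes the index files sit next to the reference FASTA and carry the suffixes that `bwa-mem2 index` normally writes (`.0123`, `.amb`, `.ann`, `.bwt.2bit.64`, `.pac`).

```bash
# Hypothetical helper (not shipped with fq2vcf): returns 0 when all
# bwa-mem2 index files for the given reference are present, 1 otherwise.
# Usage: check_bwa_mem2_index <host reference directory> <reference file name>
check_bwa_mem2_index() {
    local refdir="$1"
    local ref="$2"
    local suffix
    local missing=0
    # Assumed suffix list: the files bwa-mem2 index is expected to produce.
    for suffix in 0123 amb ann bwt.2bit.64 pac; do
        if [ ! -f "${refdir}/${ref}.${suffix}" ]; then
            echo "missing: ${refdir}/${ref}.${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

If the helper reports missing files, either build the index beforehand or add "--rindex" to the fq2bams command so that the index is created during the run.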
```bash
docker run --volume :/refdir :/readsdir :/outdir fq2bams:latest python run_fq2bams.py --ref /refdir/ --reads /readsdir/ /readsdir/ --output /outdir/
docker run --volume :/refdir :/indir