Runtime and accuracy metrics for all release models

WGS (Illumina)

Runtime

Runtime is on HG003 (all chromosomes).

Stage Time (minutes)
make_examples ~103m
call_variants ~185m
postprocess_variants (with gVCF) ~48m
total ~336m = ~5.6 hours

Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held out while training.

Type TRUTH.TP TRUTH.FN QUERY.FP METRIC.Recall METRIC.Precision METRIC.F1_Score
INDEL 501715 2786 1188 0.994478 0.997733 0.996103
SNP 3306844 20652 4262 0.993794 0.998713 0.996247

See VCF stats report.

WES (Illumina)

Runtime

Runtime is on HG003 (all chromosomes).

Stage Time (minutes)
make_examples ~6m
call_variants ~1m
postprocess_variants (with gVCF) ~1m
total ~8m

Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held out while training.

Type TRUTH.TP TRUTH.FN QUERY.FP METRIC.Recall METRIC.Precision METRIC.F1_Score
INDEL 1019 32 10 0.969553 0.990467 0.979898
SNP 24981 298 49 0.988212 0.998043 0.993103

See VCF stats report.

PacBio (HiFi)

Runtime

Runtime is on HG003 (all chromosomes).

Stage Time (minutes)
make_examples ~154m
call_variants ~201m
postprocess_variants (with gVCF) ~56m
total ~411m = ~6.85 hours

Accuracy

hap.py results on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held out while training.

Starting from v1.4.0, users don’t need to phase the BAMs first, and only need to run DeepVariant once.

Type TRUTH.TP TRUTH.FN QUERY.FP METRIC.Recall METRIC.Precision METRIC.F1_Score
INDEL 501629 2872 2771 0.994307 0.994725 0.994516
SNP 3324633 2862 1852 0.99914 0.999444 0.999292

See VCF stats report.

ONT_R104

Runtime

Runtime is on HG003 ultra-long reads (all chromosomes).

Stage Time (minutes)
make_examples ~782m
call_variants ~266m
postprocess_variants (with gVCF) ~67m
total ~1115m = ~18.58 hours

Accuracy

hap.py results on HG003 ultra-long reads (all chromosomes, using NIST v4.2.1 truth), which was held out while training.

Type TRUTH.TP TRUTH.FN QUERY.FP METRIC.Recall METRIC.Precision METRIC.F1_Score
INDEL 444208 60293 42612 0.88049 0.915553 0.897679
SNP 3320812 6683 9294 0.997992 0.99721 0.997601

See VCF stats report.

Hybrid (Illumina + PacBio HiFi)

Runtime

Runtime is on HG003 (all chromosomes).

Stage Time (minutes)
make_examples ~150m
call_variants ~178m
postprocess_variants (with gVCF) ~41m
total ~369m = ~6.15 hours

Accuracy

Evaluating on HG003 (all chromosomes, using NIST v4.2.1 truth), which was held out while training the hybrid model.

Type TRUTH.TP TRUTH.FN QUERY.FP METRIC.Recall METRIC.Precision METRIC.F1_Score
INDEL 503347 1154 2003 0.997713 0.996225 0.996968
SNP 3323945 3550 1535 0.998933 0.999539 0.999236

See VCF stats report.

How to reproduce the metrics on this page

For simplicity and consistency, we report runtime with a CPU instance with 64 CPUs This is NOT the fastest or cheapest configuration.

Use gcloud compute ssh to log in to the newly created instance.

Download and run any of the following case study scripts:

# Get the script.
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.5/scripts/inference_deepvariant.sh

# WGS
bash inference_deepvariant.sh --model_preset WGS

# WES
bash inference_deepvariant.sh --model_preset WES

# PacBio
bash inference_deepvariant.sh --model_preset PACBIO

# ONT_R104
bash inference_deepvariant.sh --model_preset ONT_R104

# Hybrid
bash inference_deepvariant.sh --model_preset HYBRID_PACBIO_ILLUMINA

Runtime metrics are taken from the resulting log after each stage of DeepVariant. The runtime numbers reported above are the average of 5 runs each. The accuracy metrics come from the hap.py summary.csv output file. The runs are deterministic so all 5 runs produced the same output.