DeepTrio training data

WGS models

| Model | Version | Replicates | #examples |
| --- | --- | --- | --- |
| Child | 1.1.0 | 4 HG001/NA12891/NA12892 trios; 7 HG005/HG006/HG007 trios; 3 HG002/HG003/HG004 trios | 566,589,652(1) |
| Child | 1.2.0 | (Same model as 1.1.0) | |
| Child | 1.3.0 | (Same model as 1.1.0) | |
| Child | 1.4.0 | 4 HG001/NA12891/NA12892 trios; 7 HG005/HG006/HG007 trios; 3 HG002/HG003/HG004 trios | 704,228,446 |
| Child | 1.5.0 | (6) 4 HG001, 3 HG002, 3 HG003, 3 HG004, 7 HG005, 6 HG006, 6 HG007, 4 NA12891, 4 NA12892 | 704,228,358 |
| Parent | 1.1.0 | 7 HG005/HG006/HG007 trios; 3 HG002/HG003/HG004 trios | 315,847,934 |
| Parent | 1.2.0 | (Same model as 1.1.0) | |
| Parent | 1.3.0 | (Same model as 1.1.0) | |
| Parent | 1.4.0 | 7 HG005/HG006/HG007 trios; 3 HG002/HG003/HG004 trios | 457,374,516 |
| Parent | 1.5.0 | (6) 3 HG002, 3 HG003, 3 HG004, 7 HG005, 6 HG006, 6 HG007 | 457,374,464 |

WES models

| Model | Version | Replicates | #examples |
| --- | --- | --- | --- |
| Child | 1.1.0 | 27 HG001/NA12891/NA12892 trios; 6 HG005/HG006/HG007 trios; 7 HG002/HG003/HG004 trios | 18,002,596 |
| Child | 1.2.0 | (Same model as 1.1.0) | |
| Child | 1.3.0 | (Same model as 1.1.0) | |
| Child | 1.4.0 | 27 HG001/NA12891/NA12892 trios; 6 HG005/HG006/HG007 trios; 6 HG002/HG003/HG004 trios | 27,776,416 |
| Child | 1.5.0 | (6) 9 HG001, 7 HG002, 7 HG003, 7 HG004, 8 HG005, 8 HG006, 8 HG007, 9 NA12891, 9 NA12892 | 27,791,954 |
| Parent | 1.1.0 | 6 HG005/HG006/HG007 trios; 6 HG002/HG003/HG004 trios | 4,131,018 |
| Parent | 1.2.0 | (Same model as 1.1.0) | |
| Parent | 1.3.0 | (Same model as 1.1.0) | |
| Parent | 1.4.0 | 6 HG005/HG006/HG007 trios; 6 HG002/HG003/HG004 trios | 13,036,995 |
| Parent | 1.5.0 | (6) 6 HG002, 6 HG003, 6 HG004, 8 HG005, 8 HG006, 8 HG007 | 13,036,998 |

PACBIO models(2)(3)

| Model | Version | Replicates | #examples |
| --- | --- | --- | --- |
| Child | 1.1.0 | 1 HG005/HG006/HG007 trio; 8 HG002/HG003/HG004 trios | 397,610,700 |
| Child | 1.2.0 | 1 HG005/HG006/HG007 trio; 8 HG002/HG003/HG004 trios | 406,893,180(4) |
| Child | 1.3.0 | 2 HG005/HG006/HG007 trios; 10 HG002/HG003/HG004 trios | 539,382,124(5) |
| Child | 1.4.0 | (Same model as 1.3.0) | |
| Parent | 1.1.0 | 1 HG005/HG006/HG007 trio; 8 HG002/HG003/HG004 trios | 386,418,918 |
| Parent | 1.2.0 | 1 HG005/HG006/HG007 trio; 8 HG002/HG003/HG004 trios | 392,749,204(4) |
| Parent | 1.3.0 | 2 HG005/HG006/HG007 trios; 10 HG002/HG003/HG004 trios | 533,353,050(5) |
| Parent | 1.4.0 | (Same model as 1.3.0) | |

(1): We include HG002/HG003/HG004 when training the WGS model, but use only examples from regions that are in the NIST v4.2 truth confident regions and not in the v3.3.2 confident regions.

(2): We use the entire HG002/HG003/HG004 trio for PacBio model training.

(3): PacBio training data contains training examples with both haplotag-sorted and unsorted images.

(4): In v1.2.0, we updated the NIST truth versions we used for training.

(5): In v1.3.0, we included PacBio Sequel II Chemistry v2.2 data in the training dataset and updated the NIST truth version to v4.2.1.

(6): Starting in v1.5.0, for clarity, we report the number of unique BAM files used. Note that this doesn’t mean all the trios were paired together to produce training data.
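
The region restriction described in footnote (1) amounts to an interval subtraction: keep the parts of the v4.2 confident regions that the v3.3.2 confident regions do not cover. The following is a minimal sketch of that operation, not the tooling actually used to build the training set. The function name `subtract_intervals` and the example coordinates are hypothetical, and the sketch assumes both inputs are on the same chromosome, sorted, non-overlapping, and half-open as in BED files.

```python
from typing import List, Tuple

# Half-open interval [start, end), following the BED convention.
Interval = Tuple[int, int]


def subtract_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of the intervals in `a` not covered by any interval in `b`.

    Assumes both lists describe one chromosome and are sorted and
    non-overlapping. This is an illustrative helper, not a DeepTrio API.
    """
    result: List[Interval] = []
    j = 0  # index of the first interval in `b` that can still overlap
    for start, end in a:
        cur = start
        # Skip intervals in `b` that end before the current position.
        while j < len(b) and b[j][1] <= cur:
            j += 1
        k = j
        # Walk over every interval in `b` that starts inside [cur, end).
        while k < len(b) and b[k][0] < end:
            if b[k][0] > cur:
                result.append((cur, b[k][0]))  # uncovered gap before b[k]
            cur = max(cur, b[k][1])
            k += 1
        if cur < end:
            result.append((cur, end))  # uncovered tail of the interval
    return result


# Hypothetical coordinates, for illustration only.
v42_regions = [(100, 500), (800, 1000)]
v332_regions = [(200, 300), (450, 900)]
new_in_v42 = subtract_intervals(v42_regions, v332_regions)
```

With real confident-region BED files, the same subtraction would be applied per chromosome; an established interval tool is likely a better choice than this sketch for production use.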