Tuesday, June 14, 2016

The IGS Implementation of BOLT

Bruce Golden
Theta Solutions

Over the last 50 years, the statistical methods used to calculate genetic predictions (EPDs) for livestock have evolved considerably. What drove that evolution? Knowledge of statistical models? New methods? Data? Enabling computer technology? Golden believes the driving force has been the desire to increase the accuracy of prediction.

Golden and Garrick previously relied on grants to fund the writing of genetic prediction software. That avenue appears to have dried up, so they started a company, Theta Solutions, to fund the development of genetic prediction software. The latest genetic prediction runs included 46,000 animals with genomic data.

Theta Solutions uses graphics processing units (GPUs), originally built for video gaming, to get high-performance computing at a relatively low cost. The BOLT software focuses on custom turnkey analyses: once the system is set up, all one needs to do is feed it data.

Using non-GPU computing, Golden can solve 51 million equations in 1,649 seconds. The fastest GPU implementation took 78 seconds, roughly a 21-fold speedup.
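The talk did not say which solver BOLT uses. One common choice for very large mixed-model equation systems is an iterative method such as preconditioned conjugate gradient (PCG), whose work per iteration is dominated by matrix-vector products, the kind of arithmetic GPUs do well. The following is a minimal CPU sketch, assuming a small dense symmetric positive-definite system and a Jacobi (diagonal) preconditioner; the function names are hypothetical and this is not BOLT's code.

```python
def matvec(A, x):
    # Dense matrix-vector product: the step a GPU would parallelize.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for symmetric positive-definite A."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    x = [0.0] * n
    r = b[:]  # residual b - A x, with starting guess x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:  # converged
            break
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution = pcg(A, b)
```

Real mixed-model equations are sparse and are usually never formed as a dense matrix; the dense lists here only keep the sketch short.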

Why do we use a Bayesian sampler for solving mixed models?

  • No accuracy approximation bias 
  • Can get prediction error (PE) covariance
  • Can apply marker selection methods
  • Can include prior information
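To illustrate the first two points: a sampler yields many draws of each animal's breeding value, so the prediction error variance (PEV) can be estimated directly as the variance of those draws instead of being approximated. The sketch below assumes the Beef Improvement Federation convention, accuracy = 1 - sqrt(PEV / additive variance); the function name and inputs are hypothetical.

```python
import statistics


def ebv_and_bif_accuracy(samples, additive_variance):
    """Posterior mean and BIF-style accuracy from sampled breeding values.

    samples: post-burn-in draws of one animal's breeding value.
    PEV is estimated as the sample variance of those draws.
    """
    ebv = statistics.fmean(samples)
    pev = statistics.variance(samples)
    accuracy = 1.0 - (pev / additive_variance) ** 0.5
    return ebv, accuracy
```

With draws [1.0, 2.0, 3.0, 4.0, 5.0] and an additive variance of 10.0, PEV is 2.5 and the accuracy is 1 - sqrt(0.25) = 0.5.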

With traditional methods, each sample took 23 seconds; the new implementation takes 2 seconds per sample. (Gibbs sampling is like turning a statistical crank over and over to solve very complex equations; each sample is one turn of the crank.) They also parallelized the sampling, further speeding up the process. Parallel processing is like working cattle through hundreds of chutes rather than a single chute.
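To make the "turning the crank" analogy concrete, here is a minimal Gibbs sampler for a deliberately tiny toy model, not the BOLT model: records y_i ~ N(mu, sigma2), with a flat prior on mu and p(sigma2) proportional to 1/sigma2. Each loop iteration is one turn of the crank: draw mu given sigma2, then sigma2 given mu. The function name, data, and settings are illustrative assumptions.

```python
import random


def gibbs_normal(y, n_samples=5000, burn_in=500, seed=1):
    """Gibbs sampler for y_i ~ N(mu, sigma2); returns post-burn-in draws."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    sigma2 = 1.0  # arbitrary starting value
    mu_draws, sigma2_draws = [], []
    for it in range(burn_in + n_samples):
        # mu | sigma2, y ~ N(ybar, sigma2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # sigma2 | mu, y ~ SSE / chi-square(n), a scaled inverse chi-square draw;
        # chi-square(n) is Gamma(shape n/2, scale 2)
        sse = sum((yi - mu) ** 2 for yi in y)
        sigma2 = sse / rng.gammavariate(n / 2.0, 2.0)
        if it >= burn_in:  # discard early draws taken before the chain settles
            mu_draws.append(mu)
            sigma2_draws.append(sigma2)
    return mu_draws, sigma2_draws


y = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2, 2.3, 1.7, 2.0, 2.1]
mu_draws, sigma2_draws = gibbs_normal(y)
```

The average of `mu_draws` estimates the posterior mean of mu, and their spread measures its uncertainty. A genetic evaluation does the same thing with millions of unknowns per turn, which is why cutting each turn from 23 seconds to 2 matters.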

There are three ways to combine genomics with traditional EPDs:

  • blending Genomic BLUP (combine pedigree prediction with genomic prediction, two separate analyses)
  • single-step Genomic BLUP (combine pedigree relationships and genomic relationships, one analysis)
  • hybrid model (single step with marker effects)
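Single-step methods need a genomic relationship matrix (G) alongside the pedigree relationship matrix. The post does not say how BOLT builds it. A widely used construction is VanRaden's (2008) first method, G = ZZ' / (2 Σ p_j(1 - p_j)), where Z is the genotype matrix centered by twice each SNP's allele frequency p_j. The sketch below assumes genotypes coded 0/1/2 and at least one polymorphic SNP; the function name is hypothetical.

```python
def genomic_relationship_matrix(M):
    """VanRaden method-1 G from genotypes M (rows = animals, columns = SNPs, coded 0/1/2)."""
    n_animals, n_snps = len(M), len(M[0])
    # Frequency of the counted allele at each SNP.
    p = [sum(row[j] for row in M) / (2.0 * n_animals) for j in range(n_snps)]
    # Scaling so G is comparable to pedigree relationships (needs a polymorphic SNP).
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    # Center each genotype by its expected value 2p.
    Z = [[row[j] - 2.0 * p[j] for j in range(n_snps)] for row in M]
    return [
        [sum(zi * zk for zi, zk in zip(Z[i], Z[k])) / denom for k in range(n_animals)]
        for i in range(n_animals)
    ]


G = genomic_relationship_matrix([[0, 1, 2], [1, 1, 1], [2, 1, 0]])
```

In a single-step analysis this G is then merged with pedigree relationships, so genotyped and non-genotyped animals are evaluated together in one analysis.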

Single-step genomic models outperform traditional EPDs, but the hybrid model outperforms both, especially for unproven animals. The purpose of the hybrid model is to squeeze more information out of the data.

They are currently working with a data set of 6 million pedigree records, 4.8 million birth weight records, 1.9 million post-weaning gain records, and 46,402 genotyped animals, using 44,414 SNP markers.
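Some quick arithmetic shows why hardware matters at this scale: 46,402 animals times 44,414 SNPs is about 2.06 billion genotypes. The post does not say how BOLT stores genotypes, so the encodings below (2 bits, 1 byte, or a 64-bit float per genotype) are hypothetical comparisons.

```python
def genotype_storage_bytes(n_animals, n_snps, bits_per_genotype):
    """Bytes needed to hold an animals-by-SNPs genotype matrix at a given encoding."""
    total_bits = n_animals * n_snps * bits_per_genotype
    return (total_bits + 7) // 8  # round up to whole bytes


packed = genotype_storage_bytes(46_402, 44_414, 2)    # 4 genotypes per byte
one_byte = genotype_storage_bytes(46_402, 44_414, 8)  # 1 genotype per byte
as_float = genotype_storage_bytes(46_402, 44_414, 64) # 64-bit float per genotype
```

That works out to roughly 0.5 GB packed, 2.1 GB at one byte each, and 16.5 GB as 64-bit floats, a large difference when everything must fit in GPU memory.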

Hybrid models allow

  • Marker selection models
  • Multiple components, e.g., maternal effects
  • Multiple traits, with different markers for different traits
  • Extra polygenic effects
  • MSRP approach (identifying SNPs with effects across traits and breeds)
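Once marker effects have been estimated, an animal's direct genomic value for a trait is the sum over SNPs of its genotype (0/1/2) times each marker's effect. The sketch below, with hypothetical trait names and effect values, shows how a marker-selection model can give each trait its own set of markers simply by setting excluded effects to zero. In a hybrid model the final EPD would also include polygenic effects, which are omitted here.

```python
def direct_genomic_values(genotypes, effects_by_trait):
    """Return {trait: [direct genomic value per animal]}.

    genotypes: rows = animals, columns = SNPs, coded 0/1/2.
    effects_by_trait: {trait: per-SNP effect}; SNPs dropped by a
    marker-selection model carry an effect of exactly 0 for that trait.
    """
    return {
        trait: [sum(g * a for g, a in zip(row, effects)) for row in genotypes]
        for trait, effects in effects_by_trait.items()
    }


dgv = direct_genomic_values(
    [[0, 1, 2], [2, 1, 0]],
    {
        "birth_weight": [0.5, 0.0, -0.25],      # hypothetical: uses SNPs 1 and 3
        "post_weaning_gain": [0.0, 1.0, 0.0],   # hypothetical: uses SNP 2 only
    },
)
```

Here the first animal gets -0.5 for birth weight and 1.0 for post-weaning gain, and the second gets 1.0 for both.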
IGS analysis enhancements and refinements
  • Superior marker effects model
  • Superior accuracy computation
  • New stayability approach
  • New breed effects model
  • Carcass traits solved together with birth weight
  • New method for external EPDs
Decker's Take Home Message
The hybrid model is simply an improved method for computing EPDs. These new predictions will be examined and scrutinized by many sets of eyes. The breed associations and their partners know how important accuracy and reliability are.

You may not follow every detail in this post, but there are no bogeymen or tricks in this method. It is simply a better way to estimate accurate EPDs from data.