Monday, December 4, 2017

AHA Educational Session 2017: Paving the Genetic Path

Dorian Garrick
Theta Solutions LLC

Theta Solutions LLC is made up of Dr. Bruce Golden, Dr. Dorian Garrick, and Dr. Daniel Garrick. They developed the BOLT software for genetic and genomic evaluations.

The American Hereford Association formed an advisory committee to review the new genetic evaluation system throughout its development. The advisory committee included:

  • Joe Ellis
  • Jack Holden
  • Lee Haygood
  • Paul Bennett
  • Mitch Abrahamsen

Suppose we had 100 progeny (i.e. offspring) on 1 bull. You might look at that bull and decide you like him or you don’t like him. But that bull is just an envelope that carries genetic information. What the bull looks like really doesn’t matter; what matters is what his progeny look like. The way to look at the genetic value, or breeding value, of the bull is to look at his offspring. But there are lots of environmental effects that influence the performance of the offspring. One example is the age of the dam. We can adjust (i.e. correct) for these effects before we measure his genetic merit. If we had thousands of progeny on a bull, his progeny performance would be his genetic merit. We use a statistical process called BLUP to account for uncertainty in the prediction. This is what produces EPDs (expected progeny differences).
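As a rough illustration of how a progeny average is pulled toward zero when a bull has few offspring, here is a minimal Python sketch of a classical single-trait progeny test. It is not the BLUP or BOLT evaluation itself. The function name, the example heritability, and the assumption that each record is already an adjusted deviation from its contemporary-group average are all illustrative.

```python
def sire_epd_from_progeny(progeny_deviations, heritability):
    """Return (epd, accuracy) for a sire from adjusted progeny deviations.

    Assumes one record per progeny, unrelated mates, and records already
    corrected for environmental effects such as age of dam.
    """
    n = len(progeny_deviations)
    if n == 0:
        return 0.0, 0.0  # no data: predict the average, with zero accuracy

    mean_deviation = sum(progeny_deviations) / n

    # Standard progeny-test constant: fewer progeny or lower heritability
    # means the average is trusted less.
    k = (4.0 - heritability) / heritability
    weight = n / (n + k)

    epd = weight * mean_deviation   # half of the estimated breeding value
    accuracy = weight ** 0.5        # correlation between true and predicted value
    return epd, accuracy


epd, accuracy = sire_epd_from_progeny([10.0] * 10, heritability=0.25)
print(epd, accuracy)
```

With a heritability of 0.25, ten progeny give a weight of 0.4, while thousands of progeny push the weight close to 1, which is the sense in which a bull's progeny performance becomes his genetic merit.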
We have now shifted to using DNA information in animal breeding and genetic prediction. DNA is made of four different molecules, called bases. These four bases are represented by A, C, G, and T. When DNA is copied between parents and offspring, mistakes are made. For example, an A could be replaced with a G. There are proteins that go in and correct these errors. But some errors still slip through. These are what we call new mutations. The most common mutations are single base pair changes, which we call SNPs (single nucleotide polymorphisms).
We can look at associations between SNP types (AA, AB, or BB genotypes) and the pedigree-based EPD. When we compare (i.e. regress) the EPD to the SNP genotype, we can estimate the effect of each SNP (which tags a chunk of DNA).
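The regression described above can be sketched for a single SNP with ordinary least squares. This is only a toy, one-marker-at-a-time illustration, not how BOLT fits markers. The 0/1/2 coding (number of B alleles) and the function name are assumptions made for the example.

```python
GENOTYPE_CODE = {"AA": 0, "AB": 1, "BB": 2}  # assumed coding: count of B alleles


def estimate_snp_effect(genotypes, epds):
    """Slope of EPD on B-allele count for one SNP (simple linear regression)."""
    x = [GENOTYPE_CODE[g] for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(epds) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all animals share one genotype; effect is not estimable")

    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, epds))
    return sxy / sxx  # change in EPD per extra copy of the B allele


effect = estimate_snp_effect(["AA", "AB", "BB", "AB"], [1.0, 1.5, 2.0, 1.5])
print(effect)
```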
We can test thousands of SNPs at a time using a DNA test called a SNP chip. This SNP technology was developed for human medical research applications. In the BOLT software, they find the effect of thousands of SNPs, and then sum up the SNP effects to produce a genomic prediction.
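Summing SNP effects into a genomic prediction amounts to a weighted sum over markers. The sketch below assumes genotypes coded as 0, 1, or 2 copies of one allele and uses made-up effect sizes; the names are hypothetical and nothing here reflects BOLT's actual interface.

```python
def genomic_prediction(allele_counts, snp_effects):
    """Sum of (allele count x estimated SNP effect) across all markers."""
    if len(allele_counts) != len(snp_effects):
        raise ValueError("need one effect per genotyped SNP")
    return sum(count * effect for count, effect in zip(allele_counts, snp_effects))


# Three markers with illustrative effects; a real SNP chip has thousands.
prediction = genomic_prediction([2, 1, 0], [0.5, -0.2, 0.1])
print(prediction)
```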
The AHA made improvements in genomic prediction between 2010 and 2012. Dr. Garrick pointed out that he called out Hereford breeders for not genotyping more animals. The Hereford breeders stepped up to the plate and genotyped more animals so GE-EPDs could be launched.
Now we had a problem. We had a genetic prediction based on pedigree data and genomic predictions based on DNA data. They could combine these two numbers using a weighted approach. More weight was put on the pedigree EPD if the animal had more progeny information.
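One way to picture the weighted approach is a blend whose weight on the pedigree EPD grows with the number of progeny records. The weighting function and the constant `k` below are hypothetical, chosen only to show the behavior described; the talk did not give the actual formula.

```python
def blend_epds(pedigree_epd, genomic_epd, n_progeny, k=20.0):
    """Weighted average of pedigree and genomic predictions.

    k is a made-up constant: with n_progeny == k the two sources
    are weighted equally.
    """
    pedigree_weight = n_progeny / (n_progeny + k)
    return pedigree_weight * pedigree_epd + (1.0 - pedigree_weight) * genomic_epd


print(blend_epds(2.0, 4.0, n_progeny=0))    # young animal: genomic value dominates
print(blend_epds(2.0, 4.0, n_progeny=500))  # proven sire: close to the pedigree EPD
```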
We can have a model for how performance is expressed. Cattle producers may talk about this as breeding plus feeding equals performance. In statistical animal breeding, we use mixed models. The phenotype (performance) is a function of environmental effects such as dam age or contemporary group, the animal effect (the breeding value), and an unknown effect (the residual).
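In the usual textbook notation, the mixed model described above can be written as follows. The symbols are assumptions for illustration: y holds the phenotypes, b the environmental effects (dam age, contemporary group), u the breeding values, e the residuals, X and Z link records to effects, and A is the pedigree relationship matrix.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_u),
\qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)
```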
For a genomic prediction we have the SNP genotypes, the DNA variant effects (i.e. allele effects), and an extra effect that can’t be explained by the SNP DNA markers. This is the model for a genotyped animal. But what do we do for animals that aren’t genotyped?
We can use a process called imputation to infer the genotypes of animals that have not been DNA tested, based on relatives that have been genotyped. Imputation adds another term for the imputation uncertainty (or error).
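The two cases can be sketched in symbols. This notation is an assumption, not taken from the talk: m_ij is the observed genotype of animal i at SNP j, m-hat is the imputed genotype, alpha_j is the SNP effect, r_i is the part of the breeding value the markers do not explain, and epsilon_i is the imputation error.

```latex
\begin{align}
u_i &= \sum_j m_{ij}\,\alpha_j + r_i
  && \text{(genotyped animal)} \\
u_i &= \sum_j \hat{m}_{ij}\,\alpha_j + \epsilon_i + r_i
  && \text{(non-genotyped animal)}
\end{align}
```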
In single-step BLUP, they combine genomic and pedigree information to measure relatedness. They get one number out of the evaluation, the estimated breeding value.
In the BOLT model, they explicitly model the SNP effects. This allows them to better understand and diagnose how the statistical model is working. It also allows them to create different models, using different SNPs, for different traits.
The BOLT software is based on the super hybrid marker model.
Dr. Mahdi Saatchi was a post-doctoral fellow in Dr. Garrick’s group, when Garrick was still at Iowa State University. They identified a set of SNPs that were predictive in multiple breeds. These are the SNPs that are used in the AHA single-step prediction.
The current PACE evaluation fits all traits simultaneously, which requires estimating the relationship between all the different traits. This assumes a linear relationship between an increase in one trait and an increase in a different trait.
There are now 9 different models in the Hereford evaluation. Only traits that are related to the same economically relevant trait are grouped together.
The data have also been pruned. The evaluation now only uses data that was recorded in the Whole Herd Reporting program. This improves the genetic prediction because only unbiased, complete data is used.
BOLT uses MCMC, which looks at multiple plausible values over thousands of iterations of the analysis. At the end, we take the average of these plausible values, which gives us the breeding value. Through these MCMC iterations, we can also look at the spread in the plausible values. This gives us the Prediction Error Variance, which is a measure of the uncertainty of the prediction.
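The averaging step can be illustrated with a short Python sketch that summarizes a list of sampled breeding values for one animal. The sample values and genetic variance are made up, and converting the Prediction Error Variance to an accuracy with the Beef Improvement Federation formula is an added assumption; the talk did not say which accuracy scale the new evaluation reports.

```python
from math import sqrt
from statistics import fmean, variance


def summarize_mcmc_samples(samples, genetic_variance):
    """Return (breeding value, PEV, BIF accuracy) from sampled plausible values."""
    breeding_value = fmean(samples)   # average of the plausible values
    pev = variance(samples)           # spread of the plausible values
    # BIF accuracy: 1 - sqrt(PEV / additive genetic variance)
    bif_accuracy = 1.0 - sqrt(pev / genetic_variance)
    return breeding_value, pev, bif_accuracy


# Three samples for readability; a real chain keeps thousands.
print(summarize_mcmc_samples([1.0, 2.0, 3.0], genetic_variance=4.0))
```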

Dr. Bruce Golden
Theta Solutions LLC
Dr. Golden addressed five topics:
  • Using Genomic Data
  • Date Cutoff
  • Effects of Accuracy Calculation
  • New and Improved traits
  • Genetic Trends

There are 26,154 genotypes paid for by producers. There are about 50,000 total genotyped animals in the evaluation.
In the evaluation, they have to do marker selection (which SNPs to include in the model). For a lot of SNP effects, if the birth weight effect goes down, the weaning weight effect also goes down. However, there are markers that have an effect on birth weight but have no predicted effect on weaning weight.
In Theta Solutions' analyses, single-step BLUP is more accurate than pedigree estimates. However, doing marker selection in the super hybrid models gives you an increase in accuracy over the single-step BLUP model.
Genotype information can give you the same amount of information as 3 to 15 progeny, depending on the heritability of the trait.
Accuracy is simply a measure of the certainty of the prediction. Zero is bad, one is perfect. Accuracy is harder to calculate and harder to understand. Accuracy is really a measure of risk that the EPD will change with more data.
Breeders are accustomed to full sibs not having very different EPDs. However, with genomic EPDs, full siblings can be very different.
Between two full sibs with very different EPDs, they see that 35% of their SNP genotypes are different. They got very different gene samples from their parents (see eBEEF fact sheet for more information).
There is now a new accuracy calculation with no approximation bias. They now use an exact method to estimate the prediction error variance. Because of the evolution in hardware and the MCMC approach, they can now directly measure accuracy (the problem is now tractable). The accuracies reported in the new system are going to be lower. The evaluation is not worse; we have done a better job of measuring accuracy. Old accuracy values were approximations and were biased to be larger than they should have been. New accuracies are unbiased, lower, and direct measures.
The new evaluation only uses whole herd Total Performance Records data. They only used observations from January 1, 2001 and forward. They are using pedigree through great-grandparents.
The new method is a better measure of accuracy. Breeders will need to recalibrate their eyes to these new accuracy numbers.

Cow Fertility
Cow fertility tends to be lowly heritable. However, cow fertility has a big impact on economic selection indexes because it is a huge driver of profitability. Genomics solves the lowly heritable problem. We can now make genetic progress for cow fertility.
There have been lots of measures of cow productivity:

  • Days to Calving
  • Calving Interval
  • Cow Longevity
  • Stayability
  • Random Regression Sustained Cow Fertility

We now use the random regression Sustained Cow Fertility. In this analysis, we use all of the available data; cows of any age contribute to the analysis. There are now simultaneous solutions for all ages of cows. Missing values are also handled: a cow becoming a donor, for example, is now treated as missing rather than as a “failure”.
The Hereford data has observations from 3 years-of-age to 12 years-of-age. A disposal code (reason cow was culled) allows the Hereford Sustained Cow Fertility to focus on fertility and not other reasons for failure.
Hereford is also going to use a new calving ease EPD. This now uses a random regression model, which allows contemporary groups with no variation to be used in the analysis. This evaluation uses scores from all ages of dam and all birth weight records. It predicts EPDs for females as 2-year-olds. The genetic trends from the old model and the new model were very different for Calving Ease Direct. The trends for Calving Ease Maternal were very similar.

Dorian Garrick 
If the sire has an AA genotype at a SNP, and the dam has a BB at a SNP, the calf must be AB (A from sire, B from dam). If the calf is AA, then the dam's genotype doesn't match the calf. If we find lots of these, then the parentage is not verified.
In the past we used blood types to verify parentage. We then used microsatellites, which are lengths of DNA repeats. In cattle, we first used 12 microsatellites, which was later updated to 24 markers. We then switched to SNPs, first using 100 SNPs, which was later updated to 200 SNPs. The Theta Solutions genotype pipeline uses all possible markers that animals are genotyped for. They look for animals where the genomic data doesn't match the pedigree data.
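The opposing-genotype check described above (an AA calf cannot come from a BB parent) can be sketched as a simple count across markers. The function name and the use of `None` for a missing genotype call are assumptions for illustration. How many conflicts are tolerated before parentage is rejected was not stated, so no threshold is applied here.

```python
def count_parentage_conflicts(parent_genotypes, calf_genotypes):
    """Count SNPs where parent and calf are opposite homozygotes (AA vs BB).

    Returns (conflicts, compared), skipping markers missing in either animal.
    """
    conflicts = 0
    compared = 0
    for parent, calf in zip(parent_genotypes, calf_genotypes):
        if parent is None or calf is None:
            continue  # no call for this marker; it cannot be checked
        compared += 1
        if {parent, calf} == {"AA", "BB"}:
            conflicts += 1  # the parent could not have passed on the calf's allele
    return conflicts, compared


dam = ["BB", "AB", "AA", None]
calf = ["AA", "AB", "AB", "BB"]
print(count_parentage_conflicts(dam, calf))
```

A heterozygous (AB) parent or calf never produces a conflict at that marker, which is why many SNPs are needed before a mismatch becomes convincing.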

Shane Bedwell 
AHA Director of Breed Improvement
Stacy Sanders
AHA Director of Records Department 
Hereford now has a lower DNA price. For $38, breeders get profile, parentage, abnormalities and genomic profile. They can get a combo test for $58, which includes profile, parentage, abnormalities, genomic profile, and Horn/Polled status (previously $85). An add-on Horn/Polled test costs $30, so buying the combo test saves $10. Animals that have previously been tested for parentage can be upgraded to GE-EPDs for $20.
Testing bulls is important from a marketing standpoint. It allows a seedstock producer to sell bulls with more confidence to their commercial customers. From a breed improvement standpoint, it is important to genotype heifers and cows.
Hereford producers can now be paid for using the Allflex TSU Tissue Sample Unit. The unit costs $2 to buy and the producer gets a $4 credit after the animals have been DNA tested.
Dr. Mike MacNeil has updated the AHA Economic Indexes. These updated indexes have been reviewed by Matt Spangler, Larry Kuehn and Bruce Golden.
There are two maternal indexes. The older indexes were driven largely by scrotal circumference. However, scrotal circumference was not a direct measure of female fertility, but only an indicator.
The new maternal index will now have a fertility/longevity trait. Sustained Cow Fertility will now drive the bus.
There has been significant progress in the genetic trend of Hereford. The index will now include Dry Matter Intake (DMI) and Carcass Weight (CW). Ribeye Area (REA), Marbling (MARB), and Back Fat (BF) will be included in the index for balance. The index is now accounting for inputs through the DMI EPD.
[Residual Feed Intake, and other efficiency measures, can rerank animals based on how you look at it. Feed intake is the economically important trait. The appropriate way to do this is through an index.]
The plan is to update economic indexes as the new BOLT evaluation is launched.
The American Hereford Association has been running both evaluations in parallel. The AHA is planning to switch to single-step and new indexes hopefully in mid-November.
Today it takes 20 to 50+ days for progeny records to influence their parents’ EPDs. Going forward, there will be no interim evaluations. There will now be full parent evaluations weekly. Evaluations will be launched at midnight on Saturdays. Results of these evaluations will be released Sunday nights. It will now take 9 to 15 days for data to impact an evaluation. Ultrasound scan data now needs to be submitted 4 weeks or more before a sale. DNA samples for GE-EPDs need to be submitted 6 weeks or more before a sale.
Jack Holden was part of the advisory board and felt the BOLT numbers tracked much better with the performance in his herd. Calving ease has a lot more spread and is a better match.
Joe Ellis felt the new numbers and indexes would be more relevant in the commercial industry. Some animals will look better and some will look worse.
Paul Bennett says genetic evaluation has never been perfect, and never will be perfect. But the analysis has improved. There will be some cattle that are surprise winners. This evaluation has had its day in court and the questions have been answered.