Combining Multi-type Data to Improve Genomic Prediction

Modern plant breeding programs collect several data types, such as weather records, images, and secondary or associated traits, in addition to the main trait (e.g., grain yield). Genomic data is high-dimensional and often crowds out the smaller data types when all of them are naively combined to explain the response variable. Methods are therefore needed that can effectively combine data types of very different sizes to improve predictions. In this work, we develop a new three-step statistical method that predicts multi-category traits by combining three data types (genomic, weather, and secondary trait data) and addresses the challenges that arise in doing so. We achieved an 8% improvement in prediction accuracy while reducing model complexity by over 90% compared to machine learning methods such as random forests and support vector machines (SVMs).
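To make the crowding-out problem concrete, here is a minimal sketch of one common workaround: standardize each column, then divide every block by the square root of its column count so that each data type contributes the same total variance before the blocks are concatenated. This is not the three-step method from the work, only an illustration of the imbalance. The data is simulated, and the block sizes and helper names (`standardize`, `block_scale`, `total_variance`) are assumptions made up for the example.

```python
import random
import statistics


def standardize(column):
    """Center a column and scale it to unit (population) variance."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column) or 1.0  # guard against constant columns
    return [(value - mean) / sd for value in column]


def block_scale(block):
    """Standardize every column, then weight the block by 1/sqrt(n_columns)."""
    n_rows, n_cols = len(block), len(block[0])
    columns = [standardize([row[j] for row in block]) for j in range(n_cols)]
    weight = n_cols ** -0.5
    return [[columns[j][i] * weight for j in range(n_cols)] for i in range(n_rows)]


def total_variance(block):
    """Sum of per-column population variances in a block."""
    n_cols = len(block[0])
    return sum(statistics.pvariance([row[j] for row in block]) for j in range(n_cols))


# Simulated data (hypothetical sizes): 50 lines, 500 markers, 5 weather
# covariates, and 3 secondary traits.
rng = random.Random(0)
n_lines = 50
blocks = {
    "genomic": [[rng.choice([0, 1, 2]) for _ in range(500)] for _ in range(n_lines)],
    "weather": [[rng.gauss(20, 5) for _ in range(5)] for _ in range(n_lines)],
    "secondary": [[rng.gauss(100, 15) for _ in range(3)] for _ in range(n_lines)],
}

# Naive combination: the genomic block holds almost all of the columns.
n_total = sum(len(block[0]) for block in blocks.values())
for name, block in blocks.items():
    print(f"{name}: {len(block[0]) / n_total:.1%} of columns")

# Block scaling: every data type now carries the same total variance.
scaled = {name: block_scale(block) for name, block in blocks.items()}
combined = [[v for name in blocks for v in scaled[name][i]] for i in range(n_lines)]
for name, block in scaled.items():
    print(f"{name}: total variance after scaling = {total_variance(block):.3f}")
```

Equal block weights are only a starting point. Whether they help depends on how informative each data type actually is for the trait being predicted.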

Watch the video to learn more about our work and results! This work was presented at INFORMS 2021.

Vamsi Manthena
Data Scientist II | Ph.D. Statistics

Data Scientist | Statistical Consultant