Gene-based Modeling

With the rapidly increasing availability of DNA sequence data for individual cultivars and breeding lines, there is growing interest in using this resource to improve crop model development and applications. Similarly, advances in understanding how plant processes are controlled at the molecular level suggest opportunities to strengthen how mechanisms are represented in crop models. These interests have given rise to a broad area of activity termed “gene-based modeling.” Topics of interest to DSSAT users and system modelers fall into four activities:

  • Estimation of genotype-specific model parameters (GSPs)
  • Improved representation of crop processes
  • Guiding genetic dissection of crop processes through analysis of GSPs as phenotypes
  • Use of genetic data for genetically realistic sensitivity analyses

We discuss these briefly with examples largely drawn from use of the CSM model or its predecessors.

As background, it is instructive to consider that all models of individual crops implicitly represent genetic information, but that the scale of genetic detail may differ. Six levels are readily recognized (White and Hoogenboom, 2003):

  1. Generic model with no reference to species.
  2. Species-specific model with no reference to genotypes.
  3. Genetic differences represented by genotype-specific parameters.
  4. Genetic differences represented by effects of specific alleles, with gene action represented through linear effects on model parameters (GSPs).
  5. Genetic differences represented by genotypes, with gene action explicitly simulated based on knowledge of regulation of gene expression and products.
  6. Genetic differences represented by genotypes, with gene action simulated at the level of interactions of regulators, gene-products, and other metabolites.

True gene-based modeling is found in levels 4 through 6. The GeneGro model for dry bean (White and Hoogenboom, 1996) was the first crop model to consider action of known loci and operated at level 4. A modification of GeneGro that explicitly represented an effect of a gene conditioning temperature sensitivity of the photoperiod response (Hoogenboom and White, 2003) thus contained elements of levels 4 and 5.

Estimation of genotype-specific model parameters

The genotype-specific model parameters (GSPs) for phenology, organ size, partitioning or other traits are usually estimated through calibration to field data using methods that minimize an objective function for goodness of fit (e.g., Acharya et al., 2017). This process potentially confounds the GSP estimates with errors in other model inputs and in the modeled processes, reducing model accuracy. Attempts to replace GSPs with information on the genetic makeup of cultivars started in the early 1990s with the GeneGro model for common bean (White and Hoogenboom, 1996; Hoogenboom et al., 1997). The basic approach was to assign genetic effects via discrete values, typically 1 for dominant and 0 for recessive alleles. Each GSP was then estimated as a linear function of the expected genotypes at the loci thought to affect the processes quantified by that GSP. Variants of the approach readily accommodated loci with multiple alleles (e.g., in sorghum; White et al., 2007) or cases where multiple gene copies were present (e.g., in bread wheat, a hexaploid; White et al., 2008).
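The linear allele-coding approach described above can be sketched as follows. This is a minimal illustration only: the locus names, intercept, and effect sizes are hypothetical values invented for the example and are not taken from GeneGro or any published calibration.

```python
# Hypothetical illustration: locus names, intercept, and effect sizes are
# invented for this sketch and are not values from GeneGro or CSM.

def gsp_from_genotype(genotype, intercept, effects):
    """Estimate a GSP as a linear function of allele codes.

    genotype: dict mapping locus name -> allele code (1 dominant, 0 recessive)
    effects:  dict mapping locus name -> additive effect of that locus on the GSP
    """
    return intercept + sum(effects[locus] * genotype[locus] for locus in effects)


# Made-up coefficients for a single GSP affected by two loci.
intercept = 0.01
effects = {"LocusA": 0.03, "LocusB": 0.02}

cultivar_1 = {"LocusA": 1, "LocusB": 0}  # dominant at LocusA only
cultivar_2 = {"LocusA": 0, "LocusB": 0}  # recessive at both loci

gsp_1 = gsp_from_genotype(cultivar_1, intercept, effects)
gsp_2 = gsp_from_genotype(cultivar_2, intercept, effects)
```

Under the same assumptions, a locus with multiple gene copies could be coded as a count of copies carrying a given allele rather than as 0 or 1; the function itself would not need to change.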

Improved representation of crop processes

In theory, if mechanisms were well understood, the effects of specific loci could be directly encoded into the model equations. Unfortunately, molecular mechanisms are seldom understood well enough to allow quantitative effects on modeled responses to be inferred. Two promising efforts to meld molecular information with simulation are in phenology of Arabidopsis (Welch et al., 2005) and wheat (Brown et al., 2013). Planned research with the common bean, soybean and sorghum models of CSM seeks to extend the established approaches to larger sets of lines and to introduce more mechanistic representations of genetic effects by capitalizing on the rapidly increasing amounts and quality of genotyping data (e.g., Schmutz et al., 2014; Langewisch et al., 2017) and insights on mechanisms (e.g., Weller and Ortega, 2015).

Guiding genetic dissection of crop processes

While conventional modeling seeks estimates of GSPs for downstream applications (e.g., in decision support of crop management), an important modification to the basic approach is to estimate GSPs for a large population of lines (e.g., recombinant inbred lines or a diversity panel) and then estimate effects of genetic loci for each GSP to support gene discovery. The values of a given GSP are valid phenotypes, so the method is analogous to establishing marker-trait associations for any quantitative trait. To date such studies have confirmed that GSP data from mapping populations can be mapped and used to identify loci of interest (e.g., Yin et al., 1999; Zhang et al., 2017). Proponents of this approach suggest that most quantitative traits are the result of combined effects of numerous loci with small effects, as concluded for example in maize phenology (Buckler et al., 2009) and height (Peiffer et al., 2014), so there is minimal benefit in considering known loci. However, evidence from natural systems on the magnitude of effects of adaptive loci is ambiguous (Dittmar et al., 2016). A logical compromise is to account for known loci in the process of estimating marker-trait relations for GSPs.
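Treating a GSP as a phenotype can be illustrated with a single-marker regression. The sketch below assumes a biallelic marker coded 0/1 and uses made-up GSP values for eight lines, far too few for a real analysis; it is a plain least-squares fit rather than the mapping method of any of the cited studies.

```python
def marker_effect(marker_codes, gsp_values):
    """Least-squares slope and R^2 for a GSP regressed on a 0/1 marker code."""
    n = len(gsp_values)
    mean_x = sum(marker_codes) / n
    mean_y = sum(gsp_values) / n
    sxx = sum((x - mean_x) ** 2 for x in marker_codes)
    syy = sum((y - mean_y) ** 2 for y in gsp_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(marker_codes, gsp_values))
    slope = sxy / sxx                # estimated allele effect on the GSP
    r2 = sxy ** 2 / (sxx * syy)      # share of GSP variance explained
    return slope, r2


# Hypothetical data: marker allele per line and the GSP estimated for it.
marker = [0, 0, 0, 0, 1, 1, 1, 1]
gsp = [10.0, 11.0, 9.0, 10.0, 14.0, 15.0, 13.0, 14.0]

slope, r2 = marker_effect(marker, gsp)
```

With a 0/1 coding and balanced classes, the slope equals the difference between the mean GSP of the two marker classes. Known loci could be handled in the same spirit by fitting their effects first and regressing the residuals on the remaining markers.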

An important consideration for modelers is that to accurately relate markers (or more correctly, chromosome intervals) to traits, one needs to work with much larger populations than crop modelers usually consider. The minimum suggested population size is 200 lines. Furthermore, one typically needs to evaluate the lines in multiple environments to obtain useful variation in phenotypes. These requirements create two challenges. The first is how to phenotype large numbers of lines accurately, and the second is how to automate estimation of GSPs. The first concern has led to widespread research on high-throughput phenotyping (White et al., 2012). The second is reflected in development of DSSAT tools such as GLUEselect for parameter estimation.
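Automating GSP estimation across many lines amounts to repeating a calibration for each line. The sketch below is a brute-force grid search that minimizes root mean square error; it is not the method used by GLUEselect, and the function toy_days_to_flower is a hypothetical stand-in for a crop model run, with invented observations and environments.

```python
import math


def toy_days_to_flower(gsp, photoperiod_h):
    """Hypothetical stand-in for a crop model run (not a CSM equation)."""
    return 30.0 + gsp * photoperiod_h


def rmse(observed, simulated):
    """Root mean square error between paired observations and simulations."""
    return math.sqrt(
        sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    )


def calibrate(observed, photoperiods, candidates):
    """Return the candidate GSP value that gives the lowest RMSE."""
    return min(
        candidates,
        key=lambda g: rmse(
            observed, [toy_days_to_flower(g, p) for p in photoperiods]
        ),
    )


# One photoperiod per environment; observations are made up for the example.
photoperiods = [12.0, 13.0, 14.0]
observations = {
    "line_1": [42.0, 43.0, 44.0],
    "line_2": [54.0, 56.0, 58.0],
}
candidates = [i / 10 for i in range(0, 31)]  # candidate GSP values 0.0 to 3.0

# Repeat the same calibration for every line in the population.
estimates = {
    line: calibrate(obs, photoperiods, candidates)
    for line, obs in observations.items()
}
```

In practice the loop would run over hundreds of lines and several GSPs at once, which is why an exhaustive grid quickly becomes impractical and sampling-based estimators are preferred.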

Use of genetic data for genetically realistic sensitivity analyses

Some of the earliest applications of crop modeling examined how canopy traits might affect crop productivity. As models became more mechanistically detailed, researchers conducted sensitivity analyses to examine potential tradeoffs among traits and thus provide insights to guide crop improvement (e.g., Hoogenboom and White, 1988). Often, such work was done to propose an “ideal plant type” or “ideotype” (Donald, 1968; White, 1998), whereby a wide range of plant traits are varied to identify the combination that maximizes yield for the target population of environments.

Three key issues need to be addressed to ensure that such modeling does not degenerate into a meaningless numerical exercise. The first is that the range of genetic variation simulated should relate to reasonable expectations for available or attainable genetic diversity. This would normally be inferred from variation seen in GSPs but could include “what if” scenarios for more extreme phenotypes. The second is that inherent tradeoffs among traits are properly represented. For example, there is little value in simulating variation in specific leaf area without considering likely correlated effects on assimilation rate. The third, and most challenging, is ensuring that the model is capable of simulating responses that are relevant for crop improvement.

Gene-based modeling offers the prospect of allowing researchers to conduct sensitivity analyses that explore the known or likely parameter space of genetic variability. Furthermore, where multiple effects of a locus (i.e., pleiotropy) are modeled, tradeoffs or compensatory behavior will explicitly be considered. An example of exploring genetic variation via gene-based modeling is given in White and Hoogenboom (2005), where effects of major loci in common bean were tested for different warming scenarios.
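A genetically constrained sensitivity analysis can be sketched by enumerating allele combinations instead of varying each GSP freely. In the example below the loci, GSP names, and coefficients are all hypothetical; LocusA is given effects on two GSPs to show how pleiotropy carries a tradeoff through to every scenario.

```python
from itertools import product

# Hypothetical loci and linear GSP models; none of these values are from CSM.
LOCI = ["LocusA", "LocusB"]
GSP_MODELS = {
    "gsp_phenology": {
        "intercept": 40.0,
        "effects": {"LocusA": 5.0, "LocusB": 3.0},
    },
    "gsp_seed_size": {
        "intercept": 0.30,
        "effects": {"LocusA": -0.05},  # pleiotropic effect of LocusA
    },
}


def gsps_for(genotype):
    """Compute every GSP for one genotype from its allele codes (0 or 1)."""
    return {
        name: model["intercept"]
        + sum(effect * genotype[locus]
              for locus, effect in model["effects"].items())
        for name, model in GSP_MODELS.items()
    }


# Enumerate every allele combination; each one is a genetically possible
# scenario that could be passed to a crop model run.
scenarios = []
for alleles in product((0, 1), repeat=len(LOCI)):
    genotype = dict(zip(LOCI, alleles))
    scenarios.append((genotype, gsps_for(genotype)))
```

Because both GSPs are derived from the same genotype, no scenario can combine the later phenology conferred by LocusA with the larger seed size of its recessive allele, which is the kind of constraint an unconstrained parameter sweep would miss.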


Acharya, S., Correll, M., Jones, J.W., Boote, K.J., Alderman, P.D., Hu, Z., Vallejos, C.E., 2017. Reliability of genotype-specific parameter estimation for crop models: insights from a Markov Chain Monte-Carlo estimation approach. Transactions of the ASABE 60, 1699.

Brown, H.E., Jamieson, P.D., Brooking, I.R., Moot, D.J., Huth, N.I., 2013. Integration of molecular and physiological models to explain time of anthesis in wheat. Annals of Botany 112, 1683-1703.

Buckler, E.S., Holland, J.B., Bradbury, P.J., Acharya, C.B., Brown, P.J., Browne, C., Ersoz, E., Flint-Garcia, S., Garcia, A., Glaubitz, J.C., 2009. The genetic architecture of maize flowering time. Science 325, 714-718.

Dittmar, E.L., Oakley, C.G., Conner, J.K., Gould, B.A., Schemske, D.W., 2016. Factors influencing the effect size distribution of adaptive substitutions. Proceedings of the Royal Society B: Biological Sciences 283.

Donald, C.M., 1968. The breeding of crop ideotypes. Euphytica 17, 385-403.

Hoogenboom, G., White, J.W., Acosta-Gallegos, J., Gaudiel, R., Myers, J.R., Silbernagel, M.J., 1997. Evaluation of a crop simulation model that incorporates gene action. Agronomy Journal 89, 613-620.

Hoogenboom, G., White, J.W., 2003. Improving physiological assumptions of simulation models by using gene-based approaches. Agronomy Journal 95, 82-89.

Peiffer, J.A., Romay, M.C., Gore, M.A., Flint-Garcia, S.A., Zhang, Z., Millard, M.J., Gardner, C.A.C., McMullen, M.D., Holland, J.B., Bradbury, P.J., Buckler, E.S., 2014. The Genetic Architecture of Maize Height. Genetics 196, 1337-1356.

Welch, S., Roe, J., Das, S., Dong, Z., He, R., Kirkham, M., 2005. Merging genomic control networks and soil-plant-atmosphere-continuum models. Agricultural Systems 86, 243-274.

White, J.W., Hoogenboom, G., 1996. Simulating effects of genes for physiological traits in a process-oriented crop model. Agronomy Journal 88, 416-422.

White, J.W., 1998. Modeling and crop improvement. In: Tsuji, G.Y., Hoogenboom, G., Thornton, P.K. (Eds.), Understanding Options for Agricultural Production. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 179-188.

White, J.W., Hoogenboom, G., 2003. Gene-based approaches to crop simulation: past experiences and future opportunities. Agronomy Journal 95, 52-64.

White, J.W., Hoogenboom, G., 2005. Integrated viewing and analysis of phenotypic, genotypic, and environmental data with “GenPhEn Arrays”. European Journal of Agronomy 23, 170-182.

White, J., Hoogenboom, G., Ottman, M., 2007. Modeling phenology of sorghum based on known maturity (Ma) loci. Farming Systems Design 2007. Proc. Int. Conf., Catania, Italy, pp. 10-12.

White, J.W., Herndl, M., Hunt, L.A., Payne, T.S., Hoogenboom, G., 2008. Simulation-based analysis of effects of Ppd and Vrn loci on flowering in wheat. Crop Science 48, 678-687.

White, J.W., Andrade-Sanchez, P., Gore, M.A., Bronson, K.F., Coffelt, T.A., Conley, M.M., Feldmann, K.A., French, A.N., Heun, J.T., Hunsaker, D.J., Jenks, M.A., Kimball, B.A., Roth, R.L., Strand, R.J., Thorp, K.R., Wall, G.W., Wang, G., 2012. Field-based phenomics for plant genetics research. Field Crops Research 133, 101-112.

Yin, X., Stam, P., Dourleijn, C.J., Kropff, M.J., 1999. AFLP mapping of quantitative trait loci for yield-determining physiological characters in spring barley. Theoretical and Applied Genetics 99, 244-253.

Zhang, L., Gezan, S.A., Vallejos, C.E., Jones, J.W., Boote, K.J., Clavijo-Michelangeli, J.A., Bhakta, M., Osorno, J.M., Rao, I., Beebe, S., 2017. Development of a QTL-environment-based predictive model for node addition rate in common bean. Theoretical and Applied Genetics 130, 1065-1079.