Crop growth models simultaneously integrate nonlinear interactions between soil, water, plant, and weather factors as well as management practices to determine crop productivity, environment stresses, and water and nutrient needs (Hodges et al., 1998; Newton et al., 2007). Regional climate processes, including mesoscale circulations, cloud formation, radiative transfer, as well as energy and water recycling, strongly depend on surface heat and moisture fluxes that are controlled by vegetation and soil storage. In particular, vegetation plays a major role in determining precipitation, interception, runoff, and the removal of moisture from the soil by transpiration (Dirmeyer, 1994; Xue et al., 1996; Liang et al., 2003). Thus, all current global and regional climate models incorporate comprehensive land surface models to predict surface–atmosphere exchanges of energy, momentum, water, and other mass constituents (Dickinson et al., 1993; Sellers et al., 1996; Dai et al., 2003). They represent key interactions between surface and atmospheric processes. More recent land surface models depict vegetation dynamics that include the C and N cycles (Foley et al., 1998; Dickinson et al., 2002; Chen et al., 2004; Bonan and Levis, 2006). These models, however, focus more on large-scale processes and long-term changes rather than the detailed daily variation characteristics of crop growth. While typical crop growth models give a more complete treatment of the plant life cycle evolution, they lack the feedback processes that are essential to regional climate variations. Thus, these climate and crop models need to be coupled to better understand the interactions between regional climates, ecosystems, and agricultural crops (Tsvetsinskaya et al., 2001; Lu et al., 2001; Osborne et al., 2009). Better integration of crop growth models with climate models can help provide the needed understanding and predictive ability.
Our overarching goal was to couple the mesoscale regional CWRF model (cwrf.umd.edu; Liang et al., 2012b) with the cotton growth model GOSSYM (Gossypium—the cotton genus) to better predict climate–cotton interactions. There exist many challenges to reach this goal. In the past, crop growth models were generally treated as “black boxes,” where full two-way interactions were lacking or, if included, consistency with the host climate models in the representation of the physical processes and soil properties that determine surface evapotranspiration, runoff, and the energy budget was not imposed (e.g., Heinemann et al., 2002; Doherty et al., 2003; Challinor et al., 2005; Xiong et al., 2008). Like most crop growth models, GOSSYM has been developed, calibrated, and evaluated on the basis of site-specific measurements. Its application and resulting credibility across a broad region with geographically distributed grids have yet to be established. Given the driving weather or climatic conditions, the original GOSSYM has approximately an additional 85 tunable parameters for users to calibrate at a farm site (Boone et al., 1993). Specification for such a long list of unknown parameters is extremely difficult to accomplish and constitutes a major obstacle that introduces large uncertainties to the spatially distributed modeling of climate–crop interactions. Model evaluation, while critical, has been limited and most often subjective, mainly due to the lack of comprehensive observations and coupled predictive systems (Thorp et al., 2007; Xiong et al., 2008).
Therefore, our specific objectives to achieve the goal of coupling GOSSYM with CWRF for distributed modeling were threefold. First, we fundamentally redesigned the software to change the original GOSSYM from running over time at a specific location to integration over space in sequential times. Second, we generalized the estimation of adjustable parameters and the specification of soil properties that are geographically distinct. This involved reducing the list, by defining most of the parameters with best available physical representations and observational estimates, to the final two parameters: the initial NO3 amount in the top 2 m of soil and the ratio of irrigated water amount to potential evapotranspiration. These parameters were derived through inverse modeling that minimizes simulation errors for observed cotton yields across the United States. These two parameters were validated indirectly by comparing the model total irrigated water amounts with U.S. statewide observations. Third, we conducted retrospective simulations to evaluate the ability of the redeveloped GOSSYM to reproduce historical cotton yield variations as driven by realistic climate conditions. This study, the first in a series, addressed the first and second objectives, with a focus on model development, including software reengineering, physics improvement, and parameter determination. A companion study (Liang et al., 2012a) fulfilled the third objective, where spatiotemporal variations of cotton yields and their responses to climate stresses simulated by the redeveloped GOSSYM were compared with observations.
MATERIALS AND METHODS
Given important differences in physics formulation and coding structure between CWRF and GOSSYM, both software engineering and physics modeling for the critical counterparts required appropriate modifications to ensure consistent model coupling and realistic representation of climate–crop interactions.
Brief Description of CWRF and GOSSYM
The CWRF is the climate extension of the WRF (Weather Research and Forecasting) model (Skamarock et al., 2005). This extension has been continuously refined with numerous crucial modifications to improve interactions among land–atmosphere–ocean, convection–microphysics, and cloud–aerosol–radiation and system consistency throughout all process modules (Liang et al., 2005a,b,c,d, 2006, 2012b). An essential CWRF component is the Conjunctive Surface–Subsurface Process (CSSP) model. It predicts 11 layers of soil temperature and moisture (liquid or ice), five layers of snow (mass, cover, and age), and a two-leaf (sunlit or shaded) canopy of temperature, radiation, and photosynthesis–stomatal resistance (Dai et al., 2003, 2004). The CSSP also integrates horizontal water movement (across grids) as surface and subsurface runoff that results from rainfall excess, saturation depletion, and lateral flows due to resolved and subgrid topographic controls (Choi et al., 2007; Choi and Liang, 2010). As such, the CSSP, which is most relevant to the coupling with GOSSYM in the present study, has outperformed major land surface models in predicting soil temperature and moisture distributions, terrestrial hydrology variations, and land–atmosphere exchanges (Yuan and Liang, 2011a). It also allows the CWRF to produce pronounced downscaling skill for seasonal–interannual precipitation prediction across the continental United States (Yuan and Liang, 2011b).
Among the 15 published cotton growth models (Jallas, 1998), GOSSYM is comprehensive and widely used in commercial agriculture to aid in making crop management decisions. The model formulation, development, and application of GOSSYM have been well documented (Baker et al., 1983; McKinion et al., 1989; Baker and Landivar, 1991; Boone et al., 1995; Hodges et al., 1998; Reddy et al., 1997, 2002). It is a mass-balance dynamic model that simulates C, N, and water processes in the plant and soil root zone throughout the cotton life cycle. It predicts crop growth (with detailed plant chemistry, morphogenesis, and phenology) and soil responses to environmental stresses, primarily from heat, water, C, and nutrients. These stresses are determined by climate variables such as solar radiation, temperature, rainfall, and wind, as well as by soil properties and cultural practices including irrigation and fertilization.
GOSSYM Software Reengineering
The original GOSSYM source code was written in the FORTRAN 77 (F77) standard and departs substantially from the CWRF FORTRAN 90 (F90) modular designs. The F77 standard has numerous shortcomings when compared with F90. In particular, F77 includes neither the complex data type to describe the hierarchical structure of plant organic parts nor the dynamic memory allocation that is desired for actual objectives with variable dimension size such as the growing leaf, stem, root, and boll. In addition, the original code was written for simulations that are site specific and in its present configuration is not suitable for coupling with the CWRF, where simultaneous calculations across a large array of geographically distributed grids are necessary. As such, the entire GOSSYM model software was recoded to follow the CWRF F90 modular design (Xu et al., 2005). This enabled running the coupled system most effectively on a parallel supercomputing environment for long-term climate integrations across distributed grids in sequential time steps.
Soil Temperature and Moisture and Evapotranspiration
The CSSP is physically based and represents dynamic interactions between comprehensive land and atmospheric processes through an advanced planetary boundary layer parameterization. It predicts soil temperature and moisture distributions, terrestrial hydrology variations, and land–atmosphere flux exchanges by strict energy conservation and mass balance. Extensive validation against field measurements indicates that the CSSP realistically simulates observed surface energy and mass variations (Yuan and Liang, 2011a). In contrast, the respective GOSSYM components are much less comprehensive, where the model formulations are generally empirical and especially lack full interactions with atmospheric dynamics that limit model performance (e.g., Staggenborg et al., 1996; Khorsandi et al., 1997). Given the crucial controls that soil temperature and moisture and evapotranspiration have on crop growth, development, and yield, we eliminated the corresponding modules from GOSSYM and, instead, utilized the more realistic prediction provided by the CSSP.
Soil Water Movement by Macropore Flow
The GOSSYM model treats the cotton field soil on a two-dimensional mesh with 20 horizontal rows and 40 vertical layers, where mass balances of water, NO3, NH4, and organic matter are maintained and root growth rates are predicted (Hodges et al., 1998). Water movement is simulated by Darcy fluxes into each 50-mm-thick layer, starting at the surface via infiltration of rainfall and irrigation and moving downward by vertical diffusion and drainage to a 2-m depth. Meanwhile, soil moisture is transported upward to the atmospheric boundary layer through surface evaporation and plant transpiration of water uptake from the crop root zone. Within each layer, water is moved laterally using the same flux equations according to the soil water gradient and diffusivity. Any soluble chemicals, primarily NO3–N, are assumed to move with the water fluxes.
The original GOSSYM does not consider the macropore flow effect that is especially important in the U.S. Southeast, where the soil contains a high percentage of sand with low soil moisture holding capacity. We thus incorporated a macropore flow mode to allow rapid drainage of the soil, following Theseira et al. (2003). This mode was designed to limit the actual soil water holding capability, increase water stress, and consequently reduce crop yields. We kept the soil grid structure and water movement calculation from GOSSYM to easily and intuitively incorporate the irrigation and fertilization methods (sprinkler, furrow, drip, broadcast, sidedress, foliar, and their alternatives). The resulting total soil moisture tendency from field water movement and the new macropore flow is updated at the same CSSP time interval to provide consistent calculations, such as evaporation and runoff from CSSP, if fully coupled with the CWRF.
Vegetation and Soil Property Distributions
The CSSP uses the USGS land use or land cover classification with 24 categories (Liang et al., 2005a,b). Figures 1a and 1b illustrate the geographic distribution of the USGS categories identified as the dominant type (Table 1) in each CWRF 30-km grid and the USDA county-level harvested cotton acreage averaged during 1979 to 2005. We eliminated cotton growth in minor production areas (e.g., Kansas) where the yield within a 30-km grid was <165 kg ha−1 and where there were missing data during 1979 to 2005. Across the U.S. Cotton Belt, the prevailing USGS categories are shrubland, grassland, and their mixture (southwestern states: California, Arizona, New Mexico, and Texas) as well as cropland and woodland mosaics (Lower Mississippi River basin: Oklahoma, Louisiana, Arkansas, Missouri, Mississippi, and Tennessee) and dryland cropland and pasture (southeastern states: Alabama, Florida, Georgia, South Carolina, North Carolina, and Virginia). We first acquired the USDA data archive to define where and when cotton grows and whether it is irrigated. The CSSP vegetation was then assigned to one of the four cropland categories (dryland cropland and pasture, irrigated cropland and pasture, cropland–grassland mosaic, cropland–woodland mosaic) by the cotton dominance in each grid.
|1||urban and built-up land|
|2||dryland cropland and pasture|
|3||irrigated cropland and pasture|
|4||mixed dryland and irrigated cropland and pasture|
|9||mixed shrubland and grassland|
|11||deciduous broadleaf forest|
|12||deciduous needleleaf forest|
|13||evergreen broadleaf forest|
|14||evergreen needleleaf forest|
|19||barren or sparsely vegetated|
|24||snow or ice|
The GOSSYM model requires inputs at each cotton plot for the soil depth and water table as well as vertical profiles that include the following: (i) sand and clay percentages; (ii) bulk density; (iii) volumetric water content at saturation, field capacity, permanent wilting point (water potential of –1.5 MPa), and air dry (soil humidity in equilibrium with the surrounding air); (iv) the water potential at field capacity; and (v) diffusivity at the permanent wilting point. Direct measurements of these parameters are laborious and their general application across the U.S. Cotton Belt is limited. In contrast, CSSP uses Penn State University 1-km soil characteristics data to define the depth to bedrock as well as the sand (Fig. 1c) and clay profiles (Liang et al., 2005a,b). These quantities are then used to calculate all the other soil properties (Saxton and Rawls, 2006). The identical CSSP representation for the geographic distributions of soil properties was incorporated into GOSSYM to ensure consistent coupling with CWRF and eliminate subjective calibration.
Cultural Practice Specifications
Cotton was planted in 2010 on >4.47 million ha in the U.S. Cotton Belt (Fig. 1b), especially the northern Texas High Plains, Mississippi Delta, irrigated valleys of California and Arizona, and southeast coastal states. Approximately 4.36 million ha were harvested. In addition to planting location and acreage, numerous other cultural or management practices must be specified in each 30-km model grid. In particular, timely irrigation and maintenance of soil fertility are necessary to ensure productivity and profitability because cotton is sensitive to both water stress and N loss caused by water stress (Reddy et al., 1997; Gerik et al., 1998). There is, however, no systematic data archive that includes detailed records of these practices, which differ substantially among individual farms. Thus, we adopted the following approximations:
Cotton Type, Density, and Period. Two major types of cotton (upland and pima) are grown in the U.S. Cotton Belt, where upland cotton covers 5.24 million ha and pima cotton covers the remaining 0.08 million ha, as averaged during 1979 to 2005. Given its predominance, only upland cotton was considered in this study. Because state- and county-level data records are often missing, we chose a mid-season cotton cultivar. A sensitivity analysis showed that the other cotton cultivars usable in GOSSYM resulted in mean yield differences mostly within ±20% of mean yields using the mid-season cultivar under the identical cultivation parameters calibrated for the mid-season cultivar. The prevailing cultural information on the plant row spacing and density as well as planting and harvest dates are from the Agricultural Statistics Board (1997) and State Cooperative Extension together with published literature. They differ among the states (Table 2). For example, row spacing ranges from 0.762 m in Missouri to 1.016 m in Arizona, Texas, and Oklahoma, while the density (number of plants per meter of row) varies between 4.9 in Virginia and 12.4 in New Mexico. The planting and harvest dates vary widely among the states, where each state has a harvest window of up to 1 mo. We used the middle of each window as the planting and harvest dates in each state. As such, the longest growing season spans the period from 30 April to 12 November in New Mexico. This is the window during which the GOSSYM integration for each year is conducted. Neither plant growth regulators nor pressure from weeds, diseases, insect pests, or catastrophic weather events was applied anywhere.
|State||Unirrigated||Irrigated||Unirrigated||Irrigated||Harvest date||N fertilization|
|m||no. m−1||kg ha−1|
|Alabama||1.003||1.003||11.745||11.745||9 May||5 Oct.||93.071|
|Arizona||1.016||1.016||9.843||9.843||15 Apr.||25 Oct.||167.827|
|Arkansas||0.965||0.965||9.022||9.022||12 May||18 Oct.||90.893|
|California||0.986||0.965||10.466||10.466||22 Apr.||23 Oct.||145.530|
|Florida||0.914||0.914||8.202||8.202||30 Apr.||16 Oct.||80.960|
|Georgia||0.914||0.914||8.202||8.202||10 May||25 Oct.||89.903|
|Louisiana||0.983||0.983||9.875||9.875||10 May||9 Oct.||94.325|
|Mississippi||0.965||0.965||10.663||10.663||13 May||20 Oct.||115.819|
|Missouri||0.762||0.762||12.402||12.402||12 May||17 Oct.||73.700|
|New Mexico||1.007||1.007||12.415||12.415||30 Apr.||12 Nov.||71.225|
|North Carolina||0.965||0.965||9.843||6.562||10 May||26 Oct.||85.360|
|Oklahoma||1.016||1.016||11.483||11.483||31 May||16 Nov.||42.493|
|South Carolina||0.965||0.965||9.843||6.562||10 May||29 Oct.||96.140|
|Tennessee||1.003||1.003||11.893||11.893||12 May||18 Oct.||91.718|
|Texas||1.016||1.016||9.843||11.483||21 May||1 Nov.||63.118|
|Virginia||0.838||0.838||4.921||4.921||30 Apr.||10 Nov.||80.960|
Cotton Area. The planting and harvest areas vary greatly from year to year. Both values were derived from the National Agricultural Statistics Service (NASS) archive at the county level. The GOSSYM model currently does not consider cotton mortality: when planted, it always grows but may produce little or no leaf, seed, and lint yield. For the purpose of validation, we only considered the harvest acreage. Figure 1b shows the processed harvest acreage distribution for each 30-km grid averaged for 1979 to 2005.
Fertilization. No county-level records of actual fertilizer application rates and dates for cotton are available. State-level data for annual N fertilizer rates (per fertilized cotton acre receiving N) during 1964 to 2005 are archived at the USDA Economic Research Service. Missing records in this database were replaced with regional averages reported by the USDA (Potter et al., 2006, Table 34). Table 2 lists the N rate for each state, averaged across 1979 to 2005. All N fertilizer amounts, which vary among states and from year to year, are applied once at the beginning of each growing season.
Initial Soil Fertility. This includes residual or initial N and organic matter content in the soil. The geographic distribution of the annual organic matter content is specified as one of the CWRF built-in surface boundary conditions (Liang et al., 2005a). The initial N is an essential input factor for most crop models and is generally measured on site before planting. Such measurements, however, are not made publically available. Thus, for this study, we chose the initial NO3 amount in the top 2 m of soil in the root zone to be an adjustable parameter.
Irrigation. The annual irrigated areas for cotton were derived mainly from the NASS archive at the county level. The planting or harvesting fractional area of a county is treated as being irrigated or unirrigated as indicated by the practice record. When this information was not explicitly recorded, the irrigated area of cotton was estimated using the NASS Farm and Ranch Irrigation Survey (FRIS) Census of Agriculture from 1984, 1988, 1994, 1998, and 2003. Constant extrapolation before 1984 and after 2003 and intercensus, linear interpolation between 1984 and 2003 were used to produce the irrigation map for each year.
Figure 1d illustrates the irrigated area percentage distribution averaged for 1979 to 2005. Irrigation was applied mostly in the southwestern states (California, Arizona, New Mexico, and Texas), while other states required notably less irrigation or were entirely rainfed. Past studies (Richardson and Reddy, 2004; Potter et al., 2006) irrigated 19.1 mm of water once the water stress factor fell below the threshold of 0.75. (The water stress factor, used only to trigger irrigation, is the ratio of plant water supply to demand, ranging from zero for the most severe stress to one for no stress [Baker et al., 1983]. In addition, GOSSYM uses the leaf water potential, which is a function of both root zone water availability and the prevailing atmospheric moisture demand, to determine the water stress effects on crop growth expansion, photosynthesis, and reproduction [Reddy et al., 1997]).
An examination of the limited cotton farmer records from the archive at the USDA-ARS showed that the amount of irrigated water varied substantially, ranging from as little as 12.7 to 76.2 mm in Texas to as much as 25.4 to 254 mm in California and New Mexico, and that the dates of irrigation were irregular.
Sensitivity experiments on irrigation applications using trial and error suggest that it is desirable to specify a unique water amount, stress threshold, and separation period set for each farm. This is impossible at present, given the lack of actual data for the date of application, amount of water, and method of irrigation (sprinkler, furrow, drip, or their alternatives). Instead, we adopted an automatic approach based on the cotton daily water stress factor and potential evapotranspiration. A specified amount of water, as an adjustable fraction of the local potential evapotranspiration, is applied once the water stress factor falls below the threshold of 0.75. Thus, we chose the ratio of the irrigated water amount to potential evapotranspiration as another critical adjustable parameter.
EXPERIMENTS AND RESULTS
Model Parameter Estimation
The two final adjustable parameters for the redeveloped GOSSYM are: (i) the initial NO3 abundance in the top 2 m of soil in the root zone; and (ii) the ratio of the irrigated water amount to potential evapotranspiration. Both parameters vary widely across the U.S. Cotton Belt. There are, however, no credible data to specify them correctly. Our strategy was to estimate both parameters through inverse modeling by optimization that minimizes local root mean square errors (RMSEs) of annual cotton yields simulated from observations during 1979 to 2005.
The inverse modeling or optimization algorithm is well accepted and widely used in calibrations of crop growth models (Liu et al., 1989; Jones and Carberry, 1994; Calmon et al., 1999; Irmak et al., 2000; Gauch et al., 2003; Thorp et al., 2007; Xiong et al., 2008). For example, Liu et al. (1989) derived the phenological coefficients for maize (Zea mays L.) through inverse modeling by closely matching the CERES-maize simulated dates for silking and maturity to the observed dates. Thorp et al. (2007) calibrate the saturated soil water conductivity of the bottom soil layer and effective tile drainage rate in the CERES-maize model to minimize errors between simulated and observed yields.
The use of this optimization in the current study required a large ensemble of GOSSYM standalone simulations. They were driven by the same set of the most realistic climate conditions during 1979 to 2005, while the two adjustable parameters listed above were altered across the ensemble to cover their respective conceivable ranges. This standalone approach eliminated complications from any CWRF climate biases and nonlinear feedbacks and thus effectively confined the optimization result by reducing the physical and numerical deficiencies specific to GOSSYM.
Annual cotton yield data for each county available from the NASS archive were used as the reference to evaluate GOSSYM simulations. Below we describe the driving climate conditions and model experiments to identify formulation deficiencies, improve irrigation modeling, and achieve system optimization. These efforts aimed to minimize cotton yield errors to derive best estimates of the two adjustable parameters.
Driving Climate Conditions
The climate conditions driving the standalone GOSSYM simulation were taken from the North American Regional Reanalysis (NARR) (Mesinger et al., 2006). The NARR provides a long-term, consistent, high-resolution climate data set that represents the best available proxy of observations. The NARR adopts a 32-km grid, close to that of CWRF, and provides 3-h atmospheric and land surface data, including precipitation, evapotranspiration, runoff, radiation flux, surface air temperature, humidity and wind, and soil temperature and moisture. Daily minimum and maximum temperature differences between NARR and the analysis based on surface (1.5–2-m) measurements from 7235 National Weather Service cooperative stations across the continental United States were generally small, having RMSEs within 1°C. The NARR biases in daily accumulative precipitation are relatively large, especially in the southeastern United States. As such, we replaced the daily precipitation with an objective analysis based on gauge measurements from the same cooperative stations (see Liang et al.  for the data source and analysis procedure).
Original GOSSYM Deficiencies
Doherty et al. (2003) applied GOSSYM to simulate geographic distributions of cotton yields across the U.S. Southeast. They used the original GOSSYM, however, as a “black box” model (i.e., without the modifications outlined above) and did not provide a direct model validation as driven by realistic surface and climatic conditions. Specifically, they presented the results under the following conditions: (i) one of three representative soil types (loamy sand, sandy loam, or silt loam); (ii) with or without irrigation application of an unstated amount (perhaps the default 19.1 mm) of water once the water stress factor dropped below 0.75; (iii) planting on 1 May and harvesting on 20 November; (iv) CO2 concentration fixed at 330 μL L−1; (v) generic fertilizer application of 165 kg ha−1 NO3; (vi) climate variations during 1960 to 1995 simulated by global and regional climate models (containing important biases from observations). Their resulting cotton yields were substantially greater than the observations, with statewide overestimations ranging from 30 to 120%.
A similar experiment using the original GOSSYM was conducted in this study to simulate cotton yield distributions across the entire U.S. Cotton Belt during 1979 to 2005 under more realistic conditions that include observed CO2 concentrations and climate variations, fertilizer application rates, and irrigation area percentages (all distributed on the CWRF 30-km grid), as well as state-specific planting and harvest dates. These driving conditions are identical to those specified for the redeveloped GOSSYM except that a single sandy loam profile was chosen everywhere, no macropore flow was included, the initial soil NO3 abundance was specified as 165 kg ha−1, and irrigation was applied as 19.1 mm of water whenever the stress level was <0.75, following Doherty et al. (2003).
Figure 2 illustrates serious deficiencies of the original GOSSYM in simulating cotton yields as measured by the relative mean biases  and standard deviation ratios (σs/σo), where and σ are the mean and standard deviation of cotton yields during 1979 to 2005 and the subscripts s and o represent simulations and observations, respectively. The relative mean biases measure the mean error of the simulated yields and were scaled or normalized by the observed yield. The standard deviation ratios are used to assess the simulated and observed yields’ interannual variability. The relative mean biases shown in Fig. 2a reveal that the model overestimated cotton yields by 90 to 135, 49 to 132, 27 to 122, 27 to 135, and 92% across the southeastern states, the Lower Mississippi River basin, the southwestern states, the individual states, and the entire U.S. Cotton Belt, respectively. The use of loamy sand and silt loam soil profiles did not substantially alter this outcome, although there were some small regional contrasts. In addition, the standard deviation ratios in Fig. 2b indicate that model cotton yield interannual variability was approximately 138, 90, and 43% greater than observations in Arizona, Texas and Oklahoma, and Louisiana and Mississippi, respectively, but 19, 21, and 27% less in Arkansas and Tennessee, the southeastern states, and New Mexico, respectively. These results clearly show that the original GOSSYM contains substantial biases. This necessitated incorporation of the key improvements outlined above into the revised GOSSYM as well as an enhanced irrigation scheme and optimal parameter derivation as described below.
Improved Autoirrigation Modeling
The redeveloped GOSSYM produced a much more realistic simulation of the long-term mean cotton yield distribution across the U.S. Cotton Belt. It still largely overestimated the cotton yield interannual variability, however, mostly for irrigated areas in the Southeast. Data analysis showed that the existing automatic irrigation scheme, based only on the static ratio of irrigated water amount to potential evapotranspiration under a fixed water stress threshold (e.g., Hillel and Guron, 1973; Fischbach and Somerholder, 1974), was likely to be the major cause for this overestimation. During severe drought years, simulated yields were much smaller than observations, suggesting that plants in the model were constantly subjected to water stress. To simulate a realistic long-term mean, the model has to produce higher than observed yields during wet years to balance the deficits in dry years, which results in a large overestimation of interannual variability. Given that irrigation is applied once per model integration interval (currently daily) when the initial water stress drops below the threshold, the irrigated water amount in the present automatic scheme is insufficient under drought conditions for the model to maintain the relatively wet soil that is required for high observed productivity. This problem is particularly serious in the Southeast, where rainfed and irrigation practices often coexist, but less severe in the Southwest, where irrigation is the predominant practice.
It is more desirable to determine the irrigation amount based on the actual evapotranspiration (ET) and surface plus subsurface runoff. These quantities, however, are not known before crop growth prediction. Instead, empirical estimates of the ratio to ET (runoff was not considered) were applied to incorporate crop characteristics and average soil conditions (Jensen and Heermann, 1970; Allen et al., 1998). In reality, farmers probably irrigate with a larger amount of water per application in dry years to reduce the overall water stress. We therefore modified the irrigation amount ratio to include an enhancement factor that increases with increasing departure of the threshold from actual water stress. The irrigated water amount, I, is expressed as a function of the daily water stress, Sw (0.0–1.0, where the value is inversely proportional to stress), on cotton growth and potential evapotranspiration, Ep:where α is a geographically distributed coefficient for optimization to minimize local model yield biases from observations, β is the enhancement factor (=3.0), and Sc is the water stress threshold (=0.75) below which irrigation starts. This scheme was designed to reduce cotton yield interannual variability from the previous irrigation scheme, where the irrigated water amount was based only on ET and not related to current water stress, i.e., without the max[ ] term in Eq. .
Optimization to Minimize Yield Errors
The redeveloped GOSSYM was then subject to adjustment by an optimization procedure that focused on the two unknown parameters: (i) the ratio of irrigated water amount to potential evapotranspiration (across the irrigated area only); and (ii) the initial soil NO3 abundance (everywhere). This was accomplished via inverse modeling to determine the irrigation ratio and initial NO3 values that minimized model biases of observed yields, averaged across the entire integration period (1979–2005) at each CWRF grid. Given Eq. , the determination of the irrigation ratio is simplified to the estimate of parameter α. Note that other irrigation parameters, including the stress threshold Sc, the enhancement factor β, and the application interval, also play important roles for the areas suffering from water stress, especially in the southwestern states. Unfortunately, there exist no data to determine the accuracy of these adjustable factors and hence they are prescribed at present.
Given that all of the above driving conditions are realistically specified, the time-varying annual cotton yields, Yt, at each grid modeled by the redeveloped GOSSYM depend only on the local initial soil NO3 abundance, N0, and irrigation water ratio parameter, α, both of which were assumed to be time invariant in this study. Ideally, the numerical optimization is achieved by searching from the unlimited ranges of N0 and α for the values that minimize local RMSEs between the modeled and observed Yt. In actual agricultural practices, the amounts of irrigated water and applied fertilization are constrained by the invested costs relative to the final yields. Thus, it is more realistic to construct the optimization to account for the cost effectiveness or efficiency of N and water application. Following the definition of agronomic efficiency (Molden, 1997; Novoa and Loomis, 1981; Grismer, 2002), we calculated the local N productivity, NP, and water productivity, WP, as functions of N0 and α:where δ denotes small increments of the adjustable parameters N0 and α.
The constrained optimization then finds the appropriate thresholds for NP and WP, below which further increases in N0 and α will not result in sufficient Yt gains to justify the N and water use. This was done by using 1979 to 2005 mean yields. At present, the redeveloped GOSSYM applies all fertilized N (assuming in NO3–N form) once at the planting date, while the amounts vary according to the state records (Table 1). We assumed that these actual amounts were determined by farmers as they assessed (based on past experiences) cotton NO3 need after accounting for initial soil fertility. As a first approximation, we have factored yearly initial soil fertility variations into the fertilization decision made by farmers and hence made N0 constant with time. Moreover, the initial soil fertility incorporates fertilization differences including timing, amount, and composition between the reality and the model.
Figure 3a depicts the NP statistics averaged for 1979 to 2005 over the Southwest, Mid-South, and Southeast regions in terms of varying N0 from 110 to 440 kg ha−1 (in increments of 27.5 kg ha−1) for α values of 0.05, 0.50, and 1.00. The results indicate that NP decreases in each region as N0 increases and α decreases; however, there exist important regional differences in NP sensitivity to α variations. The greatest sensitivity occurred in the Southwest, where productivity exceeded that of any other region when α = 1.0 but was zero when α = 0.05. Although productivity in the Mid-South is also sensitive to α, NP was less than that across the Southwest for the larger α values, while there was a low level of productivity when α = 0.05 and N0 was <250 kg ha−1. The value of NP across the Southeast was less sensitive to α variations than in the other regions, and there was substantial productivity when α = 0.05. The results show that, while cotton growth is a function of both N0 and α, the importance of these stress factors varies among regions. For the Southwest and Mid-South (mainly Texas), rainfall is insufficient and irrigation is the primary water source for crops. Thus, water availability is the key limiting factor for NP, regardless of N0. In the Southeast, however, where rainfall is much greater, irrigation has a relatively small impact on productivity. Thus, NP in the Southwest responds strongly to increased irrigation, while productivity in the Southeast is much greater than in the other regions when irrigated water is low. Given interstate and regional differences, we chose the NP optimization threshed to be 0.75 kg cotton kg−1 NO3. This corresponds to the lowest observed value reported by Bronson et al. (2003).
Figure 3b shows the WP statistics averaged during 1979 to 2005 across the Southwest, Mid-South, and Southeast in terms of varying α from 0.1 to 0.9 in increments of 0.05 for N0 values of 440, 330, and 220 kg ha−1. The results indicate substantial regional WP differences, where productivity was greatest in the Southwest and least in the Southeast. Productivity in the Southwest increased sharply from 971 to 4275 kg ha−1 for all N0 values as α increased from 0.1 to 0.15, and then decreased when α exceeded 0.20. The WP in the Mid-South increased from 1100 to 2123 kg ha−1 for all N0 when α increased from 0.1 to 0.15, and then decreased when α exceeded 0.15. Maximum productivity in the Southeast, 709 kg ha−1, occurred when α was 0.1. The WP then decreased as α increased. In addition to these regional differences, productivity in each region was greatest (least) when N0 = 440 kg ha−1 (220 kg ha−1), where WP sensitivity to N0 variations was greatest (least) in the Southwest (Southeast). These results illustrate that productivity in the Southwest and Mid-South is highly dependent on water availability when α is small. When irrigated water is sufficient, however, additional water leads to a decrease in WP. This is particularly true in the Southeast, where rainfall is sufficient to maximize water production, and the addition of irrigated water causes WP to decrease for all α. These results are consistent with those of Zwart and Bastiaanssen (2004), who reported WP increases with fertilization for wheat (Triticum aestivum L.) yields in Niger, Syria, and Uruguay, and Singh et al. (2010), who found significant WP rises for cotton seeds in response to increased fertilization rates of 80 to 200 kg ha−1 under deficit irrigation. Finally, note in Fig. 3b that, when α becomes large, WP approaches 100 kg ha−1. This means that for large α, an increase in irrigated water amount will not significantly impact water use efficiency. Thus, we chose the WP optimization threshold to be 100 kg ha−1 to ensure a high sensitivity of cotton yields to the irrigation water ratio.
Optimal Parameter Distribution and Validation
To examine the responses of the redeveloped GOSSYM to the optimization method described above, we conducted five experiments:
CNTL: optimize both the irrigation ratio and initial NO3 with WP >100 kg ha−1 and NP >0.75 kg cotton kg−1 NO3
OPTI: optimize the irrigation ratio only and hold the residual NO3 constant at 176 kg ha−1
OPTN: optimize the initial soil NO3 only and apply the original GOSSYM irrigation scheme, i.e., apply irrigation with 19 mm of water whenever the water stress level is <0.75 following Doherty et al. (2003)
ULMT: optimize both the irrigation ratio and initial soil NO3 without any limitation on WP and NP
NOOP: apply the original GOSSYM irrigation scheme and initial soil NO3 of 176 kg ha−1 instead of optimizing them by inverse modeling
The relative mean biases and ratios of standard deviations of simulated to observed cotton yields are shown for each experiment in Fig. 4. The CNTL simulation performed well, where the long-term mean modeled yields were within ±10% of observations in the 30-km grids across major areas of the U.S. Cotton Belt (Fig. 4a). Compared with the CNTL mean biases, both OPTI (Fig. 4b) and OPTN (Fig. 4c) reduced the underestimations in north-central Texas, but they increased overestimations by 4 to 17 and 6 to 15%, respectively, in the southeastern states, with averages of 12 and 7%, respectively. Moreover, OPTI significantly underestimated long-term mean yields along the Mississippi River and Sacramento Valley in California by 11 to 27%, and OPTN overestimated yields by about 17% in areas of North Texas along the New Mexico border. The NOOP simulation (Fig. 4e) systematically underestimated the observed yields in the southwestern states and along sections of the Mississippi River but overestimated yields in the other regions.
The OPTI underestimations that occurred in heavily irrigated regions were caused primarily by insufficient soil NO3 because the irrigated water amounts are adjustable, while the overestimations in the southeastern states indicate that the soil initial NO3 of 176 kg ha−1 is too large for this region. The major cotton-producing areas in the southeastern states and North Texas along the New Mexico border are rainfed, with only limited areas receiving irrigation. The OPTN overestimations show that the old irrigation scheme is not suitable and that irrigated cotton yields are far too large and disproportionately impact grid area-weighted yields. The NOOP result also shows that use of the old irrigation scheme and initial NO3 of 176 kg ha−1 generates excessive yield overestimations by 17 and 38% in the southeastern states and Texas rainfed fields, respectively, as well as substantial underestimations in the more heavily irrigated regions of the southwestern states (38% in California, 65% in Arizona, and 18% in the Mississippi Delta).
Among the five experiments, ULMT (Fig. 4d) produced the best results because it optimized both the irrigation ratio and initial NO3 without placing limitations on WP and NP. Compared with the CNTL experiment, ULMT produced slight improvements in the Southeast and Texas by reducing the underestimation by approximately 3%. This indicates that constraints on water and N productivities in the mathematical and statistical optimizing processes, which account for the actual cropping practices, do not decrease model performance substantially. The improvements generated by OPTI and OPTN in Texas also resulted from the removal of WP and NP constraints, respectively. By optimizing only one key parameter, however, either the irrigation ratio or initial NO3, both OPTI and OPTN simulated larger biases than the control experiment in key regions of the U.S. Cotton Belt.
When the experimental long-term standard deviations were compared with observations, CNTL (Fig. 4f) produced overestimations of 28 to 118% along the Mississippi River, southeastern states, and the southern part of Arizona, with an average of 64%. On the other hand, the average standard deviation was 27% smaller in New Mexico, the Texas High Plains, and southeast Texas. This pattern was produced in all experiments. The reduced variability probably occurred because the observed consequences of pest damage and improved management practices were not considered.
Figure 5 illustrates the geographic distribution of initial soil NO3 abundance in the top 2 m of soil in the root zone and the ratio of irrigated water amount to potential evapotranspiration derived from inverse modeling in the CNTL experiment. The initial soil NO3 abundances were <103 kg ha−1 in the southeastern states and the Texas High Plains but >210 kg ha−1 along the Mississippi River and in the southwestern states. The ratio of irrigated water amount to potential evapotranspiration obtained by optimization did not vary with time. The heaviest irrigation, where the ratio of irrigated water amount was >0.8, occurred along the Arizona–California border. The areas with the greatest irrigation normally coincide with those that have largest initial soil NO3 abundances. Moderate irrigation (irrigated ratio of 0.6–0.7) is located in the Sacramento and San Joaquin valleys in California and isolated areas of the Mid-South states, while relatively little irrigation (irrigated ratio <0.6) occurs across the remainder of the U.S. Cotton Belt.
Given that accurate geographical distributions of observed initial soil NO3 and irrigated water amount are not available, we compared the simulated state-level irrigated water amount with data from the 1984, 1988, 1994, 1998, and 2003 USDA FRIS to indirectly validate the optimization method used in this study (Fig. 6). We did not validate the initial soil NO3 because credible initial NO3 information is unavailable and its yearly variation is factored into the farmers’ fertilization decisions. The results showed that simulated annual irrigated water amounts were in good agreement with the FRIS data in the Southwest and Mid-South during the survey years. In the Southeast, however, FRIS irrigated water amounts had larger uncertainties due to the small fraction of irrigated cotton and large spatial and interannual variations in irrigation practices. Thus, in some survey years, the FRIS irrigated water amounts were zero, such as for Tennessee in 1984, 1988, and 1994. We assumed that the differences between modeled and FRIS irrigated water amounts in the Southeast resulted mainly from FRIS data uncertainties.
Result Improvement from the Original to Redeveloped GOSSYM
When compared with the original GOSSYM (Fig. 2a), the redeveloped GOSSYM (Fig. 4a) substantially reduced long-term yield mean biases from 92 to 4% when averaged across the entire U.S. Cotton Belt. The standard deviation (Fig. 2b vs. 4f) overestimations in Arizona and southwest Texas were reduced from 138 and 90% to 33 and –21%, respectively. In addition, the redeveloped (original) model overestimated (underestimated) standard deviations in the southeastern states by 71% (27%). The larger value (i.e., 71% vs. 27%) does not imply that the redeveloped model has diminished capabilities. In fact, the underestimation produced by the original GOSSYM in this region probably occurred because cotton experiences small N and water stresses (without imposing physical constraints) that cause substantial mean yield overestimations that approximate the optimal values for the cotton cultivar (Fig. 2a). The standard deviation overestimation by the redeveloped model occurred because cotton yields were overestimated (underestimated) during wet (dry) years. In wet (dry) years, the irrigated cotton fraction in a grid is smaller (larger) and less (more) irrigated water is required. Given that the southeastern states contain a mixture of rainfed and irrigated cotton fields, the new autoirrigation scheme can only partially solve this problem because the actual irrigated cotton fraction varies annually and has a large uncertainty due to a lack of creditable data.
Figure 7 compares the spatial frequency distributions of all cotton-grid pointwise relative mean biases, standard deviation ratios, and correlation coefficients of the original and redeveloped GOSSYM. For the relative mean bias, the occurrence of a larger frequency closer to 0.0 on the horizontal axis indicates that the model simulates more grids with smaller errors relative to observations, and hence is more realistic. The redeveloped GOSSYM has a peak frequency of 40% at the relative mean bias of –0.05, while the peak frequency of the original model is 7.5% at the relative mean bias of 1.05. This result clearly shows that the redeveloped GOSSYM substantially improved the simulation of long-term mean cotton yields. For the standard deviation ratio, simulation agreement with observations improves as the peak frequency approaches 1.0 on the horizontal axis. The frequency distribution difference of this ratio between the redeveloped and original GOSSYM is relatively small when compared with their relative mean bias contrasts. Thus, the two models produced interannual yield variability with comparable magnitudes.
A more important capability in modeling variability is the temporal correspondence between GOSSYM and observed anomalies from 1 yr to another. This can be measured by correlating the time series. For this measure, the closer the peak frequency is to 1.0 on the horizontal axis, the better the model captures the observed variations. The redeveloped GOSSYM generates its peak frequency at 17% when the correlation coefficient is 0.65, whereas the original model has its peak value at 16% when the correlation coefficient is 0.25. Based on the Student t-test and assuming yearly independence, correlation coefficients >0.32 are statistically significant at the 0.05 level. Significant correlations were simulated by the redeveloped and original GOSSYM in 87 and 40% of harvest grids, respectively. Therefore, the redeveloped GOSSYM substantially improved the simulation of the observed yield anomaly interannual variability.
DISCUSSION AND CONCLUSIONS
This study focused on the development of the geographically distributed cotton growth model GOSSYM for coupling with the regional climate model CWRF. The redeveloped GOSSYM includes improved physics and interface development, where its key parameters are determined by an optimization method. In particular, model formulation consistency between GOSSYM and CWRF is incorporated for the physical representations of vegetation and soil properties, soil temperature and moisture, evaporation and transpiration, and photosynthesis radiation, as well as the numerical implementation of the dynamic coupling and software engineering. The redeveloped GOSSYM then integrates the best available cultural practice specifications. As a result, the geographic distribution of irrigation practices and initial soil N content remain unknown due to the lack of credible observations.
To close the system, we incorporated an improved autoirrigation model that computes the irrigated water amount once the water stress factor falls below a threshold of 0.75 on the basis of the simultaneous water stress and the ratio of the irrigated water amount to potential evapotranspiration. Hence, inverse modeling (with minimized modeled cotton yield errors) is used to optimize the geographic distributions of the final two key parameters: the ratio of the irrigated water amount to potential evapotranspiration, as a surrogate for the unknown actual irrigation practice (stress threshold, water amount, and application interval), and the initial soil NO3 abundance in the top 2 m of soil. The optimization also includes water and N productivities to account for actual cropping practices and showed that N productivity decreases as N0 increases and α decreases. The sensitivity to α varied across the U.S. Cotton Belt and was strongest (weakest) in the Southwest (Southeast), where water productivity is greatest (least). Productivity in each region increased when α was small and then decreased to a near constant value when α became large. Given regional variations in WP and NP, we chose the NP optimization threshold to be 0.75 kg cotton kg−1 NO3 and the WP optimization threshold to be 100 kg ha−1.
Inclusion of the above improvements and optimizations in the redeveloped GOSSYM produced good agreement with observations, where simulated 1979 to 2005 mean yields were within ±10% of observed values across almost the entire U.S. Cotton Belt. In contrast, the redeveloped GOSSYM with the old irrigation scheme and fixed initial soil NO3 of 176 kg ha−1 generated substantial biases, with systematic cotton yield overestimations in the southeastern states, Mississippi River Delta, and Texas High Plains (heavily irrigated regions). Sensitivity studies further showed that optimizing only the irrigation ratio or initial NO3 increased the mean biases across major cotton growing areas of the U.S. Cotton Belt, including the southeastern states, northern Texas along the New Mexico border, and the Sacramento Valley in California. Introducing the water and N productivity constraints to the optimization process to account for actual cropping practices did not substantially impact the results.
The redeveloped GOSSYM realistically reproduced the geographic distribution of mean annual yields across the U.S. Cotton Belt. Good agreement between simulated annual irrigated water amounts and observations from FRIS data in the major irrigated cotton regions illustrates the improvements that resulted from use of the new irrigation scheme and optimized irrigation ratio. In the companion study (Liang et al., 2012a), the redeveloped GOSSYM was used to study the responses of cotton growth to climatic stresses in the U.S. Cotton Belt and thereby to evaluate its ability to reproduce the observed cotton–climate relationships. The interactive GOSSYM, as coupled with the CWRF model, can be used in future studies to provide more credible and detailed depictions of cotton–climate interactions. The coupled modeling system will have wide applications in the fields of precision farming and policy development as they relate to food security.