Farmland productivity estimation based on vegetation indexes from remote sensing data

Ensuring food security is a long-term and arduous task. Timely and accurate information on grain production capacity can provide data support for the nation in formulating macroeconomic plans and food policies. With the development of remote sensing technology, remote sensing has been widely used in crop yield estimation models. In this paper, the yield of spring maize in Da'an, Jilin Province, was estimated based on vegetation indexes calculated from Landsat 8 images. The results show that the fitting degree and estimation accuracy of the yield estimation models at the tasseling stage are significantly better than those at the milk stage. Among the vegetation indexes, the model based on GNDVI has a better fitting degree and estimation accuracy. This paper can provide a reference for the post-construction evaluation of high-standard farmland in China.


Introduction
Ensuring food security is a long-term and arduous task. Timely and accurate information on grain production capacity can provide data support for the nation in formulating related policies. With the development of remote sensing technology, remote sensing has been widely used in crop yield estimation models. For example, Sakamoto et al. [1] used multi-temporal remote sensing data and crop phenology characteristics to establish a statistical relationship between crop yield and vegetation indexes, and obtained high estimation accuracy. J.Q. Ren et al. [2] took American maize as the research object and each state in the USA as a yield estimation area, selected the best model by relating NDVI to the estimated maize yield of each state in 2011, and predicted the maize yield per unit area. The results showed that the relative error of maize yield was only 2.12%. L.Y. Liu et al. [3] carried out statistical analysis on ground spectral data and wheat yield data for each growth period, and built a yield estimation model for each growth period by analyzing the correlation coefficient curve, which achieved high accuracy. L. Bai et al. [4] measured the reflectance of the cotton canopy at different stages with hyperspectral remote sensing data, and analyzed the relationship between spectral reflectance and yield. With the continuous emergence of remote sensing data with high temporal and spatial resolution, remote sensing shows increasingly clear advantages in crop yield estimation. Combining remote sensing data with traditional statistical, meteorological and agronomic data to estimate productivity has become an inevitable trend. In this paper, the yield of spring maize in Da'an City, Jilin Province, was estimated using vegetation indexes derived from Landsat 8 satellite remote sensing data, to provide a reference for the post-construction evaluation of high-standard farmland in China.

Study area
Da'an (44°57'~45°45' N, 123°8'~124°21' E) (Fig. 1) is a world-famous golden maize zone and a national grain base, situated in the hinterland of the Songnen Plain in northwestern Jilin Province. Da'an was selected for its typicality and the availability of field data. It covers about 4879 km2 and has a continental monsoon climate. In an average year, the annual mean temperature is 4.3℃, the annual accumulated temperature is 2921.3℃, sunshine totals 3012.8 hours and precipitation totals about 413.7 mm. According to the statistical yearbook of Da'an for the past five years, spring maize is the staple crop, accounting for 87% of the total planting area. In this study, maize yield was estimated.
(1) Remote sensing data. There are significant differences in the accuracy of crop yield estimation models based on vegetation indexes at different growth stages, with the highest accuracy at the tasseling stage, followed by the milk stage. Thus, Landsat 8 data acquired on July 21, 2017 (tasseling stage) and August 22, 2017 (milk stage) were selected for the study. After radiometric calibration, atmospheric correction with FLAASH and mosaicking, the remote sensing data were geometrically corrected based on the land use change survey data. In addition, clouds were removed with the Fmask cloud detection function of ENVI, as there were several clouds on the Landsat 8 images at the milk stage.
(2) Vector data of the land use change survey. This study obtained the 2017 cultivated land data of the land use change survey in Da'an, including dry land, irrigated land and paddy field. Paddy fields are mainly planted with rice, while dry land and irrigated land are planted with maize. Therefore, the paddy field polygons were excluded from the remote sensing images, and the vegetation indexes were then calculated for dry land and irrigated land only. According to the investigation, there was no interplanting in Da'an, and crop species can be distinguished based on the land use change survey data (Genovese G et al., 2001). Therefore, mixed pixel decomposition for spring maize was not considered.
(3) Survey data of spring maize yield. Yield data for 98 spring maize samples in 2017 were obtained through household surveys.
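
The exclusion of paddy field pixels described in item (2) can be illustrated with a minimal sketch. It assumes the land use polygons have already been rasterized onto the same grid as the vegetation index image; the land use codes and the function name `mask_to_maize_land` are hypothetical and are not taken from the survey data.

```python
import numpy as np

# Hypothetical land use codes; the survey's real coding scheme is not given.
OTHER, DRY_LAND, IRRIGATED_LAND, PADDY_FIELD = 0, 1, 2, 3

def mask_to_maize_land(vi, land_use):
    """Keep index values on dry and irrigated land; set all other pixels to NaN."""
    vi = np.asarray(vi, dtype=float)
    land_use = np.asarray(land_use)
    keep = np.isin(land_use, [DRY_LAND, IRRIGATED_LAND])
    return np.where(keep, vi, np.nan)

# Toy 2 x 2 example: one paddy pixel and one non-cultivated pixel are dropped.
vi = np.array([[0.61, 0.58],
               [0.70, 0.12]])
land_use = np.array([[DRY_LAND, PADDY_FIELD],
                     [IRRIGATED_LAND, OTHER]])
masked = mask_to_maize_land(vi, land_use)
```

Using NaN for excluded pixels lets later statistics (for example `np.nanmean`) ignore them without a separate mask array.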

Vegetation index
Based on the Landsat 8 images and the vegetation characteristics of spring maize, and with reference to the literature [5][6][7], six vegetation indexes closely related to crop yield were selected for the study; they are listed in Table 1.

Vegetation index | Equation
Green Normalized Difference Vegetation Index (GNDVI) | GNDVI = (NIR-G)/(NIR+G)
SIPI | SIPI = (NIR-B)/(NIR+B)
Enhanced Vegetation Index 2 (EVI2) | EVI2 = 2.5(NIR-R)/(NIR+2.4R+1)

R is red band reflectance, G is green band reflectance, B is blue band reflectance and NIR is near-infrared reflectance. L is a soil adjustment coefficient that corrects for the influence of soil brightness; its default value is generally 0.5. x is an adjustment coefficient with a default value of 0.16, which can optimize L.
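
As an illustration of how such indexes are computed per pixel, the following minimal sketch evaluates GNDVI, SIPI and EVI2 from surface reflectance arrays. It assumes the commonly published forms GNDVI = (NIR-G)/(NIR+G) and EVI2 = 2.5(NIR-R)/(NIR+2.4R+1), and uses SIPI in the form printed in Table 1. The function names and reflectance values are illustrative only; on Landsat 8 OLI, blue, green, red and NIR correspond to bands 2, 3, 4 and 5.

```python
import numpy as np

def gndvi(nir, green):
    """Green NDVI: (NIR - G) / (NIR + G)."""
    return (nir - green) / (nir + green)

def sipi(nir, blue):
    """SIPI in the form printed in Table 1: (NIR - B) / (NIR + B)."""
    return (nir - blue) / (nir + blue)

def evi2(nir, red):
    """Two-band EVI, assumed form: 2.5 (NIR - R) / (NIR + 2.4 R + 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

# Illustrative reflectances for two vegetated pixels (not measured data).
blue = np.array([0.04, 0.05])
green = np.array([0.08, 0.09])
red = np.array([0.05, 0.07])
nir = np.array([0.45, 0.38])

gndvi_values = gndvi(nir, green)
sipi_values = sipi(nir, blue)
evi2_values = evi2(nir, red)
```

The functions work element-wise, so the same calls apply unchanged to full two-dimensional band arrays; pixels where a denominator is zero would need to be masked beforehand.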

Spring maize yield estimation model based on remote sensing technologies
A linear regression model was adopted in the study, using least squares to establish a statistical relationship between spring maize yield and each vegetation index. The regression line is:
Y = a × VI + b
where b is a constant, a is the regression coefficient, VI (vegetation index) is the independent variable and Y is the dependent variable (yield). Of the 98 samples, 69 (70%) were randomly chosen with the Geostatistical Analyst tool of ArcMap to develop the model, and the remaining 29 (30%) were used for validation.
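
The fitting procedure can be sketched as follows, assuming a straight line Y = a × VI + b that matches the definitions of a and b above. The sketch replaces the ArcMap Geostatistical Analyst subset step with a plain NumPy random permutation, and the data are synthetic; the helper names `split_samples` and `fit_line`, the seeds and the generating coefficients are all hypothetical.

```python
import numpy as np

def split_samples(n, train_fraction=0.7, seed=0):
    """Randomly split indices 0..n-1 into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(n * train_fraction))
    return order[:n_train], order[n_train:]

def fit_line(vi, y):
    """Least-squares fit of y = a * vi + b; returns (a, b)."""
    a, b = np.polyfit(vi, y, deg=1)
    return a, b

# Synthetic stand-in for the 98 surveyed samples (not the paper's data).
rng = np.random.default_rng(42)
vi = rng.uniform(0.4, 0.8, size=98)
yield_kg = 9000.0 * vi + 500.0 + rng.normal(0.0, 300.0, size=98)

train_idx, test_idx = split_samples(98)          # 69 training, 29 validation
a, b = fit_line(vi[train_idx], yield_kg[train_idx])
predicted = a * vi[test_idx] + b                 # yields for held-out samples
```

Fixing the random seed keeps the 70/30 split reproducible, which matters when models for several vegetation indexes are to be compared on the same validation samples.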

Accuracy assessment
R2 (coefficient of determination) and RMSE (root-mean-square error) were adopted for accuracy assessment. The higher the R2, the better the fitting degree of the model; the lower the RMSE, the higher the accuracy of the model.
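
For reference, a minimal sketch of the two metrics is given below. It assumes the standard definitions, R2 = 1 - SS_res/SS_tot and RMSE = sqrt(mean((observed - predicted)^2)), since the exact formulas are not stated in the text. The sample values are illustrative only.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative yields in kg/hm2 (not the paper's data).
observed = [5000.0, 6000.0, 7000.0]
predicted = [5200.0, 5900.0, 6900.0]

example_rmse = rmse(observed, predicted)
example_r2 = r_squared(observed, predicted)
```

RMSE is expressed in the same unit as yield (kg/hm2), so it can be read directly as a typical estimation error, whereas R2 is dimensionless.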

Results and analysis
Preliminary analysis of sample data
Among the 98 samples, the minimum yield was 2175 kg/hm2 and the maximum yield was 7500 kg/hm2. Fig. 2 shows the frequency histogram of sample yield with a normal distribution curve. Overall, the distribution was skewed. The spring maize yield was concentrated at 5,000~6,500 kg/hm2, so the samples were representative to a certain extent.

Fig. 3 shows the R2 and RMSE of the models built at the tasseling stage and the milk stage, where S1 represents the tasseling stage and S2 represents the milk stage. The R2 values at the tasseling stage were all greater than 0.6, with a highest value of 0.75, whereas the R2 values at the milk stage were all less than 0.2. RMSE at the tasseling stage was about 600 kg/hm2, and RMSE at the milk stage was between 900 kg/hm2 and 1000 kg/hm2. The fitting degree and estimation accuracy of the yield estimation models at the tasseling stage are therefore significantly better than those of the models at the milk stage.

Analysis of yield estimation models based on different vegetation indexes
Regression equations were established after further analysis of the remote sensing data acquired at the tasseling stage, and the results are shown in Table 2. In terms of fitting degree, the model built on GNDVI provided the highest R2 (R2=0.756), while the other models provided slightly lower R2 values (all greater than 0.6), so the overall fit was good. In terms of estimation accuracy on the training samples, the model built on GNDVI had the lowest RMSE (RMSE=531.74), while the RMSE of the other models was above 600. In terms of estimation accuracy on the testing samples, the model built on SIPI had the lowest RMSE, while the other models had slightly higher RMSE. In general, the model based on GNDVI performed well in both fitting degree and yield estimation accuracy. The estimated yield distribution of spring maize based on the GNDVI model is shown in Fig. 4.

Conclusions
In this paper, linear regression models were established to estimate spring maize yield, and the performance of different vegetation indexes was compared. The results indicate that the fitting degree and estimation accuracy of the models built on tasseling stage data are better than those of the models built on milk stage data. Among all vegetation indexes, the model based on GNDVI exhibits better performance in fitting degree and estimation accuracy. However, when the actual yield of a sample is lower than 5500 kg/hm2, the model overestimates the yield; otherwise, the model underestimates the yield.

Discussions
This study attempted to develop multiple linear regression models using multiple vegetation indexes, but the resulting models could not meet the statistical requirements because of collinearity among the indexes. Future work can focus on two aspects. First, in terms of model construction, a linear model is currently used to fit the complex relationship between spring maize yield and vegetation indexes; artificial neural network algorithms can be explored to improve the fitting accuracy of the model. Second, in terms of remote sensing data selection, GF-2 data with higher spatial resolution and GF-6 data carrying a red-edge band can be adopted in the next step, which may improve the estimation accuracy.
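
The collinearity problem mentioned above can be illustrated with variance inflation factors (VIF). This is only one common diagnostic; the text does not state which test was actually used. The sketch uses synthetic index values driven by a shared "greenness" signal, and the function name `variance_inflation_factors` is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column: 1 / (1 - R2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (ss_res / ss_tot))            # equals 1 / (1 - R2)
    return np.array(vifs)

# Synthetic example: two indexes share one signal, a third is independent.
rng = np.random.default_rng(0)
greenness = rng.uniform(0.4, 0.8, size=98)
index_a = greenness + rng.normal(0.0, 0.01, size=98)
index_b = 0.9 * greenness + rng.normal(0.0, 0.01, size=98)
index_c = rng.uniform(0.4, 0.8, size=98)

vifs = variance_inflation_factors(np.column_stack([index_a, index_b, index_c]))
```

Vegetation indexes built from the same near-infrared and visible bands tend to behave like `index_a` and `index_b` here: their VIF values are large, so the individual coefficients of a multiple regression become unstable. A VIF above about 10 is a widely used rule of thumb for problematic collinearity.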