Extraction of Refined Crop Type Over Agriculture Region of Heilongjiang

. The spatial distribution of fine crop types at regional scale is required by numerous research communities. The traditional approach with limited time-phases is hard to capture the signatures presented within different growth period of various crop types. With the improvement of understanding on the phenology feature of major crops and the accumulation of satellite-based observations, there is a chance to distinguish detailed crop clusters with elevated accuracy. In this work, we investigated the phenological feature of four dominant crops (soybean, wheat, maize, and paddy) in multi-spectrum space through ~800 representative crop samples within typical agriculture regions of Heilongjiang using MODIS daily surface reflectance product covering related growth period of 2005-2018. Features with the high degree of separation among land cover clusters are screened out to construct the model in identifying typical crop types in terms of weighted temporal features and classification scheme, which is applied to extract the crop map of Heilongjiang province in 2019. The results show higher accuracy achieved over main agriculture region of soybean, wheat, maize, and paddy, and reduced accuracy over field of wheat or other mixed crops at MODIS pixel scale. Our validation shows the overall accuracy of 0.9816 and kappa coefficient of 0.9702 through the comparison with ~3000 random selected ground sites. The preliminary application of the presented approach performs well via the capture on valid phenology features of major crops within dominant agriculture region of Heilongjiang, with the potential to serve the extraction of fine crop types over wide agriculture regions.


Introduction
Accurate distribution of crop types is one of important datasets for food security, agricultural management and policy development, environmental sustainability and land surface ecosystem monitoring [1][2].With the accumulation of satellite-based observation in medium or high resolution, there is a chance to develop effective methods in retrieving crop information and distinguishing various crop types, thus provide the potential refined crop map for user communities [3].Numerous studies demonstrated the importance of multitemporal information in crop identification.Supervised methods have been used most widely in multi-temporal classification, like decision tree [3], random forest [4], support vector machine (SVM) [5] and other nonparametric methods [6].These methods can process multiple time-phase images to identify various land cover clusters.However, there is still a lack of the investigation on phenology features of different crops, especially in the satellite-based thematic retrieval.
Time series classification method has great potential in crop identification, due to the ability to capture unique growth rhythm and intrinsic phenological signatures [7].
In this study, we used MODIS data to extract a variety of time series features, and established a time series feature classification model to improve the accuracy of crop identification.Heilongjiang (HLJ) is the northernmost province of China, plays the significant role in the supply of grains and agriculture management.The total sown area is 147,700 km 2 , which is mainly distributed in Songnen Plain (SNP) and Sanjiang Plain (SJP).Now, percentages of plant area for soybean, maize and paddy respectively have occupied up to 40% ,13% and 12% of the total grains' amount within China.Wheat was widely planted in northern HLJ before 2010, but the current planting area has been reduced to less than 1% of the total planting area of the province [9][10].HLJ has a continental monsoon climate.The accumulated temperature (>10℃) is 1800-2800 ℃, and the frost-free period is between 100-150 days.The annual rainfall averages 300-700mm, concentrated mainly from June to August [10].

Study Area and Data
MODIS provides operational standard products at 500m in daily, 8-day, and monthly data suits.We adopted the daily MODIS surface reflectance dataset MCD43A4 (collection 6) Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) dataset during the growth period of 2005-2019, and as well as the related quality assessment layer to screen clear observations with less aerosol contamination.
We collected 3880 representative crop samples over the main agriculture region of HLJ from 2005 to 2018, according to the ground true data in 2018 and highresolution remote sensing images (Fig. 1).In this study, we randomly selected 20% crop (~900) samples to construct the fuzzy membership model.The remaining 80% (~3000) samples were used to verify the classification accuracy.Due to the total sown area of HLJ almost unchanged in 2019-2020 [10], the GlobalLand30 of 2020 was used to screen out the nonagricultural region to tighten this work on the identification of fine crop types.In addition, the 10-m crop maps [11] in Northeast China in 2019 was used for cross-comparison.

Time series Vegetation Index
Vegetation Index is frequently considered as the enhanced signature of crop types than individual reflectance band, which combines multiple bands of reflectance to highlight vegetation information and suppress disturbance from non-vegetation surface clusters.We investigated phenology features of soybean, maize, paddy, and wheat through seven MODIS spectral bands and nine selected indices (shown in Table 1) that have been widely used in detecting different biophysical characteristics of vegetation, especially on the change of chlorophyll content, water content and canopy structure.

B B B B 
Normalized Difference Water Index, NDWI [13]

Separability Index (SI)
We adopted SI to examine the separability of above selected factors in different growth period among crops.Somers defined SI as the ratio of inter-class spectral heterogeneity to intra-class spectral heterogeneity in 2010 [14].For the spectrum of the two object types, SI increases when the intra-class spectral variance (∆intra) decreases and the inter-class distance (∆inter) increases, whereas decreases [10].We calculated SI for each pair of crop types during DOY 100-300, and a global SI using the average method to describe the separability of time series features among all types.The formulas are as followed: 1.96 where SIij is the SI of type i and type j (i, j = 1, 2, …, n); n is the number of classes; v refers to a band or a VI in this study; d refers DOY; μ is the mean value of features; σ is the standard deviation of features; n is the number of all types; M is the number of class pairs.

Fuzzy Membership Model (FMM)
The FMM was used to evaluate the similarity of features between unclassified sample to target type.Fuzzy membership degree, from 0 to 1, is used to describe the degree of similarity between a sample to target class [15].The formula is as followed: where xk is the feature value of sample k; sik means the fuzzy membership of sample k to class i; v, d, μ and σ are similar as the above formulas.

Multi-stage Weighting Model (MWM)
The MWM was used to screen out the features with the high degree of separation among land cover clusters, and identify the crop types according to the multi-stage results.In each stage, a normalized SI matrix was used to weight the FMM, and returned membership degrees of the unclassified sample belonging to target types.The weighting formulas are as followed: where pik means the probability of sample k belonging to class i; SIN is the normalized matrix of all SI (removing the values when the feature of sample i miss observation or the value is below the threshold of t1); m means the m-th (the m times) judgment currently in progress; s, v and d are similar as the above formulas.
Firstly, we used the global SI to weight fuzzy membership matrix and summed up the weighted matrix for identifying the sample type as the first classification stage (m=1).The first stage returns the weighted membership degree which means the similarity degree of sample k belongs to each type.When the membership degree of class i is higher than t2 (t2 is the value of s when xk-μ=2σ) and higher t1 than the membership degree of other types, it is considered that sample k belongs to the type i.When the membership degree of type i is greater than t3, the possibility of sample k belonging to type i the is marked +1, and the possibility belonging other type is marked 0. Otherwise, demanding the first stage results invalid.
Secondly, we evaluated the possibility of belonging to all paired crop types (m>1).When the membership degree of class i is higher t3 than the membership degree of another, the possibility with the higher membership type is marked as +1, and the possibility with the lower membership type is marked as -1 Otherwise, the possibility of the judgment in m-th stage is marked as 0.
Finally, the possibilities of sample k belonging to each type are summed together.When the weighted membership degree returned in any stage is less than t4 (t4 is the value of s when xk-μ=3σ), the possibility of sample k belonging to this type will become 0. When there is a possibility greater than 0, take the category with the maximum possibility as sample k.If the minimal probability is less than or equal to 0, marked sample k as non-crop type.In this study, the parameters t1, t2, t3 and t4 in classification were set as 0.7788, 0.3000, 0.1000 and 0.3679.

Crop separability
Fig. 3. shows a global SI chart (a) for multiple crops and six SI charts (b-g) for each pair of crops (soybean, wheat, maize, and paddy) during the growth period of DOY 100-300 (x-axis) at seven spectral bands and nine indices (y-axis).The higher SI value means the higher separability for target crop types over 16 indices and 201 time points, and the corresponding grid becomes increasingly red.In this study, most of features with high SI values exhibit as the NIR-or SWIR-related indices, which are respectively linked with crop biomasses at the green-up and maturing stages, or the water content.The pairwise SI between wheat and other crop types (Fig. 3. b, e, f) perform highest value about 1.8-2.4,especially to soybean and maize (b, e), and almost all of VI can effectively distinguish wheat from other crops, during DOY160-210.To reflectance features, the highest SI of reflectance features is only concentrated in low vegetation cover periods (DOY140-160), and the highest value is about 1.5, which is still much higher than the maximum SI between soybean and maize.
The SI between soybean and maize (Fig. 3. c) are relatively lower than other crop pairs with maximum 1.3.NDVI is the most widely used VI in crop classification, but the highest SI value is only 0.8.The higher SI between soybean and maize (1.0-1.3) is mainly showed in SWIR-related bands and VI (SWIR1, SWIR2, LSWI, Fig. 3.Estimated SI values of each pair of crops during the growth period of DOY 100-300 (x-axis) at seven spectral bands and nine indices (y-axis).
In this part, the four crop types have showed a great possibility to distinguish each other by the time series features.There is the highest separability between wheat and other types, due to the growth and the dry leaf period of wheat are earlier 30 days approximately than other crops.Similarly, paddy is extractable, due to the spectral features during irrigation and transplanting period (DOY140-160).The SI between soybean and maize is slightly lower, but the time series features with relatively high SI last for more than 40 days, which may give helps to effectively distinguish them.

Classification results
We applied the above presented method to classified 3103 random selected ground sites during 2005-2018 and the overall agriculture region of HLJ in 2019 using MODIS daily data of DOY 100-300.Table 2. shows the classification results of 3103 random selected ground samples.Each column of the matrix represents the indicates in actual crop types, and the row of the matrix represents the indicates in predicated crop types.In case of the classification results during 2005-2018, the overall accuracy is 0.9816, and the kappa coefficient is 0.9702.shows the extraction results for four crop types over the agriculture region of HLJ in 2019.Maize is the most sown crop in HLJ, followed with paddy, and they are mainly distributed in SJP and the south of SNP.Soybean is mainly distributed in the north of SNP.Wheat is the least sown crop and sporadically distributed in the north of HLJ.The 10-m crop maps in Northeast China in 2019 was used for cross-comparison, as showed in Fig. 5.In the soybean dominant planting region (Fig. 5. a1, a2), the extracted soybean area of MODIS-based 500-m crop map was underestimated compared to the soybean area of 10-m crop map.It has the same performance in maize dominant planting areas.For most of paddy dominant planting regions, the extracted paddy distribution in this study is highly consistent with the paddy distribution of 10m reference map.In some boundary of maize field and paddy field, numbers of MODIS pixels with mixed crop types were divided into soybean, although soybean is rarely planted in this area (Fig. 5. b1, b2).Besides, the classification results performed weak over the field of wheat or other mixed crops at MODIS pixel scale, because of the increase of fragmentation crop fields which are far smaller than 500m MODIS pixel.

Conclusions
We developed a physiological feature determined classification model at daily intervals during the growing season, and we classified four major crops (soybean, wheat, maize, and paddy) in case study of agriculture region of HLJ.The results show higher accuracy achieved over main agriculture region of soybean, wheat, maize, and paddy, and reduced accuracy over field of wheat or other mixed crops at MODIS pixel scale.
Compared with the traditional multi temporal classification method, our model potentially improves the degree of automatic extraction of fine crop categories in the large region scale, due to the classification mechanism is interpretable and flexible.In the following study, we are going to continuously contribute efforts to improve the model and the identification of the misclassified targets.

Fig. 2 .
Fig. 2. The spectral reflectance and vegetation index (VI) of crops during the growth period of DOY 100-300 (x-axis), which the solid lines represent the mean value of features, and the translucent zones represent the standard deviation of features.

Fig. 3 .
Fig.3.shows a global SI chart (a) for multiple crops and six SI charts (b-g) for each pair of crops (soybean, wheat, maize, and paddy) during the growth period of DOY 100-300 (x-axis) at seven spectral bands and nine indices (y-axis).The higher SI value means the higher separability for target crop types over 16 indices and 201 time points, and the corresponding grid becomes increasingly red.In this study, most of features with high SI values exhibit as the NIR-or SWIR-related indices, which are respectively linked with crop biomasses at the green-up and maturing stages, or the water content.Fig. 3. (a) shows the global SI of four crop types with highest SI value of 1.4.The global SI higher than 1.0 is mainly during DOY 210-280, especially in NIR, SWIR2 and nine VI.However, the highest SI value during 140-160 (0.8-1.0) is much lower than the value of some pairwise SI, especially to Fig.3.d and g (>1.5).The pairwise SI between wheat and other crop types (Fig.3.b, e, f) perform highest value about 1.8-2.4,especially to soybean and maize (b, e), and almost all of Almost all samples of wheat and paddy were correctly classified.Soybean and maize samples were slightly misclassified.The user accuracy of soybean (0.9330) is lower than the producer accuracy (0.9953), means the soybean area extraction by the model may be overestimated.Relatively, the maize area extraction by the model may be underestimated.In this study, about 40% agricultural samples were correctly classified by the first classification stage in MWM, and 60% agricultural samples were correctly classified by the 2-7 classification stage in MWM.

Fig. 4 .
Fig. 4. The extracted crop map (500m) of HLJ in 2019 Fig.4.shows the extraction results for four crop types over the agriculture region of HLJ in 2019.Maize is the most sown crop in HLJ, followed with paddy, and they are mainly distributed in SJP and the south of SNP.Soybean is mainly distributed in the north of SNP.Wheat is the least sown crop and sporadically distributed in the north of HLJ.The 10-m crop maps in Northeast China in 2019 was used for cross-comparison, as showed in Fig.5.In the soybean dominant planting region (Fig.5.a1, a2), the extracted soybean area of MODIS-based 500-m crop map was underestimated compared to the soybean area of 10-m crop map.It has the same performance in maize dominant planting areas.For most of paddy dominant planting regions, the extracted paddy distribution in this study is highly consistent with the paddy distribution of 10m reference map.In some boundary of maize field and paddy field, numbers of MODIS pixels with mixed crop types were divided into soybean, although soybean is rarely planted in this area (Fig.5.b1, b2).Besides, the classification results performed weak over the field of wheat or other mixed crops at MODIS pixel scale, because of the increase of fragmentation crop fields which are far smaller than 500m MODIS pixel.

Table 2 .
The confusion matrix of classification results for samples in HLJ during2005-2018 (the Soy-means Soybean).