Study on desertification reversal factors in Maowusu sandy land in China

Abstract: Based on data from China's Economic and Social Big Data Platform for 2000-2019, this paper analyzes the current state of desertification in the Maowusu Sandy Land in China and the factors that influence it. Using exploratory factor analysis (EFA) and a dummy variable regression model (DVRM), the results show that annual precipitation is the main climatic factor affecting vegetation coverage in this area: for every 100 mm increase in annual precipitation, vegetation coverage increases by about 10 percentage points. The annual average temperature also has a significant effect: for every 1 ℃ increase, vegetation coverage increases by about 2.5 percentage points. The analysis of policy factors shows that, relative to the 2000 base period, the policy effects of the 2005 National Desert Control Plan (2005-2010) and the 2011 National Desert Control Plan (2011-2020), together with related measures, increased vegetation coverage by 3.4 and 4.7 percentage points, respectively. The study reveals the important role of climate and policy factors in the reversal of desertification in the Maowusu Sandy Land and is of value for desertification management and related policy-making in China.


Introduction
To build the forest shelterbelt in northern Shaanxi, early research on desertification in the Maowusu Sandy Land carried out field surveys of the desert's natural conditions and established several experimental stations. At that stage, researchers identified the important role of human factors only through differences in the degree of desertification. Since then, most scholars have used satellite remote sensing to move from the initial static measurements to more dynamic research [1][2]. Most of these studies suggest that desertification in the Maowusu Sandy Land is being reversed [3][4]. Follow-up studies also show that, from the 1980s to the first decade of the 21st century, the reversal of desertification in the Maowusu Sandy Land continued to show a generally positive trend [5]. Further research using geographic information system (GIS) technology has shown the spatial and temporal trends of this reversal and visually demonstrated the structural reversal of desertification [6][7]. Earlier studies of climate change in the Maowusu Sandy Land reached a broadly consistent conclusion of a warming and drying trend [8][9][10]. Accordingly, most scholars have affirmed the important role of human factors in the reversal of desertification, but only a few have analyzed the changes in vegetation coverage caused by different land-use patterns, industrial structure, policy factors and other influencing factors [7]. In the 21st century the global climate has undergone major changes; some researchers believe that precipitation and temperature are rising in step and that precipitation plays a decisive role in the reversal of desertification in the Maowusu Sandy Land [11][12].
In these studies, the response variable is usually vegetation coverage or forest coverage, which is used to represent different degrees of desertification and to analyze the reversal of sandy desertification.
In recent years, most studies have relied only on remote sensing for visual display and then, through a review of the literature, identified influencing factors in a largely subjective way to judge their effect on the reversal of sandy desertification and to put forward suggestions; none of them has quantitatively analyzed the factors affecting the reversal [13]. Other studies show the influence of human and natural factors on desertification only through a simple correlation analysis. Even studies that attribute the reversal solely to natural factors or solely to human factors cannot measure these effects scientifically from a quantitative perspective, nor can they clarify how the different factors interact [14].
Building on these studies, this paper introduces dummy variables and their interaction terms to measure quantitatively the impact of human and natural factors on the reversal of desertification in the Maowusu Sandy Land. In particular, it uses the systematic method of system engineering theory and the comparative analysis method of public policy theory to measure the influence of policy on desertification reversal, in order to better support desertification control decision-making and desert ecosystem management in China [15][16].

Data source
This research is mainly based on data from China's Economic and Social Big Data Platform [17] and China's Economic and Social Development Statistical Database [18]. Drawing on the economic and social development statistics of Shaanxi Province, the Inner Mongolia Autonomous Region and the Ningxia Hui Autonomous Region [19][20], this study selects and analyzes data for several cities and counties around the Maowusu Sandy Land. The research period is 2000-2015, for two reasons. (1) Because of limitations in the statistical databases, several key indicators are seriously missing for many regions before 2000 and after 2015; considering the coverage and availability of the data, the study period was set to 2000-2015. (2) Since 2000, major national ecological projects, such as the project of returning farmland to forests and grasslands and the pilot project for controlling wind and sand sources around Beijing and Tianjin, have been carried out successively; to avoid estimation errors caused by such projects and to account for the effects of policy, 2000 was chosen as the starting point of the data. In addition, in 2005 the State Council approved the National Desertification Control Plan (2005-2010), and in 2011 it approved the National Desertification Control Plan (2011-2020) [21], together with related policies. Therefore, 2005 and 2011 were selected as important time nodes for national desertification prevention and control, dividing the study period into three sub-periods for analyzing the policy effects of desertification control projects [22][23]. The study areas all lie within the Maowusu Sandy Land and include Etuokeqianqi¹, Etuokeqi, Wushenqi, Yijinhuoluoqi, Yanchi County, Yuyang District, Shenmu, Hengshan, Jingbian and Dingbian County, among others.
The main data are shown in Table 1.
¹ Qi (banner) is a county-level administrative region in Inner Mongolia, where most residents are Mongolians.

Variable description
Based on existing research and the objectives of this study, the main variables are as follows. (1) Vegetation coverage (v_ratio): the percentage of a region's total land area that is covered by vegetation. Vegetation coverage is an important indicator of the richness of forest resources, the level of greening and ecological balance. In calculating vegetation coverage, the vegetated area includes arbor forest and bamboo forest land with a canopy density of 0.2 or more, shrub forest land specially designated by the state, farmland shelterbelt networks, and tree cover around villages and houses and along waterways and roadsides. It also includes the area covered by herbaceous vegetation [9]. The unit is %.
(2) Annual precipitation (rain): the depth of water that the year's precipitation would accumulate on a horizontal surface, assuming no infiltration, runoff or evaporation. The unit is mm.
(3) Annual average air temperature (temp): the air temperature measured in a louvered instrument shelter 1.5 m above the ground, in degrees Celsius (°C). The annual average temperature is the sum of the 12 monthly average temperatures divided by 12, and each monthly average temperature is the sum of the daily average temperatures of that month divided by the number of days in the month.
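The two-stage averaging rule for temp can be made concrete with a short Python sketch. It assumes daily mean temperatures are already available for each month; the function names and the monthly values below are hypothetical illustrations, not the study's data. The last lines show that the mean of monthly means is not the same as averaging all 365 daily values, because the rule gives every month equal weight regardless of its length.

```python
from typing import Sequence


def monthly_mean(daily_means: Sequence[float]) -> float:
    """Monthly mean = sum of the daily mean temperatures / number of days in the month."""
    if not daily_means:
        raise ValueError("a month needs at least one day of data")
    return sum(daily_means) / len(daily_means)


def annual_mean(months: Sequence[Sequence[float]]) -> float:
    """Annual mean = sum of the 12 monthly means / 12 (each month weighted equally)."""
    if len(months) != 12:
        raise ValueError("expected daily data for exactly 12 months")
    return sum(monthly_mean(days) for days in months) / 12


# Hypothetical example: the daily mean is held constant within each month (°C)
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
monthly_levels = [-10, -6, 1, 10, 17, 21, 23, 21, 15, 8, -1, -8]
months = [[float(t)] * d for t, d in zip(monthly_levels, days_in_month)]

print(round(annual_mean(months), 3))  # 7.583, the mean of the monthly means

# Averaging all 365 daily values instead weights long months more heavily
all_days = [t for month in months for t in month]
print(round(sum(all_days) / len(all_days), 3))  # 7.655
```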
(4) Logarithm of regional GDP (lngdp): regional GDP is the final result of a region's production activities over a given period. It has three forms of expression (value, income and product) and is calculated in practice by three methods (the production, income and expenditure methods), which reflect regional GDP and its composition from different angles. Following common practice in the literature, this paper uses the logarithm of regional GDP (lngdp) as an explanatory variable to analyze the impact of economic development in different regions on the reversal of desertification [24].
(5) Logarithm of the total output value of agriculture, forestry, animal husbandry and fishery in the region (lnprod): the agricultural output value covers the planting industry and the industrial activities of farm households, where the planting industry refers to the cultivation of food crops, cash crops, vegetables and fruits. The output value of forestry includes forest cultivation and management, forest products, and timber harvesting at and below the village level; the output value of animal husbandry includes livestock and poultry raising and hunting; and the fishery output value is calculated uniformly, mainly by the production method. Following common practice in the related literature, this article uses the logarithm of the total output value of agriculture, forestry, animal husbandry and fishery in the region (lnprod) as an explanatory variable reflecting the region's production level in these sectors [25].
(6) Logarithm of per capita disposable income of residents in rural pastoral areas (lnp_inc): disposable income is the sum of residents' final consumption expenditure and savings, that is, the income that residents can freely dispose of. It includes both cash income and income in kind and, by source, consists of four items: wage income, net operating income, net property income and net transfer income [26]. Following common practice in the relevant literature, this article uses the logarithm of the per capita disposable income of residents in rural pastoral areas (lnp_inc) as an explanatory variable to analyze the impact of income on the effectiveness of desertification control [26].
(7) Policy factors (policy_1 and policy_2): because 2005 and 2011 are considered two important years for national desertification prevention and control policy [22], this article introduces two dummy variables, policy_1 and policy_2. The former represents the policy effect of the National Desert Control Plan (2005-2010) and the latter that of the National Desert Control Plan (2011-2020). In addition, since 2000, major national ecological projects such as returning farmland to forests and grasslands and the pilot project for controlling wind and sand sources around Beijing and Tianjin have clearly affected the ecological environment of the Maowusu Sandy Land; the dummy variables are used to evaluate these policy impacts [27]. In the analysis, the sample data from 2000-2004 are classified as policy state 0, with policy_1 = 0 and policy_2 = 0; the sample data from 2005-2010 as policy state 1, with policy_1 = 1 and policy_2 = 0; and the sample data from 2011-2015 as policy state 2, with policy_1 = 0 and policy_2 = 1. Table 2 gives the descriptive statistics of each variable for the three policy states.
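The year-to-dummy coding above can be written as a small lookup. This is a Python sketch; the function name `policy_dummies` is hypothetical and only encodes the three policy states as described in the text. Two dummies are enough for three states because state 0 (2000-2004) serves as the omitted base category, which avoids perfect collinearity with the intercept.

```python
def policy_dummies(year: int) -> tuple[int, int]:
    """Return (policy_1, policy_2) for a sample year under the coding described above.

    2000-2004 -> policy state 0: (0, 0)  (base category)
    2005-2010 -> policy state 1: (1, 0)
    2011-2015 -> policy state 2: (0, 1)
    """
    if not 2000 <= year <= 2015:
        raise ValueError(f"{year} is outside the 2000-2015 study period")
    if year <= 2004:
        return 0, 0
    if year <= 2010:
        return 1, 0
    return 0, 1


coded = {year: policy_dummies(year) for year in range(2000, 2016)}
for year, (p1, p2) in coded.items():
    print(year, p1, p2)
```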

The relationship between variables
First, this study uses the GGally package in R to compute and visualize the correlations between the variables; the resulting plot is shown in Figure 1. Its lower triangle summarizes the pairwise relationships with a color gradient and Pearson correlation coefficients. Among the variables, lngdp, lnprod and lnp_inc are very strongly positively correlated and together reflect an economic factor, that is, the level of economic development; rain and temp are relatively highly correlated and, to some extent, represent natural environmental factors; and v_ratio is not closely related to any of the other variables. Factor analysis is therefore used to examine these relationships further.
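The lower-triangle Pearson summary that GGally draws can be approximated numerically. The Python sketch below uses NumPy in place of R. The data are synthetic and hypothetical, generated from one latent "economy" driver and one latent "climate" driver so that the printed pattern resembles the one described above (strong lngdp/lnprod/lnp_inc block, moderate rain/temp link, weak v_ratio links). It is not a reproduction of Figure 1.

```python
import numpy as np

NAMES = ["v_ratio", "rain", "temp", "lngdp", "lnprod", "lnp_inc"]


def pearson_lower_triangle(data: np.ndarray, names: list[str]) -> np.ndarray:
    """Compute the Pearson correlation matrix (columns = variables) and print its lower triangle."""
    r = np.corrcoef(data, rowvar=False)
    for i, name in enumerate(names):
        cells = " ".join(f"{r[i, j]:6.3f}" for j in range(i))
        print(f"{name:>8} {cells}")
    return r


# Hypothetical synthetic sample driven by two latent factors (not the study's data)
rng = np.random.default_rng(0)
n = 160
economy = rng.normal(size=n)
climate = rng.normal(size=n)
data = np.column_stack([
    0.3 * economy + rng.normal(size=n),          # v_ratio: only weakly related
    climate + 0.5 * rng.normal(size=n),          # rain
    0.7 * climate + 0.7 * rng.normal(size=n),    # temp
    economy + 0.2 * rng.normal(size=n),          # lngdp
    economy + 0.2 * rng.normal(size=n),          # lnprod
    economy + 0.2 * rng.normal(size=n),          # lnp_inc
])
r = pearson_lower_triangle(data, NAMES)
```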

Economic and natural environmental factors analysis
Before factor analysis, the KMO measure and Bartlett's test of sphericity are required. The Kaiser-Meyer-Olkin measure of sampling adequacy is 0.725, above the threshold of 0.6 commonly required for factor analysis [25], indicating that factor analysis can proceed. First, exploratory factor analysis is performed on six variables: v_ratio, rain, temp, lngdp, lnprod and lnp_inc. The scree plot with a reference line at eigenvalue = 1 (Figure 2) shows a clear turning point after the second component, beyond which the curve levels off, so two factors are extracted. Second, the variance explained by the extracted factors is analyzed, and the total variance explained is reported in Table 3. The first factor accounts for 71.293% of the total variance (extraction sums of squared loadings), and the second for a further 15.184%, giving a cumulative 86.477%; the cumulative variance after rotation is also 86.477%. The two extracted factors therefore explain the original six variables well, and it is reasonable to extract two factors, preliminarily named economic factors and natural environmental factors. Third, based on these results, the factor score coefficient matrix and the rotated loading plot are computed; the results are shown in Table 4 and Figure 2. Table 4 shows that rain has an important influence on principal component 2, with a score coefficient of 0.984, while the other five variables, v_ratio, temp, lngdp, lnprod and lnp_inc, have important influences on principal component 1, with score coefficients of 0.269, 0.206, 0.216, 0.219 and 0.232, respectively.
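The three pre-extraction checks described above (overall KMO, Bartlett's test of sphericity, and eigenvalue-based variance explained with the eigenvalue > 1 rule) can be sketched in Python with NumPy and SciPy. This assumes principal-component extraction on the correlation matrix; the paper does not state its software settings, and the input matrix here is built from hypothetical synthetic data, so the printed numbers will not match Table 3.

```python
import numpy as np
from scipy.stats import chi2


def kmo(R: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure computed from a correlation matrix R."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                          # partial correlations (off-diagonal entries)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(R: np.ndarray, n: int) -> tuple[float, float, float]:
    """Bartlett's test that R is an identity matrix; returns (statistic, df, p-value)."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return stat, df, float(chi2.sf(stat, df))


def variance_explained(R: np.ndarray):
    """Eigenvalues of R in descending order, % of variance each explains, and cumulative %."""
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]
    pct = 100 * eig / eig.sum()
    return eig, pct, np.cumsum(pct)


# Hypothetical synthetic data with two latent drivers (not the study's data)
rng = np.random.default_rng(1)
n = 160
economy, climate = rng.normal(size=(2, n))
X = np.column_stack([
    0.5 * economy + rng.normal(size=n),
    climate + 0.5 * rng.normal(size=n),
    0.7 * climate + 0.7 * rng.normal(size=n),
    economy + 0.2 * rng.normal(size=n),
    economy + 0.2 * rng.normal(size=n),
    economy + 0.2 * rng.normal(size=n),
])
R = np.corrcoef(X, rowvar=False)

print("KMO:", round(kmo(R), 3))
print("Bartlett (stat, df, p):", bartlett_sphericity(R, n))
eig, pct, cum = variance_explained(R)
n_factors = int(np.sum(eig > 1))   # Kaiser criterion: keep components with eigenvalue above 1
print("eigenvalues:", np.round(eig, 3), "-> factors kept:", n_factors)
print("cumulative %:", np.round(cum, 3))
```

With real data, the cumulative percentages from `variance_explained` play the role of the 71.293% and 86.477% figures in Table 3.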
Principal components 2 and 1 can be roughly summarized as natural environmental factors (fa_nat_enviro) and economic factors (fa_economy), respectively (Figure 2), which further supports the factor extraction result. The factor equations can therefore be written as follows:

(1)

Factor analysis thus reduced the dimensionality of the data and showed that natural environmental factors and economic factors are closely related to vegetation coverage (v_ratio).
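Equation (1) and the full score coefficient matrix of Table 4 are not reproduced here, so the following is only the general form that factor scores take when they are computed from a score coefficient matrix, as common statistical software does. It assumes the variables are first standardized; the symbols w_{jk}, z_j and s_j are introduced for illustration.

```latex
\hat F_k = \sum_{j=1}^{6} w_{jk}\, z_j,
\qquad
z_j = \frac{x_j - \bar{x}_j}{s_j},
\qquad k = 1, 2
```

Here x_j ranges over v_ratio, rain, temp, lngdp, lnprod and lnp_inc, s_j is the sample standard deviation of x_j, and w_{jk} is the score coefficient of variable j on factor k. For example, with the score coefficient of 0.984 reported for rain on component 2, a one-standard-deviation increase in rain raises that component's score by 0.984, with the other variables held fixed.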

Dummy variable regression
Dummy variables are numerical codings of qualitative attributes, which allow categories and orderings to be introduced into regression analysis [25].
The existing literature points out that regression analysis and analysis of variance are essentially the same type of statistical method: they differ only in whether the independent variables are continuous or categorical, and regression analysis is superior in computation and interpretability. The dummy variable regression model (DVRM) is therefore consistent with analysis of variance while offering greater analytical power. The dummy variable regression model is specified in Equation (2):

(2)

Table 5 shows the OLS estimates for v_ratio. Without any control variables, policy state 1 increases v_ratio by 0.040 relative to policy state 0 (p < 0.05), and policy state 2 increases it by 0.055 (p < 0.01). After the control variable rain is introduced, these two effects decrease to 0.039 and 0.052, respectively. After temp is also introduced, they decrease further to 0.034 and 0.047 but remain significant. In Model D these effects increase considerably; however, stepwise regression shows that the AIC of the model controlling for fa_economy is -765.23, whereas the AIC after this control variable is removed is -766.08 [26]. Because a lower AIC is preferred, stepwise regression drops fa_economy, and the analysis is based on Model C. The adjusted R² of Model C is 0.172. In the F test of the overall significance of the regression, the explanatory variables jointly have a significant effect on the dependent variable (p < 0.05). In addition, t tests show that rain has a positive effect of 0.001 on v_ratio at the 1% significance level, and temp has a positive effect of 0.025 on v_ratio at the 1% significance level.
It is worth noting that the coefficients of the two dummy variables, policy_1 and policy_2, mean that, relative to policy state 0 and after controlling for the other variables, policies 1 and 2 increase v_ratio by 0.034 and 0.047, respectively. In summary, the two policies represented by the dummy variables, annual precipitation (rain) and annual average temperature (temp) all have significant positive effects on vegetation coverage.
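The estimation and AIC comparison behind Models C and D can be sketched in Python with NumPy. This is illustrative only: the panel (10 areas by 16 years) and all values are synthetic and hypothetical, with fa_economy made irrelevant by construction, and the AIC is computed in the n·ln(RSS/n) + 2k form that R's `extractAIC` (used by `step()`) applies to linear models. The paper does not state which AIC form it used, so absolute values are not comparable with -765.23 or -766.08.

```python
import numpy as np


def fit_ols(y: np.ndarray, X: np.ndarray) -> tuple[np.ndarray, float]:
    """OLS of y on an intercept plus the columns of X; returns (coefficients, RSS)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return beta, float(resid @ resid)


def aic_linear(y: np.ndarray, X: np.ndarray) -> float:
    """AIC in the n*ln(RSS/n) + 2k form, k = number of estimated coefficients."""
    n = len(y)
    beta, rss = fit_ols(y, X)
    return n * np.log(rss / n) + 2 * len(beta)


# Hypothetical balanced panel: 10 areas x 16 years (2000-2015), synthetic values
rng = np.random.default_rng(2)
years = np.tile(np.arange(2000, 2016), 10)
policy_1 = ((years >= 2005) & (years <= 2010)).astype(float)
policy_2 = (years >= 2011).astype(float)
rain = rng.normal(400, 90, years.size)     # mm
temp = rng.normal(8, 1, years.size)        # °C
fa_economy = rng.normal(size=years.size)   # unrelated to v_ratio by construction
v_ratio = (0.05 + 0.034 * policy_1 + 0.047 * policy_2
           + 0.001 * rain + 0.025 * temp + rng.normal(0, 0.05, years.size))

X_c = np.column_stack([policy_1, policy_2, rain, temp])   # Model C regressors
X_d = np.column_stack([X_c, fa_economy])                  # Model D adds fa_economy
beta_c, _ = fit_ols(v_ratio, X_c)
print("Model C coefficients (const, policy_1, policy_2, rain, temp):", np.round(beta_c, 4))
print("AIC with fa_economy:   ", round(aic_linear(v_ratio, X_d), 2))
print("AIC without fa_economy:", round(aic_linear(v_ratio, X_c), 2))
# Backward stepwise selection drops a variable when removing it lowers the AIC.
```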
Building on Model C, interaction effects with the dummy variables are considered in order to explore the influence of other factors under different policy conditions. Four interaction terms are constructed from the two dummy variables (policy_1 and policy_2) and the two continuous variables (rain and temp): p1_rain, p1_temp, p2_rain and p2_temp. The initial model can be expressed as:

$$\text{v\_ratio}_i = \beta_0 + \beta_1\,\text{policy\_1}_i + \beta_2\,\text{policy\_2}_i + \beta_3\,\text{rain}_i + \beta_4\,\text{temp}_i + \beta_5\,\text{p1\_rain}_i + \beta_6\,\text{p1\_temp}_i + \beta_7\,\text{p2\_rain}_i + \beta_8\,\text{p2\_temp}_i + \varepsilon_i \qquad (3)$$

Variables are again selected by stepwise regression. The AIC of this initial model is -770.24, and after p1_rain and p1_temp are automatically removed it falls to -772.54. The final model therefore retains p2_rain, the product of policy_2 and rain, and p2_temp, the product of policy_2 and temp. p2_rain equals rain for observations with policy_2 = 1 and is 0 for all other observations; p2_temp is defined in the same way. The final model can be expressed as:

$$\text{v\_ratio}_i = \beta_0 + \beta_1\,\text{policy\_1}_i + \beta_2\,\text{policy\_2}_i + \beta_3\,\text{rain}_i + \beta_4\,\text{temp}_i + \beta_7\,\text{p2\_rain}_i + \beta_8\,\text{p2\_temp}_i + \varepsilon_i$$

The regression results are listed as Model E in Table 5. Overall, the adjusted R² increases by 0.214. After the interaction terms are added, the previously significant policy_2 and rain are no longer significant, while temp remains highly significant. The coefficients of p2_rain, p2_temp and policy_1 reject the null hypothesis of being zero at the 1%, 5% and 10% significance levels, respectively. This result shows that annual average temperature still has a significant positive effect on vegetation coverage. In addition, under policy_2, the interactions with annual precipitation (rain) and annual average temperature (temp) affect vegetation coverage in different directions.
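Interpreting Model E is easier once the marginal effects implied by the interaction terms are written out. This sketch assumes the final model is linear in policy_1, policy_2, rain, temp, p2_rain and p2_temp; the coefficient labels β_{rain}, β_{temp}, β_{p2}, β_{p2\_rain} and β_{p2\_temp} (the coefficients on rain, temp, policy_2, p2_rain and p2_temp) are introduced here for illustration.

```latex
\frac{\partial\,\text{v\_ratio}}{\partial\,\text{rain}} = \beta_{rain} + \beta_{p2\_rain}\,\text{policy\_2},
\qquad
\frac{\partial\,\text{v\_ratio}}{\partial\,\text{temp}} = \beta_{temp} + \beta_{p2\_temp}\,\text{policy\_2}

E[\text{v\_ratio} \mid \text{state 2}] - E[\text{v\_ratio} \mid \text{state 0}]
  = \beta_{p2} + \beta_{p2\_rain}\,\text{rain} + \beta_{p2\_temp}\,\text{temp}
  \quad \text{(same rain and temp; policy\_1 = 0 in both states)}
```

Two readings follow. The coefficient on policy_2 alone is the policy-2 effect at rain = 0 and temp = 0, values far outside the data, which is one reason policy_2 can lose significance once the interactions enter. And if β_{p2\_rain} and β_{p2\_temp} have opposite signs, precipitation and temperature shift vegetation coverage in different directions under policy 2, which is one way to read the final statement above.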

Regression diagnosis
First, the normality assumption is tested. The Q-Q plot is shown in the lower-left panel of Figure 3; it plots the quantiles of the residuals against the quantiles of the normal distribution. If the errors are normally distributed, the points lie close to the 45° line. Most sample points lie close to the 45° line, but a few deviate substantially. To show the distribution of the data more intuitively, a histogram with a kernel density curve is also drawn (lower-right panel of Figure 3). The residual distribution does not fully conform to a normal distribution and shows some skewness. Because visual inspection is subjective, the Jarque-Bera (JB) test and the D'Agostino test are also applied. The JB test uses the sample skewness S and kurtosis K (excess kurtosis K - 3) of the residuals {e_1, …, e_n} and combines their weighted squares into a test statistic that, under the null hypothesis of normality, asymptotically follows a χ² distribution with 2 degrees of freedom:

$$JB = \frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right] \xrightarrow{d} \chi^2(2)$$

Figure 3 Test graph for normality
Second, the skewness and kurtosis of the dependent variable v_ratio are calculated; from the sample size, skewness and kurtosis, the JB statistic is 2.453, and the corresponding p-value is obtained from the χ²(2) distribution. The D'Agostino test is an improved normality test that uses a more elaborate statistic and is available as an official Stata command. The p-values of the two tests are 0.293 and 0.154, respectively, so the null hypothesis of normality cannot be rejected at the 5% significance level.
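The JB statistic and its p-value can be reproduced with the standard library alone, because a χ² variable with 2 degrees of freedom has upper-tail probability exp(-x/2). The Python sketch below uses moment-based (biased) skewness and kurtosis, one common convention; the function names are hypothetical. Plugging in the reported JB = 2.453 gives exp(-1.2265) ≈ 0.293, consistent with the p-value stated above.

```python
import math
from typing import Sequence


def chi2_df2_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square(2) variable: exp(-stat / 2)."""
    return math.exp(-stat / 2)


def jarque_bera(residuals: Sequence[float]) -> tuple[float, float]:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) with moment-based skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    d = [e - mean for e in residuals]
    m2 = sum(x ** 2 for x in d) / n
    m3 = sum(x ** 3 for x in d) / n
    m4 = sum(x ** 4 for x in d) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, chi2_df2_pvalue(jb)


# Reproduce the p-value for the JB statistic reported in the text
print(round(chi2_df2_pvalue(2.453), 3))   # 0.293
```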
Although stepwise regression removed the fa_economy control variable from model (3), the variance inflation factor (VIF) is also used to examine collinearity more directly. For each explanatory variable x_j, an auxiliary regression of x_j on the other explanatory variables yields the coefficient of determination R_j², and

$$VIF_j = \frac{1}{1 - R_j^2}$$

Generally, collinearity is not a concern if VIF < 10. Starting from Model E, four models are derived and the VIFs of rain, temp, policy_1 and policy_2 are calculated. Model (1) is identical to Model E; in it, the VIFs of the two interaction terms and of policy_2 are far above 10, indicating serious collinearity. In the next three models, the interaction terms of the dummy variables are deleted one at a time. Only after both interaction terms have been removed do the VIFs (and their square roots) of all variables meet the requirement. This also shows that, after variable elimination, there is no collinearity problem in model (4), that is, Model C.
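The VIF procedure can be sketched in Python with NumPy, one auxiliary regression per column. The data are hypothetical and synthetic; the point is structural: because rain and temp are far from zero, p2_rain = policy_2·rain and p2_temp = policy_2·temp are nearly proportional to policy_2, so their VIFs are large, while the Model C regressors alone are not collinear. This mirrors the pattern reported above but does not reproduce Table 6.

```python
import numpy as np


def vif(X: np.ndarray, names: list[str]) -> dict[str, float]:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns plus an intercept."""
    out = {}
    for j, name in enumerate(names):
        target = X[:, j]
        Z = np.column_stack([np.ones(len(target)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[name] = 1 / (1 - r2)
    return out


# Hypothetical synthetic panel (10 areas x 16 years), not the study's data
rng = np.random.default_rng(3)
years = np.tile(np.arange(2000, 2016), 10)
policy_1 = ((years >= 2005) & (years <= 2010)).astype(float)
policy_2 = (years >= 2011).astype(float)
rain = rng.normal(400, 90, years.size)
temp = rng.normal(8, 1, years.size)

full = np.column_stack([policy_1, policy_2, rain, temp, policy_2 * rain, policy_2 * temp])
reduced = full[:, :4]   # interaction terms removed (Model C regressors)
v_full = vif(full, ["policy_1", "policy_2", "rain", "temp", "p2_rain", "p2_temp"])
v_reduced = vif(reduced, ["policy_1", "policy_2", "rain", "temp"])
print({k: round(v, 1) for k, v in v_full.items()})
print({k: round(v, 1) for k, v in v_reduced.items()})
```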
The VIF results are shown in Table 6.

Regarding systematic bias from model misspecification: if the true equation contains nonlinear terms, the marginal effect of an explanatory variable on the dependent variable depends on the level of that variable or of other explanatory variables, and leaving these terms out causes omitted-variable bias.
First, a RESET test is used: nonlinear terms are added to the equation and the significance of their coefficients is tested. Specifically, the square, cube and fourth power of the fitted value are introduced:

$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + \delta_2 \hat{y}_i^2 + \delta_3 \hat{y}_i^3 + \delta_4 \hat{y}_i^4 + \varepsilon_i$$

and the null hypothesis H_0: δ_2 = δ_3 = δ_4 = 0 is tested. If the F test rejects the null hypothesis, the model omits higher-order terms. The p-value is 0.741, indicating that no higher-order nonlinear terms are omitted, so the original model can be used as it is.
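A RESET test in the fitted-value form can be sketched in Python with NumPy and SciPy. It assumes the powers ŷ², ŷ³ and ŷ⁴ are added, as in the common default form of the test (for example Stata's `estat ovtest`). The example uses hypothetical data with an obviously curved relationship, so that a straight-line fit should fail the test; it is a check of the procedure, not of the paper's model.

```python
import numpy as np
from scipy.stats import f as f_dist


def fitted_and_rss(y: np.ndarray, Z: np.ndarray) -> tuple[np.ndarray, float]:
    """OLS of y on the columns of Z (Z must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ coef
    resid = y - fitted
    return fitted, float(resid @ resid)


def reset_test(y: np.ndarray, X: np.ndarray, max_power: int = 4) -> tuple[float, float]:
    """Ramsey RESET: add y_hat^2 ... y_hat^max_power and F-test that their coefficients are 0."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    fitted, rss_restricted = fitted_and_rss(y, Z)
    powers = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    _, rss_unrestricted = fitted_and_rss(y, np.column_stack([Z, powers]))
    q = max_power - 1                          # restrictions: delta_2 ... delta_max_power
    df_resid = n - Z.shape[1] - q
    F = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)
    return F, float(f_dist.sf(F, q, df_resid))


# Hypothetical check: a straight-line fit to a curved relationship should be rejected
rng = np.random.default_rng(4)
x = rng.uniform(0, 3, 200)
y_curved = x ** 2 + rng.normal(0, 0.1, 200)
print(reset_test(y_curved, x[:, None]))
```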
This article then applies the link test, regressing the dependent variable on the fitted value and its square:

$$y_i = \gamma_0 + \gamma_1 \hat{y}_i + \gamma_2 \hat{y}_i^2 + u_i$$

and testing the null hypothesis that the coefficient of the squared fitted value, γ_2, is 0. The results are shown in Table 7. The squared fitted value (f_hatsq) is not significant and adds no explanatory power for the dependent variable, which also indicates that the model has no specification error. To test for heteroskedasticity, a residual plot is first drawn (Figure 4). As the fitted value changes, the variance of the disturbance term does not change noticeably and shows only a slight decreasing trend. The White test is therefore applied; the results are shown in Table 8. The p-value is 0.005, so the null hypothesis of homoskedasticity is strongly rejected and heteroskedasticity is considered present. The BP test is also applied, in its more broadly applicable form that relaxes the assumption of normally distributed disturbances to independent and identically distributed disturbances; its p-value is 0.865, so the null hypothesis of homoskedasticity cannot be rejected. This conclusion is inconsistent with the White test. In general, however, if heteroskedasticity may be present, robust standard errors should be used, because under heteroskedasticity conventional standard errors no longer correctly estimate the true standard deviation of the coefficients (they may seriously underestimate it), which leads to incorrect statistical inference. This paper therefore re-estimates Model C with robust standard errors; the results are shown in Table 9.
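The White test and the robust re-estimation can be sketched together in Python with NumPy and SciPy. The White statistic is n·R² from regressing squared OLS residuals on the regressors, their squares and cross-products, with degrees of freedom equal to the rank of that auxiliary design minus one, so redundant columns (such as a dummy's square, which equals the dummy) are not double-counted. The robust standard errors use the HC1 form, the small-sample-adjusted variant that Stata reports for `vce(robust)`. The data are hypothetical, with error spread growing in x; the BP variant described above is not implemented here.

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.stats import chi2


def white_test(y: np.ndarray, X: np.ndarray) -> tuple[float, int, float]:
    """White's test: n * R^2 from regressing squared OLS residuals on levels,
    squares and cross-products of the regressors."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e2 = (y - Xc @ beta) ** 2
    cross = [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(k), 2)]
    Z = np.column_stack([Xc, *cross])
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)      # lstsq tolerates redundant columns
    u = e2 - Z @ g
    r2 = 1 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)
    df = int(np.linalg.matrix_rank(Z)) - 1          # count only non-redundant regressors
    stat = n * r2
    return stat, df, float(chi2.sf(stat, df))


def ols_hc1(y: np.ndarray, X: np.ndarray):
    """OLS coefficients with conventional and HC1 (heteroskedasticity-robust) standard errors."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    k = Xc.shape[1]
    bread = np.linalg.inv(Xc.T @ Xc)
    beta = bread @ Xc.T @ y
    e = y - Xc @ beta
    se_conv = np.sqrt(np.diag(bread) * (e @ e) / (n - k))
    meat = (Xc * (e ** 2)[:, None]).T @ Xc
    cov_hc1 = n / (n - k) * bread @ meat @ bread
    return beta, se_conv, np.sqrt(np.diag(cov_hc1))


# Hypothetical data whose error spread grows with x
rng = np.random.default_rng(5)
x = rng.uniform(1, 5, 500)
y = 1 + 2 * x + x * rng.normal(0, 1, 500)
print("White test (stat, df, p):", white_test(y, x[:, None]))
beta, se_conv, se_hc1 = ols_hc1(y, x[:, None])
print("beta:", np.round(beta, 3), "conventional SE:", np.round(se_conv, 3), "HC1 SE:", np.round(se_hc1, 3))
```

Note that the coefficients are identical under both variance estimators; only the standard errors, and hence the significance levels, change.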
The regression with robust standard errors shows that the four explanatory variables policy_1, policy_2, rain and temp all have positive effects on v_ratio at various significance levels; the results differ from the regression with conventional standard errors essentially only in the values of the standard errors. The R² of the model is 0.193, and the p-value of the F test is 0.000, indicating that the regression is jointly significant.

Conclusions

(1) Preliminary descriptive statistics show that from 2000 to 2015 the vegetation coverage of the Maowusu Sandy Land in China maintained an upward trend, which means that desertification in the Maowusu Sandy Land has been reversing.
(2) The logarithms of three variables for the Maowusu Sandy Land, namely GDP, the total output value of agriculture, forestry, animal husbandry and fishery, and the per capita disposable income of residents in rural pastoral areas, can be reduced to a single economic factor, but both the analysis of variance and the regression results show that this economic factor has no obvious effect on the reversal of desertification in the Maowusu Sandy Land.
(3) Among natural environmental factors, annual precipitation and annual average temperature have positive effects on vegetation coverage at the 1% significance level. The coefficients mean that for every 100 mm increase in annual precipitation, vegetation coverage increases by about 10 percentage points, and for every 1 °C increase in annual average temperature, it increases by about 2.5 percentage points. This not only confirms that rising temperature and precipitation contribute substantially to the reversal of desertification in the Maowusu Sandy Land, but also quantifies the impact of these natural factors. Note that the coefficient of annual precipitation is small (only 0.001) because annual precipitation takes much larger values than the other variables such as annual mean temperature: its standard deviation is 91.924, far higher than the 0.999 of annual average temperature. Despite its small coefficient, annual precipitation therefore has the larger effect and plays a decisive role in the reversal of desertification in the Maowusu Sandy Land.
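The scale argument above can be checked with a short calculation that compares the effect of a one-standard-deviation change in each variable, one common way to put coefficients on a comparable footing. The Python sketch below uses only the coefficients and standard deviations reported in the text; because the precipitation coefficient is rounded to 0.001, the results are approximate.

```python
# Model C coefficients and sample standard deviations, as reported in the text
coef = {"rain": 0.001, "temp": 0.025}    # change in v_ratio per mm / per °C
sd = {"rain": 91.924, "temp": 0.999}     # standard deviations of rain (mm) and temp (°C)

# Change in v_ratio for a one-standard-deviation increase in each variable
per_sd = {name: coef[name] * sd[name] for name in coef}
print(per_sd)                                           # rain about 0.092, temp about 0.025
print(round(per_sd["rain"] / per_sd["temp"], 2))        # 3.68
```

By this measure a typical swing in precipitation moves vegetation coverage roughly 3.7 times as much as a typical swing in temperature.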
(4) Among policy factors, the two dummy variables policy_1 and policy_2 are significant at the 10% and 5% levels, respectively. Their coefficients mean that, relative to policy state 0 and after controlling for natural environmental factors, policies 1 and 2 increase vegetation coverage by 3.4 and 4.7 percentage points, respectively. Compared with the 2000-2004 base period, the National Desertification Control Plan (2005-2010), the National Desertification Control Plan (2011-2020) and other policies and measures to combat desertification have therefore significantly promoted the reversal of desertification in the Maowusu Sandy Land. This clarifies, from a quantitative perspective, the importance of policy factors in promoting the comprehensive reversal of desertification.
(5) Because of strong collinearity, and to keep the model robust, this article does not further explore how interactions between policy factors and natural factors affect vegetation coverage. Future research could build on this by exploring how the effects of various natural factors on the reversal of desertification differ under different policy conditions, and thereby provide more targeted and accurate recommendations for subsequent policy implementation.