Predicting the Potential Stopover Sites of Ciconia boyciana during Migration in China

Ciconia boyciana is a rare migratory bird in China and is listed as an endangered species by the IUCN. To identify its important stopover sites during migration, occurrence records of Ciconia boyciana from the spring and autumn migrations of 2021 and 24 environmental variables were used, based on the Random Forest and MaxEnt models, to simulate potentially suitable stopover sites. By comparing the hotspot stopover sites with the current distribution of conservation areas, strategies for establishing conservation areas were proposed.


Introduction
Ciconia boyciana is a rare migratory bird mainly distributed in China and is listed as an endangered species by the IUCN [1]. The government of China has actively protected it by establishing conservation areas, which are largely concentrated at breeding and wintering sites. However, the migration period between breeding and wintering sites is one of the most perilous stages in a bird's life cycle, and many migratory birds cannot complete long-distance journeys without replenishing expended energy at stopover sites [2]. An important step in protecting C. boyciana is therefore to determine where its stopover sites are located and to translate the results into action. Species distribution models (SDMs) have performed well in recent studies predicting bird habitats and show good prospects. The development of SDMs began with the BIOCLIM model [3] and was followed over the next 20 years by various models that emerged to satisfy growing research demands, including DOMAIN [4], the Generalized Linear Model (GLM) [5], the Genetic Algorithm for Rule-set Prediction (GARP) [6], and Maximum Entropy (MaxEnt). MaxEnt is now widely used because it has shown relatively good performance and high precision with modest input data requirements, even with sample sizes as small as 10 to 25 occurrences. Given its ability to produce useful results from sparsely sampled data, MaxEnt has been commonly adopted to identify priority habitat areas for rare and threatened species [7]. Thus, the objectives of this study were to (1) identify the potential stopover sites of C. boyciana during migration using MaxEnt and (2) analyze how to implement conservation strategies based on the current distribution of conservation areas for C. boyciana. Based on the results, we aim to help C. boyciana complete its migration successfully and to maintain the sustainability of the species.

Study area
The study was performed in the natural distribution area of C. boyciana in eastern China, which comprises eighteen provincial-level regions (20-50°N, 109-135°E) extending from north to south: Heilongjiang, Jilin, Liaoning, Shanxi, Hebei, Beijing, Tianjin, Shandong, Henan, Anhui, Jiangsu, Shanghai, Hubei, Jiangxi, Zhejiang, Hunan, Fujian, and Guangdong. The habitats of C. boyciana are mainly distributed along the rivers and coastline of the study area (figure 1). China has legally established conservation areas in the important habitats. To clarify the current distribution of conservation areas for C. boyciana, kernel density estimation was used to calculate the density of conservation areas in the study area (figure 2).
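As a rough illustration of the kernel density step, the sketch below estimates a density surface value at a query location from a set of point locations. It assumes a Gaussian kernel, a hand-picked bandwidth, and planar treatment of longitude and latitude, none of which is stated in the text; the coordinates are hypothetical and do not correspond to the conservation areas used in the study.

```python
import math

def kernel_density(points, query, bandwidth):
    """Gaussian kernel density estimate at `query` from 2-D `points`.

    Assumption: a Gaussian kernel with a fixed bandwidth, with coordinates
    treated as planar. GIS tools may use a different kernel and projection.
    """
    if not points:
        raise ValueError("at least one point is required")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    qx, qy = query
    # Normalising constant of a 2-D Gaussian kernel, averaged over all points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        squared_distance = (px - qx) ** 2 + (py - qy) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return norm * total

# Hypothetical conservation-area centroids as (longitude, latitude).
reserves = [(122.0, 41.0), (122.3, 40.8), (118.9, 37.8), (116.0, 29.1)]

dense = kernel_density(reserves, (122.1, 40.9), bandwidth=1.0)   # near two reserves
sparse = kernel_density(reserves, (112.0, 34.0), bandwidth=1.0)  # far from all reserves
print(dense > sparse)
```

Evaluating the function on every cell of a regular grid would give a density surface comparable in spirit to the one described for figure 2.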

Environmental variables
We selected a set of environmental variables of potential biological relevance to the habitat of C. boyciana, including climatic variables, topographical variables, and land use types. Climatic variables were derived from the Copernicus Climate Data Store [10]; monthly averaged data for 2021 were extracted as predictors to model the likely habitats of C. boyciana. However, these variables are likely to be highly correlated, and the redundant information introduced by strong correlations can lead to multicollinearity and overfitting in modelling [11]. To minimize these negative influences on the outcomes, we selected the important and minimally correlated climatic variables for C. boyciana using the random forest algorithm and Pearson correlation analysis. The random forest (RF) algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method [12]. A random forest classifier involves two parts, an input vector X and a random response Y: given X, one has to predict the value of Y. In this study, X consists of the environmental variables, and Y takes one of two classes, 0 (absence) or 1 (presence). Based on the C. boyciana records, we randomly selected 100 absence points and 100 presence points as Y, and the corresponding climatic variables were recorded as $X = (X_1, \ldots, X_{42})$. To build the model, sample sets are drawn at random from the data to grow n decision trees. When a node of a decision tree is split, m (m ≤ 42) environmental factors are randomly selected as candidates, and the best split among them is chosen. Each decision tree operates independently, and the final outcome is determined by all the trees together. The key parameters in building the RF model are n and m.
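The two sources of randomness described above (random sample sets and random variable subsets) can be sketched in a few lines. This is a deliberately simplified illustration, not the model used in the study: every tree is a single-split "stump", m is read as the number of variables drawn as split candidates, and the data, function names, and parameter values are hypothetical.

```python
import random

def fit_stump(rows, labels, features):
    """Choose the (feature, threshold, class-above) split with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for above in (0, 1):  # class predicted when the value exceeds t
                errors = sum((above if r[f] > t else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return f, t, above

def fit_forest(rows, labels, n_trees, m_features, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and m_features variables."""
    rng = random.Random(seed)
    n_rows, n_vars = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # bootstrap sample
        feats = rng.sample(range(n_vars), m_features)         # random variable subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees: 1 = presence, 0 = absence."""
    votes = sum(above if row[f] > t else 1 - above for f, t, above in forest)
    return 1 if 2 * votes > len(forest) else 0

# Hypothetical toy data: three variables per point, four absences, four presences.
rows = [
    [1.0, 2.0, 0.5], [1.5, 1.0, 1.0], [2.0, 2.5, 0.8], [0.5, 1.5, 1.2],
    [7.0, 8.0, 6.5], [8.5, 7.5, 7.0], [9.0, 9.5, 8.0], [7.5, 8.5, 9.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels, n_trees=25, m_features=2)
print(predict(forest, [0.0, 0.0, 0.0]), predict(forest, [10.0, 10.0, 10.0]))
```

A full random forest grows deeper trees and redraws the candidate variables at every node; the stump version keeps only the sampling logic visible.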
After repeated runs, the RF model performed best with n (number of grown trees) set to 200 and m (number of leaves) set to 5. Variable importance was assessed with the "permutation accuracy importance" measure, the most advanced variable importance measure available in random forests. The resulting importance ranking of the climatic variables for C. boyciana is shown in figure 3. Based on the importance ranking, a Pearson correlation analysis was conducted on the climatic variables. For each set of significantly cross-correlated variables (r ≥ 0.7), only the most important variable was kept for further analysis. To represent topographical characteristics of the study region, such as elevation and slope, a 500-m resolution digital elevation model (DEM) was downloaded from the General Bathymetric Chart of the Oceans (GEBCO) [13]. Because C. boyciana is a wetland-obligate species, the distances between C. boyciana presence sites and water bodies were measured with the Euclidean Distance tool (a spatial analysis tool) in ArcMap. Water bodies were defined according to the United Nations Food and Agriculture Organization's Land Cover Classification System (LCCS) at 300 m spatial resolution, using data downloaded from the Copernicus Climate Data Store [10]. Ultimately, 24 variables were used as environmental variables and resampled to a common grid of 0.1° × 0.1° (Table 1).
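One way to read the correlation screening step is as a greedy filter: walk through the variables from most to least important and keep a variable only if it is not strongly correlated with any variable already kept. The sketch below follows that reading. The greedy order and the use of the absolute value of r are assumptions, and the variable names, values, and importance scores are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def filter_correlated(values, importance, threshold=0.7):
    """Keep the most important variable from each strongly correlated group.

    Assumption: |r| >= threshold counts as strongly correlated.
    """
    ranked = sorted(values, key=lambda name: importance[name], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(values[name], values[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values of three variables at five sites.
values = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly proportional to var_a
    "var_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Hypothetical importance scores, e.g. from a permutation importance ranking.
importance = {"var_a": 0.30, "var_b": 0.20, "var_c": 0.10}

print(filter_correlated(values, importance))
```

Here `var_b` is dropped because it is almost perfectly correlated with the more important `var_a`, while `var_c` is retained.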

MaxEnt model
MaxEnt software was applied to predict the presence probability of C. boyciana. MaxEnt represents the study area as a space X made up of discrete geographical grid cells $x_1, x_2, \ldots, x_m$. Each grid cell is characterized by environmental variables such as temperature, precipitation, and altitude. The probability distribution over the study area is estimated by the maximum entropy approach subject to the environmental conditions. This method is equivalent to finding the maximum likelihood Gibbs distribution, which maximizes the probability of suitable conditions for the species based on the sample positions $(x_1, x_2, \ldots, x_m)$. The Gibbs distribution takes the form

$$P(x) = \frac{\exp\left(\sum_{i} \lambda_i f_i(x)\right)}{Z}$$

where $P(x)$ is the probability function, $\lambda_i$ is a constant, $f_i$ is the function for each environmental variable, and $Z$ is a scaling constant that ensures $P$ sums to 1 over the study area, so that each value lies in [0, 1].
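To make the role of the scaling constant concrete, the sketch below evaluates a Gibbs distribution of the standard MaxEnt form, P(x) = exp(Σ λ_i f_i(x)) / Z, over a handful of grid cells. It only evaluates the distribution for given weights; it does not fit them, which is what the MaxEnt software does. The feature values and weights are hypothetical.

```python
import math

def gibbs_distribution(features, weights):
    """Return P(x) = exp(sum_i w_i * f_i(x)) / Z for every grid cell x.

    `features` holds one list of feature values per cell; `weights` holds
    the constants multiplying each feature (hypothetical values here).
    """
    scores = [sum(w * f for w, f in zip(weights, cell)) for cell in features]
    shift = max(scores)  # subtracting the maximum avoids overflow and cancels in the ratio
    unnormalised = [math.exp(s - shift) for s in scores]
    z = sum(unnormalised)  # scaling constant Z
    return [u / z for u in unnormalised]

# Three hypothetical cells, each described by two rescaled environmental features.
cells = [[0.2, 0.9], [0.5, 0.5], [0.9, 0.1]]
weights = [1.5, -0.5]

probabilities = gibbs_distribution(cells, weights)
print(probabilities, sum(probabilities))
```

Because every term is divided by Z, the values sum to 1 across the cells, and each individual value lies between 0 and 1.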
To assess prediction accuracy, the species occurrence data were partitioned into 75% for training and 25% for testing. Model accuracy was tested with the AUC value, which is the area enclosed by the receiver operating characteristic (ROC) curve and the abscissa. AUC values are classified as fail (0.5-0.6), poor (0.6-0.7), fair (0.7-0.8), good (0.8-0.9), and excellent (0.9-1.0).
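The evaluation procedure can be sketched as follows: a random 75/25 split of occurrence records, an AUC computed from model scores, and a lookup of the rating classes listed above. The AUC is computed here through its rank interpretation (the probability that a presence point scores higher than a comparison point), which equals the area under the ROC curve. Comparing against background points, the handling of values exactly on a class boundary, and all scores shown are assumptions for illustration.

```python
import random

def split_occurrences(records, train_fraction=0.75, seed=0):
    """Randomly partition records into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def auc(presence_scores, background_scores):
    """Share of (presence, background) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def rate_auc(value):
    """Map an AUC value to the rating classes given in the text.

    Assumption: each class includes its lower bound, and values below 0.6
    (including those below 0.5) are rated "fail".
    """
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "fair"
    if value >= 0.6:
        return "poor"
    return "fail"

train, test = split_occurrences(list(range(100)))
score = auc([0.9, 0.8, 0.7, 0.4], [0.1, 0.3, 0.5, 0.2])  # hypothetical model scores
print(len(train), len(test), score, rate_auc(score))
```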

Model performance and potential stopover sites
The model yielded high AUC values on the testing data for both spring migration (0.991) and autumn migration (0.984), which can be rated as excellent performance. The resulting MaxEnt maps of potential stopover sites during the migration of C. boyciana are presented in figure 4 and figure 5. The maps illustrate how suitable each part of the study area is as a stopover site for C. boyciana. Suitability values range between 0 and 1 and are grouped into four classes: high potential (> 0.6), good potential (0.4-0.6), moderate potential (0.2-0.4), and low potential (< 0.2) [14]. As figure 4 and figure 5 show, apart from the wintering and breeding sites, the high-potential stopover sites are mainly concentrated in the Liaohe estuary wetlands on the Bohai Sea, near the Yongding River and Haihe River basin in Hebei Province, at the estuary of the Yellow River and in the Yellow River Delta Nature Reserve, in the urban coastal areas of Shandong Province and Jiangsu Province, at the estuary of the Yangtze River, in the southeastern coastal areas of Zhejiang Province, in the coastal areas of Fujian Province, and in the Pearl River system in Guangdong Province. The environment of these stopover sites is similar to that of C. boyciana habitats, and they have the potential to develop into habitats. According to recent reports, more and more C. boyciana even stay at these stopover sites to overwinter rather than at Poyang Lake and Dongting Lake, which used to provide the main wintering sites for C. boyciana. For example, Yangzhou City in Jiangsu Province, located on the plain of the middle and lower reaches of the Yangtze River, used to be a stopover site. However, it has become a new wintering site for C. boyciana as its wetland environment has improved.
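The four suitability classes used for the maps can be written as a small reclassification function. This is only a sketch of the thresholds quoted above; assigning a value that falls exactly on 0.2 or 0.4 to the higher class is an assumption, since the text does not say which side those boundaries belong to.

```python
def suitability_class(value):
    """Assign a MaxEnt suitability value in [0, 1] to one of four classes."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("suitability must lie between 0 and 1")
    if value > 0.6:
        return "high potential"
    if value >= 0.4:  # boundary assignment at 0.4 is an assumption
        return "good potential"
    if value >= 0.2:  # boundary assignment at 0.2 is an assumption
        return "moderate potential"
    return "low potential"

# Hypothetical suitability values for four grid cells.
cell_values = [0.05, 0.33, 0.6, 0.87]
cell_classes = [suitability_class(v) for v in cell_values]
print(cell_classes)
```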

Assessing contributions of environmental variables
MaxEnt provides a jackknife procedure, which enabled us to estimate the most relevant environmental variables for the stopover sites of C. boyciana. The jackknife procedure describes the importance of environmental variables by calculating the gain of a model built with each environmental variable in isolation [11]. The gain is a measure of goodness of fit in MaxEnt and indicates how closely the model is concentrated around the presence samples. The results (figure 6 and figure 7) showed that a model using only surface pressure (sp) achieved the highest gain; that is, surface pressure was the variable that most strongly influenced the stopover sites of C. boyciana. Meanwhile, elevation (El), evaporation from vegetation transpiration (evavt), and skin reservoir content (src) were also more important than the other variables in predicting stopover sites.

Discussions
As figure 1 shows, the current conservation areas are primarily distributed in northeast China, south of the Yangtze River, and in the Haihe River basin. However, conservation areas are sparse in the central part of the study area. Compared with the hotspot stopover regions in the MaxEnt results, the coastal areas of Shandong Province and Jiangsu Province contain a considerable number of stopover sites but far fewer conservation areas. Beyond the immediate dangers of food shortages and predators, and delays to migration, this shortage leads C. boyciana to stop in urban areas, where it faces further dangers such as being hunted or poisoned. Thus, it is recommended that existing conservation areas for C. boyciana in these regions be expanded or that new ones be established. According to the MaxEnt jackknife results, new conservation reserves should give priority to elevation and vegetation cover. C. boyciana prefers low-elevation plains with sparse trees, open grassland, or swampy land. In general, where potential stopover sites overlap with conservation areas, it is recommended to further improve management mechanisms and to expand the conservation areas around core large wetland marshes, in order to stop the fragmentation of C. boyciana habitat. Stopover sites that do not overlap with conservation areas but contain wetlands should be integrated into local land use planning so as to fully support the migration of C. boyciana.

Conclusions
C. boyciana is threatened by various dangers during migration. Identifying available stopover sites is an urgent matter for protecting the population of C. boyciana. Using the MaxEnt model, we have identified the hotspot areas of stopover sites in eastern China. It is highly recommended that the coastal areas of Shandong Province and Jiangsu Province be protected to provide migrating C. boyciana with more stopover sites for energy replenishment. To predict potential stopover sites with higher precision, the factors influencing C. boyciana stopover sites, especially light at night, need further study.