Tree height inversion combining light detection and ranging and optical remote sensing data

Trees encroaching on transmission line corridors are a major safety hazard. Reliable prediction of tree height and monitoring of its changes are essential for removing this hidden danger. This paper combines the advantages of spaceborne lidar and optical remote sensing data to study a method of tree height inversion. Based on GLAS lidar data, waveform parameters such as waveform length, waveform leading edge length, and waveform trailing edge length were extracted from the waveform data by Gaussian decomposition. Terrain feature parameters were extracted from ASTER GDEM data. Tree crown information was extracted from the optical remote sensing image using the mean shift algorithm, and vegetation indices highly correlated with tree height were then selected. Finally, the extracted waveform feature parameters, terrain feature parameters, crown width, and highly correlated vegetation indices were used as model input variables. Tree height inversion models were established using four regression methods: multiple linear regression (MLR), support vector machine (SVM), random forest (RF), and BP neural network (BPNN). An accuracy evaluation showed that the tree height inversion model based on random forest achieved the best accuracy.


Introduction
In the 21st century, with the continuous emergence of high spatial resolution remote sensing images, radar, and lidar data, the application of remote sensing technology in power line inspection has evolved from basic information extraction and obstacle type identification toward more refined analysis [1]. Research on forest parameter extraction for transmission line corridors has begun. Among these parameters, tree height is a significant obstacle factor and a substantial safety hazard for power lines. For multispectral and hyperspectral optical images, the conventional approach is to extract variables such as per-band reflectance, vegetation indices, and texture factors, select the bands or variables most highly correlated with tree height using the Pearson correlation coefficient, and then build traditional regression models [2].
To obtain both the horizontal and vertical structure of forests, forest parameter inversion methods combining lidar and optical remote sensing data have developed rapidly and have become a research direction worthy of attention in recent years. Yun Zengxin et al. [3] quantitatively evaluated the combination of airborne lidar and hyperspectral data for estimating leaf area index. Meng Qingyan et al. [4] combined lidar and multispectral data to create an urban building green space environment index.
In recent years, models relating crown width to tree height have attracted attention at home and abroad. Studies have shown that crown width has a good mathematical relationship with tree height. However, most studies have established only a single-variable model of crown width and tree height, so the accuracy of tree height prediction still has room for improvement. To address these shortcomings, this paper makes full use of spaceborne lidar GLAS data combined with optical remote sensing image data. Using the waveform length, the waveform leading edge length, the waveform trailing edge length, the terrain feature parameters, the crown width, and the vegetation indices highly correlated with tree height as model input variables, four tree height inversion models are established: multiple linear regression, support vector machine, random forest, and BP neural network. The accuracy of the four models is then evaluated, and the best-performing model is used to estimate tree height.

GLAS data and its preprocessing
NASA used the first full-waveform spaceborne laser altimeters (SLA-1/2) for ground observation in 1996 and 1997. This became the hallmark of spaceborne lidar observations and showed the potential of spaceborne lidar to retrieve forest vegetation parameters. GLAS is a lidar altimeter system carried on the ICESat satellite. It was originally used for monitoring ice cap changes in the Arctic and Greenland and was later widely used to extract vegetation height. GLAS data were downloaded from the US National Snow and Ice Data Center (NSIDC) website (http://nsidc.org/data/icesat/index.html). The full waveform data were extracted from the GLAS data. Each waveform was then pre-compressed [5], converted to voltage values, and normalized; the background noise of the echo waveform was estimated, and the waveform was smoothed. Finally, following the Gaussian decomposition method adopted by Wang Bo et al. [6], waveform parameters such as waveform length, waveform leading edge length, and waveform trailing edge length were extracted from the waveform data.
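The noise estimation, smoothing, and parameter extraction steps can be illustrated with a minimal sketch. It is not the Gaussian decomposition of [6]: it uses a simple noise threshold and half-maximum crossings instead, and the definitions of the three parameters, the 0.15 m bin size, the 4-sigma threshold, and the synthetic two-peak waveform are all assumptions made for illustration.

```python
import numpy as np

BIN_SIZE_M = 0.15  # assumed: one 1 ns waveform bin corresponds to about 0.15 m of range


def smooth(waveform, sigma_bins=2.0):
    """Gaussian smoothing with edge padding so the noise floor is not biased at the ends."""
    half = int(3 * sigma_bins)
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma_bins) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(waveform, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def waveform_parameters(waveform, noise_bins=50, n_sigma=4.0):
    """Threshold-based waveform length, leading edge, and trailing edge (assumed definitions)."""
    w = smooth(np.asarray(waveform, dtype=float))
    noise_mean = w[:noise_bins].mean()   # background noise from the first bins
    noise_std = w[:noise_bins].std()
    threshold = noise_mean + n_sigma * noise_std
    above = np.flatnonzero(w > threshold)
    start, end = above[0], above[-1]     # signal start and signal end
    half_level = noise_mean + 0.5 * (w.max() - noise_mean)
    strong = np.flatnonzero(w >= half_level)
    return {
        "waveform_length_m": float((end - start) * BIN_SIZE_M),
        "leading_edge_m": float((strong[0] - start) * BIN_SIZE_M),
        "trailing_edge_m": float((end - strong[-1]) * BIN_SIZE_M),
    }


# Synthetic waveform: a broad canopy return, a narrow ground return, and noise.
rng = np.random.default_rng(0)
t = np.arange(400)
canopy = 0.6 * np.exp(-0.5 * ((t - 150) / 12.0) ** 2)
ground = 1.0 * np.exp(-0.5 * ((t - 250) / 5.0) ** 2)
waveform = 0.05 + canopy + ground + rng.normal(0.0, 0.005, t.size)

params = waveform_parameters(waveform)
print(params)
```

A full Gaussian decomposition would instead fit a sum of Gaussian components to the smoothed waveform and derive the same parameters from the fitted components.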

Landsat data and preprocessing
Landsat image data were downloaded from the Geospatial Data Cloud website (http://www.gscloud.cn/). After the Landsat image data were obtained, preprocessing such as atmospheric correction, image stitching, geometric correction, and image cropping was performed.
The spatial resolution of the data is 30 m, the data format is GeoTIFF, and the geographic reference system is WGS84. ASTER GDEM was used to extract the terrain index and the terrain standard deviation in a 3x3 window, in order to reduce the influence of terrain on the GLAS waveform data.
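The two terrain variables can be sketched as follows. The text does not define the terrain index, so the code assumes the common definition as the elevation range (maximum minus minimum) within the 3x3 window; the function name and the tilted-plane test DEM are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def terrain_features(dem):
    """Return the 3x3 elevation range (assumed terrain index) and 3x3 standard deviation."""
    windows = sliding_window_view(np.asarray(dem, dtype=float), (3, 3))
    terrain_index = windows.max(axis=(2, 3)) - windows.min(axis=(2, 3))
    terrain_std = windows.std(axis=(2, 3))
    return terrain_index, terrain_std


# Synthetic DEM: a plane rising 10 m per row, so every 3x3 window spans 20 m.
dem = 10.0 * np.arange(5)[:, None] + np.zeros((5, 5))
terrain_index, terrain_std = terrain_features(dem)
print(terrain_index[0, 0], terrain_std[0, 0])
```

The outputs are one cell smaller on each side than the input, because border pixels do not have a complete 3x3 neighborhood.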

Ground measured data
The measured data come from the website of the Forestry Resources Investigation Bureau. Each survey point records geographic coordinates, topography, elevation, aspect, soil name, soil layer thickness, forest species, average tree age, average DBH, average tree height, forest community, canopy density, and other attributes. A total of 150 sample points were collected for modeling and analysis, of which 113 were used for training and 37 for testing.

Crown data extraction
Crown width was extracted using object-oriented remote sensing image analysis. The Mean Shift algorithm was applied with the spatial feature bandwidth hs set to 10, the color feature bandwidth hr set to 6, and the minimum region area M set to 20 [7] to extract the crown information of the same 150 samples, and a smoothing algorithm was then applied to the extracted crowns.
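To make the roles of hs and hr concrete, here is a minimal sketch of flat-kernel mean shift filtering on a small grayscale image. It is a simplification, not the object-oriented workflow of [7]: it omits the region-merging step that uses the minimum area M, and the function name, the toy image, and the small hs value are assumptions chosen so the example runs quickly.

```python
import numpy as np


def mean_shift_filter(img, hs, hr, max_iter=20, tol=1e-3):
    """Flat-kernel mean shift filtering in the joint spatial-range domain."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    out = np.empty_like(img)
    for r in range(rows):
        for c in range(cols):
            y, x, v = float(r), float(c), img[r, c]
            for _ in range(max_iter):
                # Pixels within hs of the current position and within hr of the current value.
                mask = (
                    (np.abs(yy - y) <= hs)
                    & (np.abs(xx - x) <= hs)
                    & (np.abs(img - v) <= hr)
                )
                if not mask.any():
                    break
                ny, nx, nv = yy[mask].mean(), xx[mask].mean(), img[mask].mean()
                shift = abs(ny - y) + abs(nx - x) + abs(nv - v)
                y, x, v = ny, nx, nv
                if shift < tol:
                    break
            out[r, c] = v  # the pixel takes the value of the mode it converged to
    return out


# Toy image: a dark region next to a bright "crown" region, both with small noise.
rng = np.random.default_rng(0)
image = np.full((12, 12), 50.0)
image[:, 6:] = 100.0
image += rng.uniform(-2.0, 2.0, size=image.shape)

filtered = mean_shift_filter(image, hs=3, hr=6)
print(image[:, :6].std(), filtered[:, :6].std())
```

Because hr is much smaller than the contrast between the two regions, values are averaged only within each region, which flattens the noise while keeping the boundary sharp. A segmentation step would then group the converged pixels into regions and discard those smaller than M.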

Data normalization
Z-score normalization was used to normalize all data, mapping the original data to a distribution with a mean of 0 and a standard deviation of 1.
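A minimal sketch of the normalization on a 113/37 split is given below. The feature values are synthetic placeholders, and computing the mean and standard deviation from the training samples only (then reusing them for the test samples) is an assumption; the text states only that all data were normalized.

```python
import numpy as np

# Synthetic placeholder features: 150 samples, 3 variables on different scales.
rng = np.random.default_rng(0)
features = rng.normal(loc=[20.0, 5.0, 0.6], scale=[4.0, 1.5, 0.1], size=(150, 3))
train, test = features[:113], features[113:]

# z = (x - mean) / std, with statistics taken from the training set (assumed choice).
mean = train.mean(axis=0)
std = train.std(axis=0)
train_z = (train - mean) / std
test_z = (test - mean) / std

print(train_z.mean(axis=0).round(6), train_z.std(axis=0).round(6))
```

Reusing the training statistics keeps information from the test samples out of the model inputs.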

Correlation analysis between tree height and various spectral indexes
Pearson correlation analysis was used to examine the correlation between each vegetation index and tree height one by one. The sample Pearson coefficient is obtained from estimates of the sample covariance and standard deviations. The formula for calculating the Pearson coefficient is as follows:
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} (1)
where x_i is the vegetation index value of sample i, y_i is its tree height, \bar{x} and \bar{y} are the sample means, and n is the number of samples. According to the Pearson correlation coefficient cutoff table, when n = 60, | r | > 0.250 is significant at the 0.05 level, | r | > 0.325 is significant at the 0.01 level, and | r | > 0.408 is significant at the 0.001 level. The vegetation indices finally selected for their high correlation with tree height are NDVI, EVI2, NDGI, and RVI. The correlation between the four vegetation indices and tree height is shown in Table 2.
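The screening step can be sketched as follows, assuming the standard sample Pearson coefficient and the cutoffs quoted above. The reflectance and tree height values are synthetic, the helper names are hypothetical, and the index formulas are the commonly used definitions of NDVI, RVI, and EVI2, which the text does not spell out.

```python
import numpy as np


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def significance_level(r):
    """Strictest level passed under the cutoffs quoted in the text for n = 60, else None."""
    for cutoff, level in ((0.408, 0.001), (0.325, 0.01), (0.250, 0.05)):
        if abs(r) > cutoff:
            return level
    return None


# Synthetic red and near-infrared reflectance for 60 samples.
rng = np.random.default_rng(0)
n = 60
red = rng.uniform(0.03, 0.10, n)
nir = rng.uniform(0.25, 0.50, n)

ndvi = (nir - red) / (nir + red)
rvi = nir / red
evi2 = 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

# Synthetic tree heights loosely tied to NDVI, for illustration only.
tree_height = 5.0 + 30.0 * ndvi + rng.normal(0.0, 1.5, n)

for name, index in (("NDVI", ndvi), ("RVI", rvi), ("EVI2", evi2)):
    r = pearson_r(index, tree_height)
    print(name, round(r, 3), significance_level(r))
```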

Regression model between measured crown width and image crown width
There is a linear relationship between the measured crown width and the crown width extracted from the image [7]. Of the 150 extracted crown samples, 113 were used to construct a regression model between the measured crown width and the image crown width, and the remaining 37 were used for accuracy verification. The regression coefficients were solved using stochastic gradient descent, with iteration stopping when the loss no longer decreases or reaches the set value. The resulting regression model is as follows:

CW = 1.705 + 0.686 PCW (2)
where CW is the measured crown width and PCW is the crown width extracted from the image. The residual sum of squares of CW is 32.164, and the coefficient of determination is 0.634.
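The fitting procedure (stochastic gradient descent with a loss-based stopping rule) can be sketched as below. The data are synthetic and noise-free, generated from the reported coefficients purely so that the expected answer is known; the learning rate, stopping thresholds, and function name are assumptions, not the settings used in the study.

```python
import numpy as np


def fit_sgd(x, y, lr=0.01, max_epochs=5000, target_loss=1e-10, seed=0):
    """Fit y = b0 + b1 * x by per-sample stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    b0, b1 = 0.0, 0.0
    prev_loss = np.inf
    for _ in range(max_epochs):
        for i in rng.permutation(len(x)):
            err = (b0 + b1 * x[i]) - y[i]   # gradient of 0.5 * err**2
            b0 -= lr * err
            b1 -= lr * err * x[i]
        loss = np.mean((b0 + b1 * x - y) ** 2)
        # Stop when the loss reaches the set value or no longer changes.
        if loss <= target_loss or abs(prev_loss - loss) < 1e-15:
            break
        prev_loss = loss
    return b0, b1


# Synthetic image crown widths (m) and crown widths implied by Eq. (2).
rng = np.random.default_rng(1)
pcw = rng.uniform(2.0, 8.0, 113)
cw = 1.705 + 0.686 * pcw

b0, b1 = fit_sgd(pcw, cw)
print(round(b0, 3), round(b1, 3))
```

With real, noisy measurements the loss would level off above zero, which is where the "no longer decreases" part of the stopping rule takes over.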

Machine learning tree height inversion model
The input variables for constructing the four models are the waveform length, the waveform leading edge length, the waveform trailing edge length, the terrain index, the terrain standard deviation, the measured crown width obtained by inversion from the image crown width, and the highly correlated vegetation indices NDVI, EVI2, NDGI, and RVI. The models were constructed as follows.
Multiple linear regression model: The training set, after preprocessing and data cleaning, is used to construct a multiple linear regression model, and a stochastic gradient descent algorithm is used to reduce the loss function iteratively. When the loss function no longer decreases, the corresponding regression coefficients are obtained. Finally, the coefficient of determination R2 of the model and its RMSE on the test set are calculated.
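The two accuracy measures used for every model can be written out as a short sketch, assuming the usual definitions R2 = 1 - SS_res / SS_tot and RMSE = sqrt(mean squared error). The four tree heights and predictions are made-up numbers for a worked example.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Worked example: SS_res = 3, SS_tot = 20, so R2 = 0.85 and RMSE = sqrt(3 / 4).
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([11.0, 12.0, 13.0, 17.0])
r2_value = r_squared(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(r2_value, rmse_value)
```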
SVM model: An SVM regression model is built. The parameters C, ε, and γ are tuned during modeling, and a grid search algorithm is used to select the best C, epsilon, and gamma values for the SVM regression. Finally, the coefficient of determination R2 of the model and its RMSE on the test set are calculated.
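A minimal sketch of the SVM tuning step with scikit-learn follows. The ten-column feature matrix is a synthetic stand-in for the standardized input variables, and the RBF kernel, the candidate grid values, and the 5-fold cross-validation are assumptions; the text does not report the kernel or the grid that was searched.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for 150 samples with 10 standardized input variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
y = 15.0 + 3.0 * X[:, 0] + 2.0 * np.sin(X[:, 1]) + rng.normal(scale=0.5, size=150)
X_train, X_test = X[:113], X[113:]
y_train, y_test = y[:113], y[113:]

# Candidate values for C, epsilon, and gamma (hypothetical grid).
param_grid = {
    "C": [1, 10, 100],
    "epsilon": [0.01, 0.1, 0.5],
    "gamma": [0.01, 0.1, 1.0],
}
svr_search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
svr_search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test samples.
svr_pred = svr_search.predict(X_test)
svr_r2 = r2_score(y_test, svr_pred)
svr_rmse = float(np.sqrt(np.mean((svr_pred - y_test) ** 2)))
print(svr_search.best_params_, svr_r2, svr_rmse)
```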
Random forest model: A random forest model is built, and a grid search with 10-fold cross-validation is used to obtain the optimal number of decision trees, the maximum depth of the trees, and the minimum number of samples at each leaf node. Finally, the coefficient of determination R2 of the model and its RMSE on the test set are calculated.
BP neural network model: A BP neural network model is built, and the cat swarm optimization (CSO) [13] algorithm is then used to optimize the weights and thresholds of the BP neural network and keep them at an optimal level. Finally, the coefficient of determination R2 of the model and its RMSE on the test set are calculated.
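The random forest tuning step (10-fold cross-validated grid search over the number of trees, the maximum depth, and the minimum leaf size) can be sketched with scikit-learn as follows. The data are synthetic placeholders and the candidate grid values are assumptions; the CSO-optimized BP network is not sketched here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for 150 samples with 10 input variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
y = 15.0 + 3.0 * X[:, 0] + 2.0 * np.sin(X[:, 1]) + rng.normal(scale=0.5, size=150)
X_train, X_test = X[:113], X[113:]
y_train, y_test = y[:113], y[113:]

# Hypothetical candidate values for the three tuned hyperparameters.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 3],
}
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=10,  # 10-fold cross-validation, as described in the text
    scoring="neg_root_mean_squared_error",
)
rf_search.fit(X_train, y_train)

# Compare training and test R2 as a simple over-fitting check.
rf_train_r2 = r2_score(y_train, rf_search.predict(X_train))
rf_test_pred = rf_search.predict(X_test)
rf_test_r2 = r2_score(y_test, rf_test_pred)
rf_test_rmse = float(np.sqrt(np.mean((rf_test_pred - y_test) ** 2)))
print(rf_search.best_params_, rf_train_r2, rf_test_r2, rf_test_rmse)
```

A large gap between the training and test R2 would point to over-fitting, which is the check the results discussion relies on.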
The model results are shown in Table 3. Table 3 shows that the random forest algorithm has the highest R2 among the four algorithms on both the modeling set and the validation set, with no sign of over-fitting. Its RMSE is 0.110%, the lowest among the four algorithms. It is concluded that, among the four algorithms, the tree height inversion model based on random forest has the best accuracy.

Conclusion
Comparing the accuracy of the four models shows that the three machine learning models (SVM, random forest, and BPNN) are significantly more accurate than the multiple linear regression model and show good stability. Among the four models, random forest performs best and multiple linear regression performs worst, while BPNN and SVM perform similarly. That is, the tree height inversion model based on random forest achieves the best accuracy. The model therefore has practical application value for monitoring tree height in subsequent power line inspections.
However, this study has some limitations. (1) The GLAS data, Landsat image data, and forestry survey data used here were acquired at different times, and the changes in tree height caused by several years of growth were not considered, which introduces some uncertainty into the model. (2) NDVI saturates easily, and its sensitivity is reduced in areas of high vegetation density. (3) Differences between tree species, such as coniferous and broad-leaved forests, also introduce some uncertainty into the inverted tree height. Establishing tree height inversion models for different forest types is therefore a direction for further research.