Tree bark prediction along the bole through the support vector regression technique

Tree bark plays a protective role, surrounding the wood of a tree like a cloak. Owing to its chemical composition and its potential uses in various fields, such as pharmaceuticals and landscape architecture, tree bark receives much attention and is of outstanding importance for industrial utilization and markets. Tree bark is considered a valuable forest product, along with wood volume. Thus, accurate prediction of the bark quantity that a tree can produce is of utmost importance for the sustainable management of forests. Knowledge of bark quantities also enables accurate prediction of the plain wood volume that the forest can produce. Because of the nonlinear nature of this biological variable, its accurate quantification is a very complicated problem. Artificial intelligence methods have shown the potential to reliably predict biological variables that are nonlinear in nature. In this work, the support vector regression methodology, a nonlinear nonparametric machine learning approach, is tested for the accurate prediction of the tree bark factor at every height along the tree bole, using easily obtained tree measurements.


Introduction
Pine trees (Pinus brutia Ten.) have important characteristics, namely drought tolerance and fast growth. This coniferous species is native to the Mediterranean region [1], and it can also be found in planted areas, such as parks and urban forests, in the same region. Pine trees belong to those species that can reach high bark thickness with ageing (Fig. 1), with bark accounting on average for about 15% of the over-bark bole volume [2]. Beyond its known protective role in the survival of the tree, bark derived from different tree species can also be utilized as a pollution bioindicator [3][4][5]. Owing to its chemical composition [6], bark can be used as a raw material for the manufacture of pharmaceutical substances, as a soil conditioner and groundcover product in garden architecture [7], and, finally, as an alternative source of energy. These are the main reasons why forest management decisions need to rely on accurate prediction of the bark quantities that a forest can produce, even when tree bark is considered merely a residue. According to previous studies [8], inaccurate predictions of bark volume can significantly and negatively influence the profit that can be derived from a forest area.
Since bark thickness varies along the tree bole, accurate model-based prediction of the bark factor (BFi) values, which represent a bark thickness ratio, can significantly contribute to accurate and reliable prediction of the bark percentage (bdi%) of each diameter (di). This in turn leads to accurate prediction of both the tree bole volume and the tree bark volume. To this end, many different modeling methodologies have been applied to date to predict tree bark volume accurately, mostly linear and non-linear regression models [9][10][11][12][13][14]. Lately, attention in forest research has turned to machine learning methodologies, which, owing to their nonparametric algorithms, are free of the assumptions that beset regression modeling [15]. Furthermore, there is no need to pre-specify a model form, as is essential in regression modeling. Instead, the learning methodologies of artificial intelligence can learn from noisy and frequently incomplete data, such as ground-truth forest data. Some attempts have already been made to model tree bark quantities efficiently with machine learning. Specifically, [7] compared the performance of artificial neural networks with non-linear regression models, while [16] compared the performance of support vector machines for regression with the same technique. These comparisons were based on bark volume prediction of pine and black alder trees, respectively. [17] used a nonlinear autoregressive exogenous neural network (a recurrent neural network) to estimate the double bark thickness of oak and Scots pine trees. All the above attempts concluded that the machine learning approach can adequately describe the patterns in the available data. Furthermore, when compared with more traditional approaches, such as regression analysis, the machine learning methodologies showed superior performance.
Reliable models are needed to improve the prediction of tree bark quantities in the forest, and the bark factor (BF) is one of the most important variables directly affecting bark volume. This study therefore aims to explore, test and finally provide an efficient modeling approach from the perspective of machine learning, with the potential to lead to more accurate prediction of tree bark quantities.

Study area and data collection
To predict the bark factor (BF) accurately with an efficient model, a stem analysis dataset was created from measurements on fifty-seven standing pine trees in the Seich-Sou urban forest of Thessaloniki, Greece, comprising a total of 1474 data points. The urban forest of Thessaloniki lies in the area around the geographic coordinates 40°37'33.0''N and 23°00'45.0''E. It is an almost purely planted pine forest, although some other species can also be found there, such as Cupressus sempervirens L., Cupressus arizonica Greene and Quercus coccifera L., among others [18].
Systematic sampling was used so that the measurements covered trees of all site qualities and classes, and thus all the possible variation of the tree attributes. A GPS instrument was used to locate the sampled trees on the ground.
Tree measurements included the following:
- the over-bark diameter at stump height (do0.3, 0.3 m above ground);
- the diameter at breast height (do1.3, 1.3 m above ground);
- over-bark diameters (di) at one-meter height intervals above breast height up to the tree tip;
- the double bark thickness (bi) at all heights (hi) where the over-bark diameters were measured;
- the total height (tht) of the sampled trees.

The instruments used to obtain the measurements on the tree boles were the caliper, the Spiegel relaskop, the Blume-Leiss altimeter and the Pressler increment borer [2,19] (Fig. 2).
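The height protocol above can be sketched as a small helper that lists the bole heights at which diameters would be recorded for a tree of known total height. This is a minimal illustration under stated assumptions. The function name `measurement_heights` is hypothetical, heights are in metres, and the tip itself (where the diameter is zero) is not included.

```python
def measurement_heights(total_height, stump=0.3, breast=1.3, step=1.0):
    """List the bole heights (m) at which over-bark diameters are recorded.

    Hypothetical helper: stump height, breast height, then every `step`
    metres above breast height while the point stays below the tree tip.
    """
    heights = [stump, breast]
    k = 1
    while breast + k * step < total_height:
        # Rounding avoids float noise such as 2.3000000000000003.
        heights.append(round(breast + k * step, 2))
        k += 1
    return heights


print(measurement_heights(6.0))  # [0.3, 1.3, 2.3, 3.3, 4.3, 5.3]
```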

Comprehensive data base construction
The bark factor (BFi) was calculated at each measurement height of the tree boles as:
$$BF_i = \frac{d_{oi}}{d_{ui}} \qquad (1)$$
where doi is the over-bark diameter at i meters from the ground and dui is the under-bark diameter at the same bole height (hi).
The under-bark diameters (dui) were calculated as:
$$d_{ui} = d_{oi} - b_i \qquad (2)$$
where bi is the double bark thickness at the tree bole height (hi).
To predict the bark factor profile along the tree bole, the relative height (RHi) at which each measurement was taken was also calculated:
$$RH_i = \frac{h_{di}}{tht} \qquad (3)$$
where hdi is the tree bole height (hi) at which the diameter measurement (di) was taken and tht is the total height of the tree bole. The values estimated and predicted by the ε-SVR system are essential for calculating the bark percentage of any diameter along the tree stem. In this way, the pure (under-bark) stem volume of any stem section and the corresponding bark volume can be calculated accurately by applying the system of equations (4) and (5) given in [2]. In these equations, vui is the under-bark volume of tree stem section i, voi is the over-bark volume of section i, and vbi is the bark volume of section i.
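The per-point calculations described above (under-bark diameter, bark factor and relative height) can be illustrated with a short routine for one sampled tree. It is a sketch under stated assumptions. The `Measurement` record and `derive_variables` function are hypothetical. Diameters and bark thickness are taken in centimetres and heights in metres. The bark factor is taken as the over-bark to under-bark diameter ratio.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    height: float       # h_i, bole height of the measurement point (m)
    d_over: float       # d_oi, over-bark diameter (cm)
    double_bark: float  # b_i, double bark thickness (cm)


def derive_variables(measurements, total_height):
    """Return one dict per measurement point with d_ui, BF_i and RH_i."""
    rows = []
    for m in measurements:
        d_under = m.d_over - m.double_bark        # under-bark diameter
        bark_factor = m.d_over / d_under          # assumed: over- / under-bark ratio
        relative_height = m.height / total_height # height relative to total height
        rows.append({"h": m.height, "d_under": d_under,
                     "BF": bark_factor, "RH": relative_height})
    return rows


# Illustrative values only, not field data.
tree = [Measurement(0.3, 34.0, 4.0), Measurement(1.3, 30.0, 3.0)]
for row in derive_variables(tree, total_height=10.0):
    print(row)
```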
Using the quantities from equations (4) and (5), the total pure (under-bark) stem volume and the total bark volume of the tree can be obtained by summing the corresponding values over all stem sections.
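The summation step can be sketched as follows. Per-section over-bark and under-bark volumes are taken as given (for example, the under-bark volumes obtained through equation (4)). The bark volume of a section is assumed to be the difference between the two. The function name `total_volumes` and the numbers in the usage line are illustrative only, not data from this study.

```python
import math


def total_volumes(v_over, v_under):
    """Sum section volumes (m³) into total under-bark and bark volume.

    Assumes the bark volume of section i is v_oi - v_ui.
    """
    if len(v_over) != len(v_under):
        raise ValueError("each stem section needs both volumes")
    v_bark = [vo - vu for vo, vu in zip(v_over, v_under)]
    # fsum limits floating-point error when adding many small sections.
    return math.fsum(v_under), math.fsum(v_bark)


under, bark = total_volumes([0.050, 0.040, 0.030], [0.042, 0.034, 0.026])
print(f"under-bark: {under:.3f} m³, bark: {bark:.3f} m³")  # under-bark: 0.102 m³, bark: 0.018 m³
```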
The arithmetic mean, maximum and minimum values and standard deviation (Sd) of the basic variables used as ground-truth information for the bark factor modeling are given in Table 1. To test the predictive ability of the constructed model on new data, never seen by the model during its construction, the available data set was divided randomly into two parts. The fitting data set consists of 90% of the total data, and the test data set consists of the remaining 10%.
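A random 90/10 division of the kind described above can be reproduced with scikit-learn's `train_test_split`. The arrays below are random placeholders, not the actual measurements. They have the same number of rows as the field data set (1474) and five input columns. The `random_state` value is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1474, 5))  # placeholder inputs (five predictors)
y = rng.normal(size=1474)       # placeholder bark factor values

# test_size=0.10 keeps 10% of the rows aside as the never-seen test set.
X_fit, X_test, y_fit, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)
print(len(X_fit), len(X_test))  # 1326 148
```

The split is made at the level of individual measurements, so points from the same tree can fall into both subsets. If whole trees should be held out instead, scikit-learn's `GroupShuffleSplit`, with tree identifiers as groups, would be one option.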

Support vector regression model (SVR) construction
Nowadays, artificial intelligence (AI) is being examined to verify its effectiveness in environmental and, more specifically, forest modeling [20,21]. As a subset of AI, machine learning methodologies include the Support Vector Regression (SVR) technique [22]. The outstanding characteristic of this methodology is its generalization ability. SVR can also learn from noisy or incomplete data by detecting inherent complex nonlinear relationships between the input and output variables. This ability is worth exploring in forest modeling research [16,23-25], where such data are commonly encountered. Specifically, the ε-SVR algorithm used in the present modeling effort is a supervised learning algorithm. It attempts to find a function f(x) from the pairs of input and output data such that the regression error over all training samples is minimized. Errors that fall inside an ε-insensitive zone [−ε, ε] of optimum width are ignored. To capture intricate patterns in the data, non-linear Radial Basis Function (RBF) kernels were used. The RBF kernel support vector regression was implemented with the scikit-learn library [26] in the Python programming language [27]. A set of optimum values of three meta-parameters guides the successful learning of the algorithm:
- ε, which controls the width of the ε-insensitive zone;
- the gamma parameter (γ), which is inversely proportional to the variance (σ²) and controls the width of the RBF kernel;
- the cost parameter (C), a regularization parameter also known as the penalty parameter, which specifies the trade-off between misprediction and model simplicity [16].

Each of them, both separately and in interaction with the others, controls the adequate learning of the system under the kernel equation:
$$K(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right), \qquad \gamma = \frac{1}{2\sigma^2}$$
where x1 and x2 are two points (support vectors), ‖x1 − x2‖ is the Euclidean (L₂-norm) distance between them, and σ² is the variance. The bark correction factor (BFi) was used as the output variable of the ε-SVR modeling system. Five variables, all easily obtained on the tree bole, were used as the input information to the modeling system: the over-bark diameter at breast height (do1.3), the stem heights (hi) at which the over-bark diameters were measured, the total height (tht) of the sampled trees, the ratio between the height and the total height of the tree (RHi), and the ratio between the over- and under-bark diameters at breast height.
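The kernel equation can be checked numerically. The sketch below evaluates K(x1, x2) = exp(−γ‖x1 − x2‖²) directly, with γ derived from an arbitrary σ, and compares the result with scikit-learn's `rbf_kernel`. The helper name `rbf` and the two points are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x1, x2, gamma):
    """RBF kernel exp(-gamma * ||x1 - x2||^2) for two 1-D points."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-gamma * diff.dot(diff)))


sigma = 2.0
gamma = 1.0 / (2.0 * sigma**2)  # gamma = 1 / (2 sigma^2) = 0.125

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 1.0])       # squared Euclidean distance = 4 + 1 = 5

manual = rbf(x1, x2, gamma)
library = rbf_kernel(x1.reshape(1, -1), x2.reshape(1, -1), gamma=gamma)[0, 0]
print(manual, library)          # both approximately exp(-0.625)
```

A larger γ (smaller σ) makes the kernel value decay faster with distance, so each support vector influences a narrower neighbourhood.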

Model evaluation criteria
The indicators calculated for the training and test data sets were the correlation coefficient (R), the root mean square error (RMSE), the percentage root mean square error (RMSE%) and the average absolute error (AAE) of the bark factor (BFi), which was the output feature of the modeling process. In addition, the measured bark factor (BFi) values were compared with the values predicted by the constructed model, using the 45-degree line and the paired t-test. As a final evaluation of the statistical behaviour of the constructed ε-SVR model, its error distribution was also examined.
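The evaluation criteria can be sketched as one function. The definitions used here are common ones and are assumptions where the text does not spell them out. RMSE% is RMSE divided by the mean observed value, AAE is the mean absolute residual, and the paired t-test is SciPy's `ttest_rel`. The function name and the four value pairs are illustrative, not results from this study.

```python
import numpy as np
from scipy import stats


def evaluate(observed, predicted):
    """Return R, RMSE, RMSE%, AAE and the paired t-test p-value."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = obs - pred
    rmse = float(np.sqrt(np.mean(residuals**2)))
    return {
        "R": float(np.corrcoef(obs, pred)[0, 1]),
        "RMSE": rmse,
        "RMSE%": 100.0 * rmse / float(obs.mean()),  # relative to mean observed value
        "AAE": float(np.mean(np.abs(residuals))),
        "p_paired_t": float(stats.ttest_rel(obs, pred).pvalue),
    }


scores = evaluate([1.10, 1.12, 1.15, 1.20], [1.11, 1.12, 1.14, 1.19])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A p-value above the chosen significance level (0.05 in the text) means the paired test detects no systematic difference between observed and predicted values.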

Results and discussion
The optimum combination of meta-parameter values was selected using the tuning technique known as grid search [26]. The exhaustive search covered three ranges:
- ε from 10⁻⁴ to 0.30, in steps of 0.001;
- γ from 0.05 to 1.00, in steps of 0.05;
- C from 2 to 30, in steps of 0.1.

The combination that led the system to the minimum loss was selected.
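The grid search can be sketched with scikit-learn's `GridSearchCV` wrapped around `SVR(kernel="rbf")`. This is an illustration under several assumptions. The data are synthetic, and the grid is far coarser (and, for ε, narrower) than the one described above so that it runs quickly. The 5-fold cross-validation and the mean-squared-error loss are also assumed, since the text does not state how the loss was evaluated.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for (inputs, bark factor) pairs; not the field data.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 1.1 + 0.1 * np.sin(3.0 * X[:, 0]) + 0.05 * X[:, 1] + rng.normal(0.0, 0.005, 200)

# Coarse illustrative grid; the study's grid is much finer.
param_grid = {
    "epsilon": [1e-4, 1e-3, 1e-2, 5e-2],
    "gamma": np.linspace(0.05, 1.00, 4),
    "C": np.linspace(2.0, 30.0, 3),
}

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    scoring="neg_mean_squared_error",  # minimising MSE = maximising its negative
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```

With the meta-parameter values reported in the text, the final estimator would be `SVR(kernel="rbf", C=5.7, gamma=0.06, epsilon=0.001)`. For the grid above, `search.best_estimator_` holds the model refitted on all the data with the best combination found.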
Following the grid search, the ε-SVR model produced the most accurate results with the optimum combination of meta-parameters C = 5.7, γ = 0.06 and ε = 0.001 (Fig. 3).

Fig. 3. SVR model meta-parameters optimum combination.
Table 2 gives the evaluation statistics of the ε-SVR model, which will be used to estimate and predict the bark factor values along the tree bole, for the fitting and the test data sets. As shown in Table 2, the ε-SVR model adapted remarkably well not only to the training data set but also to the test data set, which the model had never seen. The RMSE% values of the estimations and predictions were 1.63% and 1.96% of the mean observed BFi values, respectively, which can be considered sufficiently small error percentages for the BF values. Furthermore, the proximity of the points to the 45-degree line indicated that the constructed ε-SVR model can adequately estimate and predict the bark factor values throughout the tree bole (Fig. 4).
Paired t-tests at the significance level α = 0.05 were applied to examine whether the observed values differ from a) the estimations and b) the predictions derived by the constructed ε-SVR model. The p-values were 0.100 and 0.371, respectively. Since both p-values were greater than 0.05, no statistically significant differences were detected in either case. Finally, the residual distribution of the constructed ε-SVR model was also examined (Fig. 5). As Fig. 5 shows, both the estimations and the predictions are reliable. The residuals peak around zero, with only a few larger values, meaning that the constructed system is well behaved and has the potential to produce accurate bark factor values for pine trees.
Fig. 6 summarizes the predicted pure (under-bark) total stem volume, obtained from the results of the ε-SVR model by following equations (4) and (5). It shows the under-bark stem volume calculated from the information provided by the constructed ε-SVR model.

Discussion
In forestry, producing reliable models that can accurately predict the values of any tree attribute is of vital importance, especially given the difficulties almost always faced when collecting ground-truth data in the field. Furthermore, data measured on biological organisms usually exhibit high variability and follow various non-normal distributions. For these reasons, modeling research faces the challenge of focusing on new, smart technologies that can adapt effectively to the attributes of biological organisms without any prerequisites. In this direction, machine learning is worth exploring in forestry research.
On the other hand, there is nowadays an increasing need, accompanied by an equally increasing interest, for wood and bark quantities that can be used not only for industrial utilization but also for energy. Therefore, it is of significant importance to produce models that can effectively predict the quantities of wood and bark that a forest can produce. This knowledge is essential for the sustainable management of the forest as a natural resource. Since bark can reach about 15% of the wood volume, accurate prediction of tree bark is of utmost importance. A modeling technique that reliably predicts tree bark quantities would significantly improve forest management decisions, in favor of both the forest and the economic benefit from the use of this product. In this way, decisions on the amount of biomass that can be safely removed from the forest will be more secure, helping to avoid degradation of the forest ecosystem over time.
The machine learning methodology used in this research was the ε-SVR technique, which uses the ε-insensitive loss function. This function ignores errors that fall within the margin [−ε, ε], known as the ε-insensitive tube. The prediction problem was solved through the non-linear radial basis function kernel with optimum meta-parameter values. The kernel implicitly maps the nonlinear available data into a higher-dimensional feature space, where a linear regression function can be fitted. With this approach, the model showed remarkable estimation and prediction abilities for the tree bark factor.
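The ε-insensitive loss can be written in one line. A residual whose magnitude is at most ε costs nothing, and a larger one costs only the part that exceeds ε. The function below is an illustrative NumPy sketch with a hypothetical name and made-up values. It uses ε = 0.001, the optimum value reported above, only as an example.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Element-wise max(0, |y - f(x)| - epsilon)."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


loss = epsilon_insensitive_loss(
    [1.100, 1.100, 1.100],   # observed bark factor values (illustrative)
    [1.1005, 1.103, 1.095],  # inside the tube, just outside, further outside
    epsilon=0.001,
)
print(loss)  # approximately [0, 0.002, 0.004]
```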
The pure volumes predicted using the BFi were very close to the observed under-bark volumes of the pine trees. The mean absolute error was 0.0018 m³, with individual values ranging from 0.000016 m³ to 0.0042 m³, and the standard error of the mean was 0.00013 m³ (Fig. 7).
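The volume error statistics quoted above can be computed as in the sketch below. It assumes that the standard error refers to the mean of the absolute errors and uses SciPy's `sem` (sample standard deviation divided by √n). The function name and the three volume pairs are illustrative, not the study's data.

```python
import numpy as np
from scipy import stats


def volume_error_summary(v_observed, v_predicted):
    """Mean, range and standard error of absolute volume errors (m³)."""
    abs_err = np.abs(np.asarray(v_observed, dtype=float) - np.asarray(v_predicted, dtype=float))
    return {
        "MAE": float(abs_err.mean()),
        "min": float(abs_err.min()),
        "max": float(abs_err.max()),
        "SEM": float(stats.sem(abs_err)),  # uses ddof=1 by default
    }


summary = volume_error_summary([0.100, 0.250, 0.180], [0.101, 0.247, 0.182])
print(summary)
```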

Concluding remarks
The described behaviour of the constructed ε-SVR model clearly reveals the ability of the system to overcome outliers, noise and the non-Gaussian distributions that frequently characterize forest data measured in the field.
Furthermore, the ε-SVR modeling technique showed remarkable adaptation to the non-linear ground-truth data, producing an effective and reliable bark factor prediction model. It was shown that knowledge of the bark factor values can lead to accurate estimation of bark quantities, which further enables accurate prediction of the plain wood volume that the trees can produce.
Based on the results obtained, the use of the support vector regression technique, where the non-linear radial basis function kernels were embedded, is strongly recommended.
Finally, the results of this study can provide evidence along with a consistent basis for further research in the scientific area of forest modeling.

Fig. 1. Pine bark of a tree about forty years of age.

Fig. 2. Instruments used for obtaining the measurements on the tree boles.

Fig. 4. 45-degree line for the training and the test data sets.

Fig. 6. Over-bark and under-bark volume as predicted using the results of the ε-SVR bark factor model.

Table 1. Descriptive statistics of the field measurements and the variables used for the bark factor modeling configuration.

Table 2. Evaluation statistics of the ε-SVR model that showed the best adaptation to the available data.