Standard deviation as the optimization criterion in the OptD method and its influence on the generated DTM

Reducing measurement datasets remains a current issue, because constantly developing technologies such as laser scanning deliver ever larger datasets. It might seem that faster processors, growing hard drive capacity and similar advances solve the problem of processing such large datasets. To a degree they do; however, “lighter” datasets are easier to work with. In addition, reduced datasets can be exchanged, transferred and downloaded faster over the internet, or stored in the cloud. The issue of data reduction algorithms and methods is therefore continuously relevant. This paper presents the results of a study on whether the standard deviation of measurement data can be used as an optimization criterion in dataset reduction conducted by means of the OptD method. The OptD method is based on cartographic generalization methods. In an iterative process, irrelevant points are removed and characteristic points are preserved, which results in more points in complex fragments of the scanned object or surface and fewer in flat, uncomplicated areas. The reduced datasets were then used as the basis for DTM generation. The DTMs were assessed by calculating the RMSE.


Introduction
The DTM (Digital Terrain Model) is one of the most popular models representing the surrounding world. It is used in many fields of science and economy to study the existing state or to simulate future events [1,2]. It can be developed, inter alia, on the basis of LiDAR (Light Detection and Ranging), which provides a large amount of measurement data in a relatively short time. However, the measurement data itself is insufficient: measurements should also include a measure of their reliability, i.e. the uncertainty of measurement [3]. The uncertainty of measurement data is discussed in the Guide to the expression of uncertainty in measurement, developed by the International Organization for Standardization (ISO) in consultation with a number of global scientific and technical organizations [4,5]. Uncertainty of measurement is expressed, e.g., by the measurement error, which represents the difference between the measured value $x$ and the true value $x_0$. The measurement error defined in this way is not a measure of the accuracy of the measurement method, because similar measurements made with another instrument at another time and place will give other values [6]. Thus, $x_i$ is a random value and its scatter is characterized by a parameter called the standard deviation estimator SD:
$$SD = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (1)$$
where: $x_i$ are the observed values, $\bar{x}$ is the mean value of these observations, and $n$ is the number of observations. SD can be identified with the uncertainty of measurement if any of the values $x_i$ is taken as the result of the measurement.
An important aspect of processing LiDAR data for DTM generation is the evaluation of the accuracy of the obtained product [7][8][9]. The accuracy of the DTM is a function of a number of variables, such as the roughness of the terrain surface, the interpolation function, the interpolation method, and the three attributes (accuracy, density and distribution) of the source data [10,11]. Therefore, the measurement uncertainty parameter, i.e. the SD of the source data, has an impact on the generated DTM.
This paper focuses on the influence of the source data, in particular on whether the standard deviation estimator of ALS (Airborne Laser Scanning) data can be used as an optimization criterion in dataset reduction and whether using SD has an impact on the generated DTM. Reduction was performed by means of the Optimum Dataset (OptD) method [12], which preserves points representing characteristic elements in the reduced dataset [13,14,15]. The DTM was generated on the basis of the original dataset (after its filtration) as well as from the datasets obtained after processing by the OptD method. The study area is a fragment of national road No. 16, Sielska street in Olsztyn, located in Warmia-Mazury. The measurement was conducted by Airborne Laser Scanning, performed by Visimind Ltd. For this study, part of this measurement was selected. The laser scanning angle was 60 degrees and the scanning frequency was 10,000 Hz. Scanning was performed from a helicopter flying at a speed of 50 km/h at an altitude of 70 m.
Control points were set up within the analyzed area. They were measured by means of a Leica Viva NetRover receiver. The observation time at each measurement point was 12 seconds, and weather conditions were satisfactory. VRS corrections from ASG EUPOS were applied. The measurement resulted in a dataset consisting of 204 points.

Data reduction and DTM generation
The dataset obtained from ALS and used in this study contains 74785 points. The area includes buildings, a street and trees. The ALS dataset is presented in Fig. 1. The selected fragments were filtered using the 'adaptive TIN model' method [16] in proprietary software. Filtration produced two datasets: a dataset of points showing the topography (topographic surface dataset, TSset) and a dataset of detail points. The TSset consists of 37481 points and is presented in Fig. 2.
The datasets after filtration contain only points representing the terrain. Next, they were processed by means of the OptD method in order to reduce them. The OptD method emphasizes that the reduction is conducted in such a way that the information necessary for the proper performance of a task is not lost. In this case, points representing characteristic features of the terrain are preserved for the subsequent DTM generation. To achieve that, the dataset is divided into measurement strips and a linear object generalization method is applied within each strip. In this paper, the Douglas-Peucker (D-P) method [17] was used in OptD-single. The D-P method is based on a tolerance distance. The calculations are done in a vertical plane, which allows each elevation to be checked accurately. The degree of reduction depends on the values of these parameters. They are changed automatically during processing to meet the criterion or criteria set by the user. As a result, optimal datasets are obtained.
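The generalization of a single measurement strip can be illustrated with a minimal sketch of the Douglas-Peucker method applied in a vertical plane. It assumes that each strip has already been projected to a profile of (distance along the strip, height) pairs sorted by distance; the function names and the profile representation are illustrative assumptions, not part of the OptD implementation.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b (2D)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide: fall back to the point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length

def douglas_peucker(profile, tolerance):
    """Return the points of `profile` kept for the given tolerance distance.

    `profile` is a list of (distance, height) tuples, i.e. a strip seen
    in its vertical plane (an assumption of this sketch).
    """
    if len(profile) < 3:
        return list(profile)
    first, last = profile[0], profile[-1]
    # Find the interior point farthest from the line first-last.
    index, max_dist = 0, 0.0
    for i in range(1, len(profile) - 1):
        d = perpendicular_distance(profile[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    # All interior points lie within the tolerance: keep only the end points.
    if max_dist <= tolerance:
        return [first, last]
    # Otherwise keep the farthest point and process both halves recursively.
    left = douglas_peucker(profile[:index + 1], tolerance)
    right = douglas_peucker(profile[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point
```

With a small tolerance, points on a bump in the profile survive while points on nearly straight segments are removed; increasing the tolerance removes more points, which is the parameter the reduction is steered by.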
So far, parameters such as the number of points in the reduced dataset (M) and the percentage of points to remain in the dataset after processing (p%) have been used and tested as optimization criteria in the OptD method [13][14][15][16][17]. In this paper, it was decided to use the standard deviation estimator (SD) of the ALS data. The first calculation was made with the assumption that the SD of the reduced dataset may differ from the SD of the original dataset by less than 0.200 m, i.e. $|SD_{TSset} - SD_{OptDset}| < 0.200$ m. The permitted difference of SDs was then increased in steps of 0.200 m. In this way, four datasets were determined. The datasets obtained after processing by means of the OptD-single method were named OptDset1, OptDset2, OptDset3 and OptDset4, respectively. In order to compare the results of the OptD-single method, the parameters that describe the TSset were determined. Statistics of the TSset are presented in Table 1. The SD was calculated as:
$$SD = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(H_i - \bar{H}\right)^2} \qquad (2)$$
where: $H_i$ (i = 1, 2, …, n) is the height of subsequent points in the dataset, $\bar{H}$ is the mean height in the TSset, and $n$ is the number of points in the dataset (original N or reduced M).
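How an SD-based criterion can steer the reduction may be sketched as a simple search over the tolerance: the tolerance is raised step by step and the last reduced dataset whose SD still differs from the original SD by less than the limit is kept. This is only an illustration under stated assumptions, not the OptD-single implementation: the function names, the fixed step, the stop at the first violation and the stand-in reducer `vertical_offset_filter` (a plain vertical-offset filter used here instead of the D-P method to keep the block short) are all hypothetical.

```python
import statistics

def sd_of_heights(points):
    """Sample standard deviation (n - 1 in the denominator) of the heights."""
    return statistics.stdev([h for _, h in points])

def vertical_offset_filter(points, tolerance):
    """Stand-in reducer (NOT the D-P method): drop an interior point when its
    height is within `tolerance` of the line joining its two neighbours.
    Assumes at least two points with strictly increasing distances."""
    kept = [points[0]]
    for (d0, h0), (d1, h1), (d2, h2) in zip(points, points[1:], points[2:]):
        h_line = h0 + (h2 - h0) * (d1 - d0) / (d2 - d0)
        if abs(h1 - h_line) > tolerance:
            kept.append((d1, h1))
    kept.append(points[-1])
    return kept

def reduce_with_sd_limit(points, reduce_fn, sd_limit, step=0.01, max_tolerance=5.0):
    """Raise the tolerance until |SD_original - SD_reduced| reaches `sd_limit`.

    Returns the last reduced dataset that met the criterion and its tolerance.
    The search stops at the first violation, which assumes that the SD
    difference does not shrink again for larger tolerances.
    """
    sd_original = sd_of_heights(points)
    best, best_tolerance = list(points), 0.0
    k = 1
    while k * step <= max_tolerance:
        tolerance = k * step
        candidate = reduce_fn(points, tolerance)
        if len(candidate) < 2:
            break  # SD is undefined for fewer than two points
        if abs(sd_of_heights(candidate) - sd_original) >= sd_limit:
            break  # criterion violated: keep the previous result
        best, best_tolerance = candidate, tolerance
        k += 1
    return best, best_tolerance
```

Calling `reduce_with_sd_limit(points, vertical_offset_filter, sd_limit=0.200)` and then repeating the call with limits of 0.400, 0.600 and 0.800 would mirror the four cases described above; any reducer with the same `(points, tolerance)` signature, such as a D-P routine, could be passed in instead.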
The value of SD increases as the size of the dataset decreases. When 25% to 70% of the points in the TSset are removed, the difference between the SDs of the TSset and the OptDsets ranges from 0.160 m to 0.752 m; these are insignificant values for data representing the terrain surface.
On the basis of the reduced datasets, DTMs were generated using the kriging method in Surfer with two grid sizes (0.5 m and 1 m). Fig. 4 and Fig. 5 show the DTM generated from the TSset (DTM TSset). Application of the OptD method gave several solutions, which were used for DTM generation. Fig. 6 and Fig. 7 present the four DTMs based on these OptDsets.
To test how the reduction of measurement data influences DTM generation, cross-sections at random locations were made for the DTMs with grid sizes of 0.5 m and 1 m. For the cross-section examined, the differences range from 0.035 m to 0.149 m. For the other cross-sections the results are similar. Thus, it can be assumed that the reduction did not distort the generated surface.
In order to determine the accuracy of the generated DTM OptDsets, 204 control points within the tested area were measured by means of the GNSS technique. Those points were used to calculate the RMSE of the generated models. The results show that the values of SD increase as the number of points in the ALS dataset decreases.
The presented results show that the more points the OptD method removes from a dataset, the larger the differences in RMSE of the DTMs generated from that dataset. The changes range from 0.073 m to 0.470 m. The suitability of the DTMs for various purposes is therefore determined by these values.
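The RMSE assessment at control points can be sketched as follows. The sketch assumes the DTM is available as a regular grid of node heights and that the model height at a control point is obtained by bilinear interpolation; the paper does not state how model heights were sampled, so `sample_grid`, its parameters and the grid layout are illustrative assumptions.

```python
import math

def sample_grid(grid, x0, y0, cell, x, y):
    """Bilinear height at (x, y) from a regular grid of node heights.

    grid[row][col] is the height of the node at (x0 + col * cell,
    y0 + row * cell). Points outside the grid are extrapolated from the
    nearest cell, which a real workflow would rather reject.
    """
    fx, fy = (x - x0) / cell, (y - y0) / cell
    # Clamp the cell index so that points on the last row or column
    # still use a valid cell.
    col = min(max(int(math.floor(fx)), 0), len(grid[0]) - 2)
    row = min(max(int(math.floor(fy)), 0), len(grid) - 2)
    tx, ty = fx - col, fy - row
    z00, z10 = grid[row][col], grid[row][col + 1]
    z01, z11 = grid[row + 1][col], grid[row + 1][col + 1]
    return (z00 * (1 - tx) * (1 - ty) + z10 * tx * (1 - ty)
            + z01 * (1 - tx) * ty + z11 * tx * ty)

def rmse(model_heights, control_heights):
    """Root mean square error between model and control point heights."""
    if len(model_heights) != len(control_heights) or not control_heights:
        raise ValueError("height lists must be non-empty and of equal length")
    squared = [(m - c) ** 2 for m, c in zip(model_heights, control_heights)]
    return math.sqrt(sum(squared) / len(squared))
```

Computing `rmse` once for the DTM built from the TSset and once for each DTM built from an OptDset, with the same control points, gives the values whose differences are discussed above.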

Conclusions
This paper presents the use of the standard deviation as the optimization criterion in the OptD-single method. It was assumed that the difference in SD calculated for the original and the reduced dataset cannot exceed 0.200 m. Tests were carried out for 4 cases, each time increasing the permitted difference in SD by 0.200 m. Next, DTMs were generated on the basis of the obtained (reduced) datasets. The DTMs were assessed using the RMSE, calculated at control points measured by GNSS.
The analyses show that the SD of the dataset can be an optimization criterion in the OptD-single method. The obtained datasets met the optimization criterion, and the DTMs generated on their basis also met the accuracy expectations. At the control points, the calculated RMSE for DTMs with a 0.5 m grid increased by a maximum of 0.466 m, and for DTMs with a 1 m grid by a maximum of 0.470 m. The differences in RMSE indicate that the applicability of DTMs generated from reduced datasets is determined by the degree of reduction.