Circle object detection using Msplit estimation

The paper presents the use of Msplit(q) estimation in the filtration and aggregation of point clouds containing a known number of elliptical shapes whose locations and dimensions are initially unknown. These theoretical solutions may have practical relevance, especially in the modelling of terrestrial laser scanning (TLS) data of objects whose shape is similar to a circle. Such shapes arise when scanning tree trunks, columns, gutters, etc., which are elliptical in the horizontal plane. The results are satisfactory and encourage further detailed research, particularly on the extension to 3D applications.

In the case of TLS, the spatial locations of points must be measured from several stations to obtain complete information, such as the envelope of spatial objects. Despite these efforts, the obtained data set will not fully represent the observed space. It will also contain a lot of surplus (redundant) data that hinders both the assessment of the measured space and its numerical analysis. Therefore, the raw measurement data is usually subjected to modelling, which aims to present the space as sets of qualitatively [7], semantically and functionally homogeneous objects described in the simplest way. This is usually done by specifying the location of the object, its dimensions and its orientation in space.
The spatial and topological relations obtained in this way can be used to gain knowledge about the studied space and to perform analyses based on them, such as those in [8][9][10][11]. Compared to the measurement process, the modelling process is more time-consuming and more demanding from the operator's point of view.
It seems that it can never be fully automated, but many of its tasks can be accelerated considerably and performed in a supervised mode.
Nowadays, spatial data modelling applications based on TLS measurements include many efficient tools for automatic object recognition. They rely on the mutual distribution of measurement points, their density and their reflectance values [12][13][14][15][16].
However, the operator is always required to indicate the analysis (modelling) area, the initial parameters that control the selection, filtering or aggregation tools, and the geometric primitives best matched to the modelled element.
The main goal of this article is to present an algorithm for the automatic detection, in a given point cloud, of an assumed number of elliptical shapes (horizontal cross-sections).

Msplit(q) estimation
Msplit(q) estimation [17,18] assumes the existence of a set of n observations l (l1, l2, …, ln) that contains realisations of q mixed random variables with expected values given by (1), in which the vectors of coefficients are known and X denotes the vectors of unknown parameters of the q random variables.
According to the assumptions of Msplit(q) estimation, a split potential is defined for each observation as the possibility of its belonging to one of the defined functional models. Therefore, each observation li, described by the functional model (2), corresponds to q competing functional models (3). This assumption allows the traditional functional model (2) to be presented as a conglomerate of functional models (4). The main assumption of Msplit(q) estimation is that, when particular parameters are estimated, the greatest influence is assigned to those observations that best fit the particular functional model. The parameters are X(1), X(2), …, X(q) (5), and their estimators are determined from (6). The competition of the functional models for every observation fits them to the whole set of observations according to the conglomerate's assumptions.
Graphically, the idea of Msplit(q) estimation for a set of points lying on a plane and forming three circles, i.e. three functional models (q = 3), is shown in Fig. 1. The obvious problem in iterative Msplit(q) estimation is the proper choice of the number q of models in (4). An incorrect assumption of this number will make the modelling of the observation set uncertain (the residuals will be too large) and the result will be far from expectation. The number of approximating models can range from 1 to n, where n is the number of all observations in the measurement set.
In the presented paper, the choice of the number of models is treated as an additional task left for further consideration. Therefore, in this analysis the value of q results from applying the statistical method of cluster analysis with single-linkage (nearest neighbour) agglomeration [19,20]. This method determines the distance between any two clusters as the distance between the two nearest objects (closest neighbours) belonging to distinct clusters. The elements of one cluster form one chain. The method fails when clusters belonging to different models overlap, so in practical implementations it may require operator supervision to correct (increase) the number of clusters.
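The single-linkage rule described above can be illustrated with a minimal sketch. It assumes that the agglomeration is stopped at a fixed cut distance, which makes the clusters equal to the connected components of the graph linking points closer than that distance. The function names, the cut distance and the synthetic rings are hypothetical and are not taken from the paper.

```python
import math

def ring_points(xc, yc, radius, count):
    """Generate evenly spaced points on a circle (synthetic test data)."""
    step = 2.0 * math.pi / count
    return [(xc + radius * math.cos(k * step), yc + radius * math.sin(k * step))
            for k in range(count)]

def single_linkage_cluster_count(points, cut_distance):
    """Count the clusters left when single-linkage merging stops at cut_distance.

    Two clusters are merged whenever their two nearest members are closer
    than cut_distance, so each cluster grows as a chain of neighbours.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= cut_distance:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j

    return len({find(i) for i in range(len(points))})

# Two well-separated rings are counted as two clusters.
separated = ring_points(0.0, 0.0, 1.0, 40) + ring_points(10.0, 0.0, 1.0, 40)
q_separated = single_linkage_cluster_count(separated, cut_distance=0.5)

# Two overlapping rings collapse into one chain, so q is underestimated.
overlapping = ring_points(0.0, 0.0, 1.0, 40) + ring_points(1.0, 0.0, 1.0, 40)
q_overlapping = single_linkage_cluster_count(overlapping, cut_distance=0.5)
```

The second case reproduces the failure mode mentioned in the text: overlapping circles are chained into a single cluster, which is why the number of clusters may have to be increased by the operator.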
Once the number of clusters and the types of functional models (4) have been determined for all observations, the Msplit(q) estimation can be performed. The result will depend on the correctness of the selected models, their number q, and the relative number of observations belonging to each model [21], as well as on the way the observations are weighted in the estimation process.

Functional model
The use of Msplit(q) estimation to process a set of 2D points, or of 3D points projected onto a plane, that can be treated as q sets of circles requires a suitable functional model. Fitting a set of 2D points reduces to determining the coordinates of the circle centre and its radius, i.e. the parameters fulfilling the classical equation (7):

(xi - xC)² + (yi - yC)² = R²    (7)

where xC, yC are the centre coordinates of the circle, xi, yi are the coordinates of the circle points and R is the radius of the circle. Using equation (7) in LSM (least squares method) estimation requires its transformation into the form (8) [22]. This functional model (8) was chosen for the estimation of the q circles in the Msplit(q) process.
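A minimal sketch of a weighted least-squares circle fit may help to fix ideas. It assumes one common linearisation of the circle equation, x² + y² = a·x + b·y + c with a = 2xC, b = 2yC and c = R² - xC² - yC², which is not necessarily identical to the form (8) taken from [22]. The function names `solve_linear` and `fit_circle` are hypothetical.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[k][n] / aug[k][k] for k in range(n)]

def fit_circle(points, weights=None):
    """Weighted least-squares circle fit using the assumed linearised model.

    Each point contributes one row (x, y, 1) of the design matrix and one
    observation x^2 + y^2; the normal equations are accumulated directly.
    """
    if weights is None:
        weights = [1.0] * len(points)  # identity weight matrix
    normal = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for (x, y), w in zip(points, weights):
        row = (x, y, 1.0)
        obs = x * x + y * y
        for r in range(3):
            rhs[r] += w * row[r] * obs
            for c in range(3):
                normal[r][c] += w * row[r] * row[c]
    a, b, c = solve_linear(normal, rhs)
    xc, yc = a / 2.0, b / 2.0
    radius = math.sqrt(c + xc * xc + yc * yc)
    return xc, yc, radius

# Noise-free points on a circle with centre (2, -1) and radius 5.
sample = [(7, -1), (2, 4), (-3, -1), (2, -6), (5, 3), (6, 2), (-1, -5), (-2, -4)]
xc_fit, yc_fit, r_fit = fit_circle(sample)
```

The optional `weights` argument is where the weights produced between successive estimations would enter in an Msplit(q)-style iteration.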

Iterative block of algorithm
An integral part of the algorithm that uses Msplit(q) estimation to segregate measurement data representing q circles is the "iterative block of algorithm" shown in Fig. 2. The input to this block consists of the matrices A, L, X (13-15) and the parameter q.
On this data set, the LSM estimation is iterated q times. The weights for each estimation are calculated from the residuals of the observations in the previous estimation. The rule is to assign higher weights to observations that had higher residuals in the preceding estimation. This rule is implemented by the weight function module (Fig. 2). The weighting method should be chosen to suit the given task and should be tested both theoretically and empirically. The sub-block marked with a dashed line can also be executed iteratively (more than once). In that case the weights of the first estimation are obtained by analysing the residuals of the q-th estimation of the previous iteration. Such an approach may reduce the execution time of the task for some systems of functional models.
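The weighting rule can be sketched as a power function of the absolute residual. The normalisation by the largest residual is an assumption added here to avoid numerical underflow or overflow for high powers; it does not change a weighted least-squares solution, because multiplying all weights by a common factor leaves the estimate unchanged. The function name and the default power are illustrative.

```python
def residual_power_weights(residuals, power=50):
    """Return weights that grow with the absolute residual of the previous fit.

    Observations poorly explained by the previous model receive weights
    close to 1, while well-fitted observations receive weights close to 0.
    """
    largest = max(abs(v) for v in residuals)
    if largest == 0.0:
        # All observations fit perfectly: fall back to equal weights.
        return [1.0] * len(residuals)
    return [(abs(v) / largest) ** power for v in residuals]

# Residuals of three observations with respect to the previous model.
example_weights = residual_power_weights([0.1, -0.2, 0.4], power=2)
```

With a high power, almost all of the weight is concentrated on the observations farthest from the previous model, which steers the next estimation towards a different circle.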
After the iterative estimation has been completed and the parameters of the assumed q functional models have been obtained, the least defective model must be chosen, i.e. the one with the lowest fitting error within its hypothetical subset of data. A hypothetical subset of data is understood here as the set of observations whose Euclidean distance from the solution found for the functional model is less than the assumed value of the parameter τ. The final step of the iterative block is to remove from the measured data set the observations assigned to the optimal model just found (based on the assumed value of τ). The remaining observations are therefore associated with the other q-1 solutions.
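The selection and removal step can be sketched as follows. The fitting error is taken here to be the root mean square distance of the subset points from the circle, which is an assumption, since the text does not define the error measure. The names `distance_to_circle` and `select_and_remove` are hypothetical.

```python
import math

def distance_to_circle(point, circle):
    """Euclidean distance from a point to a circle given as (xc, yc, radius)."""
    x, y = point
    xc, yc, radius = circle
    return abs(math.hypot(x - xc, y - yc) - radius)

def select_and_remove(points, circles, tau):
    """Choose the circle with the lowest RMS error over its tau-subset.

    Returns the chosen circle and the points that remain after the
    observations closer than tau to that circle have been removed.
    """
    best = None
    for circle in circles:
        subset = [p for p in points if distance_to_circle(p, circle) < tau]
        if not subset:
            continue  # no observations support this candidate
        rms = math.sqrt(
            sum(distance_to_circle(p, circle) ** 2 for p in subset) / len(subset)
        )
        if best is None or rms < best[0]:
            best = (rms, circle)
    if best is None:
        return None, list(points)
    chosen = best[1]
    remaining = [p for p in points if distance_to_circle(p, chosen) >= tau]
    return chosen, remaining

# Six points on the unit circle and four points on a circle centred at (10, 0).
near_origin = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.6, 0.8), (-0.8, 0.6)]
near_ten = [(11.0, 0.0), (10.0, 1.0), (9.0, 0.0), (10.0, -1.0)]
candidates = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.2)]
chosen_circle, remaining_points = select_and_remove(near_origin + near_ten, candidates, tau=0.5)
```

Calling the function again on `remaining_points` with the remaining candidates mirrors the repeated use of the iterative block, in which one solution is extracted per pass.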
The iterative block is executed q times; each time one optimal solution is selected and its observations are removed from the data set. The resulting parameters of all q functional models serve as the approximate parameters for the next iteration, shown as a dotted line (Fig. 2). The gradient of these parameters is the basis for the decision to end the algorithm. The schema of the main algorithm is depicted in Fig. 3.

Fig. 3. Schema of the main algorithm. Initialization settings: nq (number of functional models), N (number of points in the dataset). Matrix and vector construction: A and L according to their defining equations, P as the identity weight matrix.

Practical implementation
To test the algorithm (Fig. 3), a data set of 1050 points was assumed. The points (xi, yi) can be interpreted as realisations of 7 functional models in the form of circle equations (8). Three of these circles overlap one another in order to test the robustness of the algorithm to intersecting functional models (Fig. 4).

Fig. 4. Visualization of the dataset created to validate the algorithm.
The parameters of the theoretically defined circles (the theoretical coordinates of the circle centres xc, yc and the theoretical radii R), whose point coordinates were contaminated with Gaussian noise with a maximum value of 0.20 m, are presented in Table 1. The coordinates of the circle centres and the radii estimated from the noisy data are given as xcest, ycest and Rest, respectively. The assumed value of τ was 1.50 m.
The weighting function was the fiftieth power of the residual of each observation from the previous estimation. Running the algorithm with 10 main iterations (dotted line in Fig. 3) and 2 iterations of the "iterative block" (dashed line in Fig. 2) yields the solution of q = 7 functional models, i.e. the circles drawn as continuous lines in Fig. 5. The obtained values of the functional model parameters (xmc, ymc, Rm) and their absolute differences from the estimated values in Table 1 are presented in Table 2. The largest differences from the assumed theoretical values occur for the 5th and 6th circles. The numerical explanation of this result lies in the order in which the solutions of the functional models were found by the algorithm. The 5th and 6th circles were found earlier in the iteration process than the others, so they were most strongly affected by observations that were later recognised as belonging to other solutions.

Conclusion
It is worth noting that among the existing applications of Msplit estimation [23] there is a lack of practical implementations for more than 2 split sets (q > 2). A possible industrial application of the presented algorithm is the measurement of spaces with a highly structured form that contain circular objects with unknown parameters. Developing the algorithm for 3D applications would allow the detection of linear infrastructure with circular cross-sections.
The mass acquisition of observation data, which is the domain of most geodetic surveys nowadays, forces the search for numerical solutions capable of meeting the expected data processing time and the required accuracy of the analysis results. The proposed modification of the algorithm, an extension of the standard Msplit(q) approach, works in theoretical considerations. Its practical utility depends on the quality of the data, the number of outliers and the number of observations associated with each of the functional models.
The evident problem of the algorithm that needs to be solved in the future is the automation of the choice of q, the number of functional models. This article assumes that q is known, which strongly affects the versatility of applications. One potential solution is to place the entire algorithm inside an external iteration responsible for changing the value of q, storing the solution results for the accepted range q = 1 ... maxq and choosing the solution with the smallest residual errors. The work on the optimal selection of the parameter q is the basis for the next study and is closely related to the implementation and commercial applications of the algorithm.
E3S Web of Conferences 26, 00014 (2018) https://doi.org/10.1051/e3sconf/20182600014 2017 BGC