The Reliability of Numerical Modeling in Remote Sensing Data Analysis

Abstract
The article addresses the problem of computing reliable estimates of empirical distribution functions under conditions of small samples and data uncertainty. To study these issues, we develop computational probabilistic analysis as a new area of computational statistics. We propose a new approach based on random interpolation polynomials and order statistics, using arithmetic operations on probability density functions and procedures for constructing probabilistic extensions.


Introduction
The presence of uncertainties in remote sensing data requires the development of numerical methods that take these uncertainties into account. Thus, interval uncertainty leads to interval methods. The interval approach is one of the most important, but far from the only, means of obtaining reliable results in mathematical computations.
Reliability can also be based on other approaches, both purely mathematical and related to computer tools for solving mathematical problems. To improve the reliability of calculations, we propose to use computational probabilistic analysis.
The paper discusses reliable estimates in remote sensing data analysis. The approach is based on the use of random interpolation polynomials and order statistics. One of the known approaches is the use of Kolmogorov-Smirnov confidence limits. Similar methods are used to construct interval boundaries of empirical distribution functions, the so-called p-boxes [1].
When information on the probability density functions is available, the influence of data uncertainty can be taken into account in the calculations, and results can be obtained in the form of random variables with constructed probability densities. One approach to accounting for the random nature of the input data is the Monte Carlo method [2].
For all its positive qualities, this method has several disadvantages, the most significant of which is its low convergence rate. Many practical tasks with random inputs therefore require faster methods, and computational probabilistic analysis is one of them. Its main idea is to use numerical operations and relations over probability densities.
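As a baseline, the Monte Carlo approach described above can be sketched as follows (a minimal illustration of propagating input uncertainty through a function, not the specific method of [2]; the function names are ours):

```python
import random

def monte_carlo_density(f, samplers, n_samples=100_000, bins=50, seed=0):
    """Estimate the density of z = f(x1, ..., xk) by sampling the inputs
    and building a normalized histogram of the outputs."""
    rng = random.Random(seed)
    values = [f(*(s(rng) for s in samplers)) for _ in range(n_samples)]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    # normalize so the histogram integrates to one
    return lo, width, [c / (n_samples * width) for c in counts]

# z = x + y with x, y uniform on [0, 1]; the true density is the
# triangular density on [0, 2] with peak value 1 at z = 1
lo, width, density = monte_carlo_density(
    lambda x, y: x + y,
    [lambda r: r.random(), lambda r: r.random()],
)
```

Even for this two-variable example, 10^5 samples yield only about two correct digits near the peak, which illustrates the slow convergence mentioned above.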
In computational probabilistic analysis, various representations of the density functions of random variables are used, in particular piecewise polynomial functions. Piecewise polynomial functions are determined by grids of dimension m and the values of the functions at the grid nodes. Histograms, frequency polygons, splines, etc., are examples of such functions [4, 5, 6].
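A minimal sketch of such a piecewise polynomial representation (a frequency polygon on a uniform grid; the helper names are ours, not taken from [4, 5, 6]):

```python
import math

def discretize_pdf(pdf, a, b, m):
    """Piecewise polynomial representation of a density: an (m+1)-node
    uniform grid on [a, b] plus the density values at the nodes
    (between nodes the density is treated as linear, a frequency polygon)."""
    grid = [a + i * (b - a) / m for i in range(m + 1)]
    return grid, [pdf(x) for x in grid]

def integrate_polygon(grid, vals):
    """Integral of the piecewise linear density (trapezoidal rule)."""
    return sum((grid[i + 1] - grid[i]) * (vals[i] + vals[i + 1]) / 2
               for i in range(len(grid) - 1))

# standard normal density, truncated to [-5, 5], on an m = 64 grid
grid, vals = discretize_pdf(
    lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), -5.0, 5.0, 64)
```

The pair (grid, node values) is exactly the m-dimensional data that the probabilistic arithmetic below operates on.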
The paper considers probabilistic extensions of interpolation polynomials and splines in the case when the input data are random variables given by their probability densities.
The probability density functions of random variables x, y, z will be denoted in bold font: x, y, z. We denote by R the set of all probability density functions.
The interpolation problem can be formulated as follows [3]. Let f be some function whose values f_i = f(x_i) at the points a = x_0 < x_1 < x_2 < ... < x_n = b are random variables f_i with joint probability density function p(f_0, f_1, ..., f_n). The problem arises of approximately recovering all realizations of the function f at an arbitrary point x.
Further, this problem will be solved by using computational probabilistic analysis and applying the concept of a probabilistic extension. For these purposes, we will construct probabilistic extensions of Lagrange interpolation polynomials, piecewise linear functions and cubic splines.

Elements of computational probabilistic analysis
Let (x_1, x_2, ..., x_n) be a system of continuous random variables with joint probability density function p(x_1, x_2, ..., x_n), and let the random variable z be a function z = f(x_1, x_2, ..., x_n).

Definition 1
We say that the random function f : R^n → R is a probabilistic continuation of the deterministic function f : R^n → R if its constriction by any constants (x_1, ..., x_n) coincides with f(x_1, ..., x_n).

Definition 2
The random function f : R^n → R is called the probabilistic extension of the deterministic function f : R^n → R on the set D ⊂ R^n if (i) it is a probabilistic continuation of f on D, and (ii) the probability density function f coincides with the probability density function z of the random variable z = f(x_1, x_2, ..., x_n), where (x_1, x_2, ..., x_n) is a system of continuous random variables with joint probability density function p(x_1, x_2, ..., x_n).
Consequently, we can write z = f(x_1, x_2, ..., x_n). If at some point ξ it is necessary to indicate the value of the probability density function f directly, we will use the notation f(ξ). The support of a continuous function f is the closure of the set {x | f(x) ≠ 0}; it is denoted by supp(f).
Let f(x_1, ..., x_n) be a rational function. We can obtain a probabilistic extension f of the real rational function f by replacing (i) the real variables x_1, x_2, ..., x_n with the probability density functions x_1, x_2, ..., x_n and (ii) the real arithmetic operations with the corresponding probabilistic operations. The result f is called a natural probabilistic extension [8].

Theorem 1 ([8])
Let x_1, ..., x_n be independent random variables. If f(t_1, ..., t_n) is a rational expression in which each variable t_i occurs at most once, then the natural probabilistic extension approximates a probabilistic extension.
Theorem 2 ([9])
Let, for all real t, the function f(t, x_2, ..., x_n) be a probabilistic extension of the function f(t, x_2, ..., x_n).

Corollary 1 ([9])
Theorem 2 implies the possibility of recursive computation of probabilistic extensions in the general form and its reduction to the calculation of the one-dimensional case.
If the random variables are independent, we can calculate the value of (2) using numerical probabilistic arithmetic, sequentially computing piecewise polynomial approximations. One addition requires Cm^2 polynomial evaluations, so the total number of operations equals Cnm^2. The article [7] compares the number of random variables that Monte Carlo methods must generate with the number of numerical operations of probabilistic arithmetic needed to reach the same accuracy: the accuracy of the addition of n uniform random variables (m = 10) is reached by Monte Carlo methods only after generating n · 10^6 uniform random variables. In the case of dependent random variables, according to Corollary 1 and [9], the number of operations grows as m^n.
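One probabilistic arithmetic operation, the addition of independent random variables given by piecewise constant densities, can be sketched as follows; the inner double loop makes the O(m^2) cost per addition explicit (the function name is ours):

```python
def convolve_sum(p, q, width):
    """Density of x + y for independent x and y, each given as a piecewise
    constant density on a uniform grid with the same cell width.  One such
    addition costs len(p) * len(q) elementary operations, i.e. O(m^2)."""
    pp = [v * width for v in p]   # cell probabilities of x
    qq = [v * width for v in q]   # cell probabilities of y
    out = [0.0] * (len(pp) + len(qq) - 1)
    for i, a in enumerate(pp):
        for j, b in enumerate(qq):
            out[i + j] += a * b
    return [v / width for v in out]   # back to density values

# x, y uniform on [0, 1] with m = 10 cells each: the density of x + y
# approximates the triangular density on [0, 2]
m, width = 10, 0.1
s = convolve_sum([1.0] * m, [1.0] * m, width)
```

Adding n such variables sequentially gives the Cnm^2 operation count stated above, with no random number generation at all.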
In the general case, Monte Carlo methods are used [2]. To overcome the shortcomings associated with Monte Carlo methods, we will use a new approach, computational probabilistic analysis. In some cases it allows integrals of the form (1) to be calculated with the required accuracy.
Next we will use random functions in the form of linear combinations f(x) = Σ_{i=0}^{n} c_i(x) f_i, where the c_i are real coefficient functions. For this type of random function, we introduce the following concepts. The formal derivative of f(x) is defined as f'(x) = Σ_{i=0}^{n} c_i'(x) f_i, and the real function Σ_{i=0}^{n} c_i(x) a_i, obtained by replacing the random variables f_i with real constants a_i, we will call the constriction of the function f by the constants a_i.

Interpolation problems
The interpolation problem is formulated as follows. Let the probability densities f_i be known at the points a = x_0 < x_1 < x_2 < ... < x_n = b, and let their joint probability density function p(f_0, f_1, ..., f_n) be given. We need to build a random interpolation polynomial l_n(x) such that l_n(x_i) = f_i.

We first consider Lagrange interpolation in the case of linear interpolation. Let f_1, f_2 be the known values of some function f at the points x_1, x_2. In the case of linear interpolation we obtain the exact equalities f(x_i) = l_1(x_i), i = 1, 2, where l_1 is the first-degree Lagrange polynomial

l_1(x) = f_1 (x_2 − x)/(x_2 − x_1) + f_2 (x − x_1)/(x_2 − x_1).

If the values f_1 ∈ supp(f_1), f_2 ∈ supp(f_2) are not known exactly, it is necessary to construct a linear random function l(x) satisfying the interpolation conditions l(x_1) = f_1 and l(x_2) = f_2. Thus, using natural probabilistic extensions, we construct the random Lagrange polynomial of the first degree

l(x) = f_1 (x_2 − x)/(x_2 − x_1) + f_2 (x − x_1)/(x_2 − x_1).

The interpolation function l(x) is equal to the given values at the interpolation nodes. It is important that the constriction of the random linear function by the constants f_i is a real linear function. Further, if it is necessary to construct a random function l satisfying the inclusion f ∈ supp(l) for all x ∈ [x_1, x_2], then a priori information about the probability density of f on the interval [x_1, x_2] is needed, from which such an estimate can be obtained.

Next, we consider the general case of the Lagrange interpolation polynomial:

l_n(x) = Σ_{i=0}^{n} f_i Π_{j≠i} (x − x_j)/(x_i − x_j).

Thus the calculation of the Lagrange interpolation polynomial at an arbitrary point reduces to calculating a sum of the f_i with real weights. If the random variables f_i are independent, the calculations are simple because they fall under the conditions of Theorem 1. For a number of nodes n ≥ 5, the application of Lagrange interpolation polynomials is not effective; in this case one can use piecewise linear interpolation, defined by the first-degree Lagrange polynomial on each interval [x_i, x_{i+1}].
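For independent f_1, f_2, the density of the random first-degree polynomial l(x) at a fixed point is the density of a weighted sum of two independent random variables, and it can be computed by discretizing the scaled densities and convolving them. A minimal sketch under these assumptions (all names are ours):

```python
def density_of_linear_combo(pdf1, supp1, pdf2, supp2, w1, w2, m=400):
    """Density of w1*f1 + w2*f2 for independent f1, f2 (w1, w2 > 0):
    discretize each scaled density on a common cell width and convolve.
    Returns (lo, h, vals) - left endpoint, cell width, density per cell."""
    a = w1 * supp1[0] + w2 * supp2[0]
    b = w1 * supp1[1] + w2 * supp2[1]
    h = (b - a) / m

    def cell_probs(pdf, supp, w):
        lo, hi = w * supp[0], w * supp[1]
        n = max(1, round((hi - lo) / h))
        # the density of w*f at t is pdf(t / w) / w; midpoint rule per cell
        return [pdf((lo + (k + 0.5) * h) / w) / w * h for k in range(n)]

    p = cell_probs(pdf1, supp1, w1)
    q = cell_probs(pdf2, supp2, w2)
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return a, h, [v / h for v in out]

# density of l(x) at the midpoint of [x1, x2]: weights w1 = w2 = 1/2,
# f1, f2 uniform on [0, 1]; the result is triangular on [0, 1] with peak 2
lo, h, vals = density_of_linear_combo(
    lambda t: 1.0, (0.0, 1.0), lambda t: 1.0, (0.0, 1.0), 0.5, 0.5)
```

Since each f_i occurs exactly once in the weighted sum, this computation falls under the conditions of Theorem 1.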
Let us estimate the mathematical expectation of the Lagrange interpolation polynomial. In accordance with the linearity property, the expectation of the interpolation polynomial is a linear combination of the expectations of the function values, and it coincides with the Lagrange interpolation polynomial constructed from the expectations of the function values:

E[l_n(x)] = Σ_{i=0}^{n} E[f_i] Π_{j≠i} (x − x_j)/(x_i − x_j).

If, for the mathematical expectation of the random function f, we have an estimate of the second derivative max_{x∈[a,b]} |E[f^(2)](x)|, then the following estimate holds:

max_{x∈[a,b]} |E[f](x) − E[l_1(x)]| ≤ K h^2,

where K is a constant independent of the step h = max_i (x_{i+1} − x_i).
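The linearity property can be checked numerically: interpolating the node expectations is an ordinary real Lagrange interpolation. A small sketch (the helper name is ours):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs[i], ys[i])."""
    total = 0.0
    for i, yi in enumerate(ys):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xs[i] - xj)
        total += yi * w
    return total

# node expectations taken from E[f](x) = x^2 on three nodes; the Lagrange
# polynomial of the means reproduces this quadratic exactly
xs = [0.0, 0.5, 1.0]
means = [x * x for x in xs]
val = lagrange_eval(xs, means, 0.25)
```

Here a degree-2 interpolant of a quadratic expectation is exact, so the error bound above is attained with zero right-hand side.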
Let us now consider the properties of the variance of the piecewise linear interpolation l_1 of a random function f.

Reliable approximation of the distribution function
This section discusses the construction of a reliable approximation of the empirical distribution function. Let x_1, ..., x_n be a sample of a random variable x with distribution function F(t), t ∈ [a, b]. The empirical distribution function is defined as F_n(t) = m_t / n, where m_t is the number of sample points x_i < t. Consider z_i = F(x_i), i = 1, ..., n. Note that the z_i, i = 1, ..., n, are uniformly distributed random variables on [0, 1]. If z_1 ≤ z_2 ≤ ... ≤ z_n, then z_k is the kth order statistic, and its expectation is E[z_k] = k/(n + 1) [10].
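The identity E[z_k] = k/(n + 1) is easy to verify by simulation (a sketch; the function name is ours):

```python
import random

def order_statistic_means(n, trials=20_000, seed=1):
    """Monte Carlo check of E[z_k] = k / (n + 1) for the order statistics
    of n uniform random variables on [0, 1]."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(trials):
        z = sorted(rng.random() for _ in range(n))
        for k in range(n):
            sums[k] += z[k]
    return [s / trials for s in sums]

means = order_statistic_means(5)   # theory: 1/6, 2/6, ..., 5/6
```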
Note that if, instead of the mathematical expectations i/(n + 1), the exact values z_i were used, then the error of the piecewise linear function l(t) with step h = max_{0≤i≤n−1}(x_{i+1} − x_i) would satisfy the estimate

max_{t∈[a,b]} |F(t) − l(t)| ≤ (h^2/8) max_{t∈[a,b]} |F''(t)|.

Hence, the constructed piecewise linear function approximates the distribution function F fairly well even for relatively small n.
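A sketch of the piecewise linear approximation built from the order-statistic expectations i/(n + 1), tested against the uniform distribution, for which F(t) = t (the names are ours):

```python
import random

def pl_distribution(sample):
    """Piecewise linear approximation of a distribution function through the
    nodes (x_(i), i / (n + 1)), the expectations of the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    ys = [(i + 1) / (n + 1) for i in range(n)]

    def l(t):
        if t <= xs[0]:
            return ys[0]
        if t >= xs[-1]:
            return ys[-1]
        # locate the interval containing t and interpolate linearly
        for i in range(n - 1):
            if xs[i] <= t <= xs[i + 1]:
                a = (t - xs[i]) / (xs[i + 1] - xs[i])
                return (1 - a) * ys[i] + a * ys[i + 1]

    return l

# sample from the uniform distribution on [0, 1], whose true F(t) = t
rng = random.Random(2)
l = pl_distribution([rng.random() for _ in range(200)])
err = max(abs(l(t / 100) - t / 100) for t in range(101))
```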
As for the z_i, we know that they form the order statistics. It is known that the probability density of the kth order statistic is (see [10])

p_k(z) = n! / ((k − 1)!(n − k)!) z^(k−1) (1 − z)^(n−k), 0 ≤ z ≤ 1.

The joint probability density of the vector (z_j, z_k) has the form (see also [10])

p_{j,k}(z_j, z_k) = n! / ((j − 1)!(k − j − 1)!(n − k)!) z_j^(j−1) (z_k − z_j)^(k−j−1) (1 − z_k)^(n−k), j < k, 0 ≤ z_j ≤ z_k ≤ 1.

For each random vector (z_1, z_2, ..., z_n) we have the corresponding piecewise linear function l. Running through all possible random vectors (z_1, z_2, ..., z_n), we obtain the whole set of piecewise linear functions {l}. Note that {l} contains the interpolant of the distribution function F. Hence, using the probability density of the kth order statistic for each node ξ_k, the set {l} can be represented as a random piecewise linear function l. Accordingly, l is a reliable approximation of the empirical distribution function.
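The density of the kth order statistic above is a Beta(k, n − k + 1) density; a short numerical check of its unit mass and of the expectation k/(n + 1) (the helper name is ours):

```python
from math import factorial

def order_stat_pdf(n, k):
    """Density of the kth order statistic of n uniform variables on [0, 1]:
    p_k(z) = n! / ((k-1)! (n-k)!) * z**(k-1) * (1-z)**(n-k),
    i.e. a Beta(k, n - k + 1) density."""
    c = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return lambda z: c * z ** (k - 1) * (1 - z) ** (n - k)

# numerical check of unit mass and mean k / (n + 1) for n = 7, k = 3
n, k, m = 7, 3, 2000
pdf = order_stat_pdf(n, k)
h = 1.0 / m
mids = [(i + 0.5) * h for i in range(m)]
mass = sum(pdf(z) * h for z in mids)
mean = sum(z * pdf(z) * h for z in mids)
```

Evaluating this density at each node ξ_k is what turns the family {l} into a single random piecewise linear function l.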