Distribution of the modified statistic in the two-sample problem

As modern production reaches colossal proportions, huge sums are spent on experimental research and on testing construction structures and materials in order to improve them technologically. Reducing these costs, accelerating ongoing research and optimizing the analysis of test results are significant challenges for modern science. Because many tests are destructive, two or more parameters cannot be measured on the same object. Yet such important tasks as reducing test duration, calculating reliability in an alternating mode and reproducing operating modes in laboratory conditions require knowledge of functional, correlation and a number of other dependencies. These and other reliability problems have created the need to plan accelerated tests whose results make it possible to test various hypotheses about simultaneously unobservable parameters. Many problems in the theory of accelerated testing require checking whether a functional relationship exists between failures of construction products in different operating modes. This paper proposes a different approach to this problem, which makes it possible to optimize the statistical analysis.


Introduction
As modern production reaches colossal proportions, huge sums are spent on experimental research and on testing construction structures and materials in order to improve them technologically. Reducing these costs, accelerating ongoing research and optimizing the analysis of test results are significant challenges for modern science.
Because many tests are destructive, two or more parameters cannot be measured on the same object. A classic example of simultaneously unobservable parameters is the failure times of a construction product in different modes [1-3]. At the same time, such important tasks as reducing test duration, calculating reliability in an alternating mode and reproducing operating modes in laboratory conditions require knowledge of functional, correlation and a number of other dependencies [3-7].
These and other reliability problems have created the need to plan accelerated tests whose results make it possible to test various hypotheses about simultaneously unobservable parameters.
Many problems in the theory of accelerated testing require checking whether a functional relationship exists between failures of construction products in different operating modes [1, 3, 8-12].
Note that this is a problem of nonparametric statistics [13, 14]. In contrast to the classical approach used in current practice, this paper proposes a different approach to the problem, which makes it possible to optimize the statistical analysis.

Methods
The hypothesis that product failures in different modes are related is equivalent to the statistical hypothesis that the distribution functions of two random variables are equal. The first variable is the uptime in one of the modes; the second is the uptime in a variable mode, modified in a certain way. The equality of the distributions of these random variables can be checked by any of the criteria used for this purpose (Smirnov, Mann-Whitney, Wald-Wolfowitz, etc.) [15, 16].
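For reference, the test statistics of the three classical criteria named above can be computed from two samples as in the following minimal sketch in plain Python (no statistics library assumed). The helper names are hypothetical, and ties between the samples are assumed absent, as is usual for continuous failure times.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Empirical distribution function of `sample` at x (right-continuous)."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


def smirnov_statistic(xs, ys):
    """Classical two-sample Smirnov statistic sup_x |F_m(x) - G_n(x)|.

    Both empirical functions are step functions that jump only at sample
    points, so the supremum is attained at one of the pooled observations.
    """
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in list(xs) + list(ys))


def mann_whitney_u(xs, ys):
    """Mann-Whitney U: number of pairs (x, y) with x > y (a tie counts 1/2)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def runs_count(xs, ys):
    """Wald-Wolfowitz: number of maximal runs of equal sample labels in the
    pooled ordered sample (assumes no ties between the samples)."""
    pooled = sorted([(x, "x") for x in xs] + [(y, "y") for y in ys])
    labels = [label for _, label in pooled]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


xs, ys = [1.0, 2.0, 3.0], [1.5, 2.5, 3.5, 4.0]
print(smirnov_statistic(xs, ys), mann_whitney_u(xs, ys), runs_count(xs, ys))  # 0.5 3.0 6
```

The samples in the usage line are illustrative numbers only, not test data.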
Let there be two independent samples of sizes m and n. We test the hypothesis
$$H_0:\ F = G^{\alpha}, \qquad \alpha > 0. \tag{1}$$
We check (1) using the Smirnov statistics. Since $H_0$ is the main hypothesis, the Smirnov statistics take the following form:
Here $\hat F$ and $\hat G$ are the empirical distribution functions of the samples $(x_1, x_2, \ldots, x_m)$ and $(y_1, y_2, \ldots, y_n)$. The subscripts m and n of $\hat F$ and $\hat G$ are omitted, since the sample sizes are fixed and do not change.
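The explicit form of the modified statistics is not reproduced here. Purely as an illustrative assumption, the sketch below takes the modification suggested by the hypothesis $F = G^{\alpha}$: $D = \sup_x |\hat F(x) - \hat G(x)^{\alpha}|$ and $D^* = \sup_x (\hat F(x) - \hat G(x)^{\alpha})$, which reduce to the classical Smirnov statistics when α = 1. The function name `modified_smirnov` is hypothetical.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Empirical distribution function of `sample` at x (right-continuous)."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


def modified_smirnov(xs, ys, alpha):
    """Return (D, D_star) for the ASSUMED statistics
        D      = sup_x |F(x) - G(x)**alpha|
        D_star = sup_x (F(x) - G(x)**alpha)
    where F and G are the empirical distribution functions of xs and ys.

    The difference is constant between consecutive pooled observations and
    equals 0 to the left of all of them, so evaluating it at the pooled
    points (and including 0 for the one-sided version) gives the exact sup.
    """
    diffs = [ecdf(xs, t) - ecdf(ys, t) ** alpha for t in list(xs) + list(ys)]
    return max(abs(d) for d in diffs), max([0.0] + diffs)


xs, ys = [1.0, 2.0, 3.0], [1.5, 2.5, 3.5, 4.0]
print(modified_smirnov(xs, ys, alpha=1))  # (0.5, 0.5), the classical value
print(modified_smirnov(xs, ys, alpha=3))  # (0.875, 0.875)
```

Evaluating only at the pooled observations is what makes the statistic depend on the data solely through the order of x's and y's, which is what allows its distribution to be tabulated.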
Hypotheses of this form have not previously been considered as the main (null) hypothesis. In some cases, asymptotic methods have been developed for them.

Results
The distribution function of $D^*$ is given by expression (7), whose auxiliary quantities are obtained by repeated application of relation (8), with initial conditions (9) that set their values at second index 0 equal to the corresponding quantities without the asterisk.
Note that an arbitrary vector $z = (z_1, z_2, \ldots, z_{m+n})$ can be associated with the trajectory of a random walk of a particle making m jumps up and n jumps down [18]. It follows from (5) that each such jump corresponds to one of the (m + n) factors in (5); for a given jump, j is the number of downward jumps of the trajectory and (i + j) is the total number of jumps made up to that moment. Clearly, the value of this factor does not depend on how the trajectory reached this state, but only on the fact that it has made j downward jumps out of a total of (i + j) jumps [19, 20]. If all possible trajectories are then divided according to the number of upward jumps made between the (j + 1)-th and (j + 2)-th downward jumps, the upward jumps in (5) correspond to products of these factors. Here V is the number of upward jumps before the (j + 1)-th downward jump, and i is the number of upward jumps before the (j + 2)-th downward jump. For fixed i, V can take values from 0 to i, and relation (8) follows. Conditions (6) and (9) ensure that the trajectories for which $D^* > h$ are excluded.
The distribution function of $D$ is given by expression (11), whose auxiliary quantities are obtained by repeated application of relation (12) with initial conditions (13).
For various values of m, n and the parameter α, the quantiles of the statistics and their probabilities were calculated according to (7) and (11) (Tables 1, 2).
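The key point of the argument is that each factor of a trajectory's probability depends only on the lattice point reached, so admissible trajectories can be summed by a recursion over the lattice. The sketch below illustrates this idea under two stated assumptions: $H_0$ is read as $F = G^{\alpha}$ (a Lehmann-type alternative), for which the probability of a given ordering of the pooled sample is $m!\,n!\,\alpha^m / \prod_k (\alpha i_k + j_k)$, with $i_k$ and $j_k$ the numbers of x's and y's among the k smallest observations; and the statistic has the hypothetical form $D = \max_k |i_k/m - (j_k/n)^{\alpha}|$. It is not the authors' relation (8) itself, and `prob_D_le` is a hypothetical name.

```python
from math import factorial


def prob_D_le(h, m, n, alpha, one_sided=False, eps=1e-12):
    """P(D <= h) (or P(D* <= h) if one_sided) under the ASSUMED H0: F = G**alpha.

    A lattice path from (0, 0) to (m, n) encodes the pooled ordered sample:
    an x observation is a jump up (i -> i + 1), a y observation a jump down
    (j -> j + 1). The path probability is m! n! alpha**m / prod(alpha*i + j)
    over the points it visits, so each factor depends only on (i, j).
    """
    W = [[0.0] * (n + 1) for _ in range(m + 1)]  # W[i][j]: weight of admissible paths to (i, j)
    W[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            d = i / m - (j / n) ** alpha
            stat = d if one_sided else abs(d)
            if stat > h + eps:  # a path through this point has statistic > h: exclude it
                continue
            incoming = (W[i - 1][j] if i > 0 else 0.0) + (W[i][j - 1] if j > 0 else 0.0)
            W[i][j] = incoming / (alpha * i + j)
    return factorial(m) * factorial(n) * alpha ** m * W[m][n]


print(prob_D_le(0.5, 4, 5, alpha=3))
```

The small tolerance `eps` guards the comparison with h against rounding, in the spirit of the accuracy remark about the tables.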
The tables are organized as follows. The top line indicates the statistic (D or $D^*$) whose distribution is considered and the numerical value of the parameter α. The columns m and n list all possible pairs of sample sizes. The columns h and P give, respectively, the possible values taken by the statistic and the probabilities that the statistic is less than or equal to these values; in other words, column h contains the quantiles of the statistic and column P the levels to which these quantiles correspond. The tables show only quantiles of level 0.8 or higher. When using the tables, the values of the statistics must be calculated with high accuracy (sometimes to 3-4 decimal places), because a very small change in the value of a quantile can correspond to a significant change in its level.
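The table layout (columns h and P, levels of at least 0.8) and the need for high accuracy can be illustrated with exact rational arithmetic. This sketch assumes $H_0: F = G^{\alpha}$ with a positive integer α, under which an ordering of the pooled sample has probability $m!\,n!\,\alpha^m / \prod_k (\alpha i_k + j_k)$, and the hypothetical statistic $D = \max_k |i_k/m - (j_k/n)^{\alpha}|$. It enumerates all C(m + n, m) trajectories, which is feasible only for small samples; `quantile_table` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def quantile_table(m, n, alpha, level=Fraction(4, 5)):
    """Rows (h, P(D <= h)) with P >= level, in exact rational arithmetic.

    Assumes H0: F = G**alpha with a positive integer alpha and the
    hypothetical statistic D = max_k |i_k/m - (j_k/n)**alpha|, where i_k and
    j_k count up- and down-jumps after k steps of the trajectory.
    """
    dist = {}  # attainable value of D -> total probability of its trajectories
    for ups in combinations(range(m + n), m):
        up_steps = set(ups)  # positions of the m upward jumps (x observations)
        i = j = 0
        weight = Fraction(factorial(m) * factorial(n) * alpha ** m)
        d_max = Fraction(0)
        for k in range(m + n):
            if k in up_steps:
                i += 1
            else:
                j += 1
            weight /= alpha * i + j
            d_max = max(d_max, abs(Fraction(i, m) - Fraction(j, n) ** alpha))
        dist[d_max] = dist.get(d_max, Fraction(0)) + weight
    table, cumulative = [], Fraction(0)
    for h in sorted(dist):
        cumulative += dist[h]
        if cumulative >= level:
            table.append((h, cumulative))
    return table


for h, p in quantile_table(4, 5, 3):
    print(f"h = {float(h):.4f}   P = {float(p):.3f}")
```

Because the attainable values of h are exact fractions, neighbouring quantiles that differ only in the third or fourth decimal place are kept apart, which is the situation the accuracy remark warns about. Values produced under these assumptions need not coincide with Tables 1 and 2.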

Discussion
The sample tables show that hypothesis (1) about the equality of the distribution functions of two samples can be tested even for small m and n, which allows the tables to be used in the analysis of laboratory test data. Consider an example of using the tables. Let m = 4, n = 5. We need to test the main hypothesis $H_0$ (1) with α = 3. We choose a confidence level of 0.9 (that is, a significance level of 0.1) and use the statistic D.
The procedure is as follows. In the table for the statistic D with α = 3 and the given m and n, we find that the value h = 0.784 corresponds to P = 0.893 ≈ 0.9. Therefore, if the experiment yields D > 0.784, the hypothesis $H_0$ is rejected; otherwise, it is accepted.
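The decision rule amounts to the following one-line computation, which uses only the tabulated pair h = 0.784, P = 0.893 for m = 4, n = 5, α = 3. It shows that the exact probability of rejecting a true $H_0$ with this quantile is 0.107, slightly above the nominal 0.1.

```latex
P\{D \le 0.784 \mid H_0\} = 0.893
\;\Longrightarrow\;
P\{D > 0.784 \mid H_0\} = 1 - 0.893 = 0.107,
\qquad
\text{reject } H_0 \iff D_{\text{obs}} > 0.784 .
```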

Conclusions
In contrast to existing methods of statistical analysis of samples (Smirnov, Mann-Whitney, Wald-Wolfowitz, etc.), the developed approach makes it possible to analyze the parameters of tested products from small samples (m + n less than 20). Using the obtained tables in statistical calculations makes it possible to determine the optimal sample sizes that minimize errors in calculating the required parameters.

Table 1. Values of quantiles of the statistic D (α = 3) with their probabilities

Table 2. Values of quantiles of the statistic D* (α = 3) with their probabilities