Building a mathematical model for assessing the hazard of emergency situations and the formation of types of causes for rolling stock derailments

The technique provides an analytical criterion for selecting, from several sets of numerical data, the one set most involved in an emergency; the data are the relative deviations of the main factors from their critical values in transport. The criterion is based on finding the theoretical left boundary of the interval of each set of deviations. The sets under study are reduced to normal distribution laws using a well-known method. The analysis uses basic numerical characteristics similar to those used in probability theory and mathematical statistics. The proposed method is based on finding the central theoretical moments up to and including the third order. Explanatory demonstration examples are provided.


Introduction
Some accidents may occur because of a single significant cause acting in one direction $r \in \{1, 2, \dots, K\}$, while at the same time, in the other directions $j \ne r$, the values $\delta_{ji}$ ($i \in \{1, 2, \dots, n_j\}$) may be quite small. The main task is to process the numerical values $\delta_{ri}$ by comparing the data sets. For this purpose, a unit of measurement $\rho(C_i, K_j)$ is introduced in [Rudanovsky]; it is built from the relative differences $(p_n - k_n)/k_n$ over the $m_j$ deviations of one class, where $p_n$ and $k_n$ are the values of the parameters of the same name of the convergence $C_i$ and of the kernel of class $K_j$, respectively. The average value of the deviations in the $r$-th set is

$L_{r1} = \frac{1}{n_r}\sum_{i=1}^{n_r} \delta_{ri}$.

On average, it characterizes the relation of the direction to accidents, but it may be far from the truth when some $\delta_{ri_0}$ is equal or very close to zero while all the others, $\delta_{rj}$ ($j \ne i_0$), are quite large. When $\delta_{r1} = \delta_{r2} = \dots = \delta_{rn_r} = \delta_{r0}$, it turns out that $L_{r1} = \delta_{r0}$, and this gives, as an initial approximation, a rough estimate of the direction being studied. When directions are compared with each other, it is primarily these averaged values that are used. This can be seen clearly by comparing two directions $G_1$ and $G_2$. Involvement in an emergency situation is expressed numerically by a formula of [Rudanovsky] written in terms of $L_1$, the average of the first direction, and $L_2$, the average of the second direction; $P_R$ is the probability that the $R$-th direction ($R = 1, 2$) is involved in the accident. If the deviations in each group (direction) are sufficiently close to each other, these probabilistic characteristics give a fairly accurate forecast. With a large scatter of the deviation values in each direction, however, the characteristics $P_1$ and $P_2$ should be considered only as preliminary, giving a rough estimate.
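The pitfall of the average metric can be illustrated with a short Python sketch (standard library only). It assumes $L_{r1}$ is the plain arithmetic mean of a set; the function name `average_metric` and both data sets are hypothetical, chosen only so that the means coincide while one set contains a near-critical deviation.

```python
from statistics import fmean


def average_metric(deviations):
    """Average metric L_r1: arithmetic mean of one set of relative deviations."""
    return fmean(deviations)


# Hypothetical sets (not from the source), chosen so that both means equal 6.
uniform = [6.0, 6.0, 6.0]          # all deviations equal: L_r1 is the common value
near_critical = [0.25, 8.75, 9.0]  # one deviation close to zero, the others large

for name, d in (("uniform", uniform), ("near_critical", near_critical)):
    print(name, average_metric(d), min(d))
# Both means are 6.0, yet only the second set contains a near-critical deviation.
```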
The subject area under study coincides completely with the setting of a discrete random variable [Gmurman], with the single refinement that the values $\delta_{Ri}$ ($R = 1, 2, \dots, K$; $i = 1, 2, \dots, n_R$) take only positive values. The analogue of $L_{R1}$ in mathematical statistics is $\bar{x}_R$, the "sample average in the $R$-th group" of the variation series $x_{R1}, x_{R2}, \dots, x_{Ri}, \dots, x_{Rn_R}$. The methods of mathematical statistics rest on those of probability theory; therefore, in what follows we adhere to the terminology and methods of both. Together with the average metric of the $\delta_{Ri}$, a quadratic metric is introduced:

$L_{R2} = \sqrt{\frac{1}{n_R}\sum_{i=1}^{n_R} \delta_{Ri}^2}$.

It is also an average, and for $\delta_{R1} = \delta_{R2} = \dots = \delta_{Rn_R} = \delta_{R0}$ it takes the value $\delta_{R0}$. It is considered as an alternative to $L_{R1}$, and the probabilities of involvement in the accident are computed for it by similar formulas. To evaluate them, let us compare $L_{R1}$ and $L_{R2}$. It is easy to see that

$L_{R2}^2 - L_{R1}^2 = \frac{1}{n_R}\sum_{i=1}^{n_R}\left(\delta_{Ri} - L_{R1}\right)^2 \ge 0$.

In mathematical statistics [Gmurman], this value is called the "variance in the $R$-th group" and is denoted $D_R$. The dispersion characterizes the scatter of the squared distances of the deviations $x_{Ri}$ from their center, the sample average $\bar{x}_R$: the farther the values lie from $\bar{x}_R$, the larger the dispersion.
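As a quick numerical check of the relation $L_{R2}^2 = L_{R1}^2 + D_R$, the Python sketch below computes both metrics and the group variance for the set $\{7; 5; 3\}$. It assumes the variance divides by $n_R$ (the population form computed by `statistics.pvariance`); the helper names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def average_metric(d):
    # L_R1: arithmetic mean of the deviations
    return fmean(d)


def quadratic_metric(d):
    # L_R2: root mean square of the deviations
    return math.sqrt(fmean(x * x for x in d))


def group_variance(d):
    # D_R: population variance, dividing by n_R
    return pvariance(d)


G = [7, 5, 3]
L1, L2, D = average_metric(G), quadratic_metric(G), group_variance(G)
print(L1, L2 ** 2, D)                      # L_R1 = 5, L_R2^2 = 83/3, D_R = 8/3
print(math.isclose(L2 ** 2, L1 ** 2 + D))  # the identity L_R2^2 = L_R1^2 + D_R
```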

Methods and materials
Let us check the validity of formulas (1). Suppose, in particular, that $D_2 > D_1$ while the sample means coincide, $L_{11} = L_{21} = L_0$. Since $L_{R2}^2 = L_{R1}^2 + D_R$, we then have $L_{12} < L_{22}$, so formulas (1), which rank the sets by the quadratic metric, would point to the first set as the more involved one. In fact, because of the greater scatter of the data on both sides of $L_0$, the possibility of an accident increases, since some deviations satisfy $\delta_{2j} < L_0$. Consequently, an increase in dispersion only increases the propensity for an accident. Common sense therefore refutes formulas of type (1), which are wrong in principle here. This is explained not by the choice of a coordinate or metric space, but by the structure of the variation series $x_{R1}, x_{R2}, \dots, x_{Rn_R}$ themselves. Let us give a brief confirming example with a small sample, where, for ease of calculation, all deviations are multiplied by 10.
Example 1.
Let the sets of deviations $G_1 = \{7; 5; 3\}$ and $G_2 = \{9; 4; 2\}$ be given. The sample means are easily calculated: $L_{11} = L_{21} = 5$, while the variances are $D_1 = 8/3$ and $D_2 = 26/3$. The second set $G_2$, however, contains the element $\delta_{23} = 2$, the smallest of all those presented, and it is this element that causes the tendency toward an accident. The reasoning above leads to the following: with equal sample means $L_{11} = L_{21}$ of two sets of deviations, the set with the larger dispersion is more involved in the accident. The process of establishing priority for an accident can thus be called a two-stage process. At the first stage, the involvement in the accident is determined preliminarily, albeit roughly, using formulas (1.1). The second stage involves analysis using the variance. A more general statement is also true: if one set of deviations has a smaller or equal sample mean and a greater dispersion, that set is more involved in the accident.
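The figures of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using population moments (division by $n$); `summarize` is a hypothetical helper that also reports the quadratic metric $L_{R2}$, showing that a rule favoring the smaller $L_{R2}$ would single out $G_1$ even though $G_2$ holds the smallest deviation.

```python
import math
from statistics import fmean, pvariance


def summarize(d):
    """Return (L_R1, L_R2, D_R, smallest deviation) for one set of deviations."""
    l1 = fmean(d)
    l2 = math.sqrt(fmean(x * x for x in d))
    return l1, l2, pvariance(d), min(d)


G1 = [7, 5, 3]
G2 = [9, 4, 2]
for name, g in (("G1", G1), ("G2", G2)):
    l1, l2, dvar, smallest = summarize(g)
    print(f"{name}: L1={l1:.3f}  L2={l2:.3f}  D={dvar:.3f}  min={smallest}")
# Equal sample means, but D2 > D1 and G2 holds the smallest deviation (2);
# L2 is larger for G2, so ranking by the smaller L2 would point to G1.
```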
Here, at the first stage, the first set of deviations (the one with the smaller sample mean) dominates: according to formulas (1.1), it is more prone to an accident. At the second stage, because of its greater dispersion, sound reasoning again points to the first set as the more accident-prone one.
Of particular interest is the case in which the first and the second stages both give equal relations to the accident.
Let us look at a simple Example 2.
Let the sets of deviations $G_1 = \{9; 4; 2\}$ and $G_2 = \{8; 6; 1\}$ be given. The calculations yield $L_{11} = a_1 = 5 = a_2 = L_{21}$ and equal variances $D_1 = D_2 = 26/3$. In addition, the "sample range" (the difference between the largest and smallest elements) is the same for both sets: $9 - 2 = 7 = 8 - 1$. The only sign that the second set is more involved in the accident is its single smallest deviation $\delta_{23} = 1$ (in the first set all $\delta_{1j} \ge 2$, $j = 1, 2, 3$). This circumstance is expressed by the negative value of the third-order central theoretical moment [Gmurman], which in terms of mathematical statistics is

$M_3(R) = \frac{1}{n_R}\sum_{i=1}^{n_R}\left(\delta_{Ri} - \bar{x}_R\right)^3$.

In this case $M_3(1) = (4^3 + (-1)^3 + (-3)^3)/3 = 12$ and $M_3(2) = (3^3 + 1^3 + (-4)^3)/3 = -12$. These properties are universal and do not depend on the sample sizes: the simplest patterns appear equally in large and small variation series. The explanation for the sets under consideration is simple: for a significantly small extreme element $\delta_{Rj}$ of a sample $G_R$, the value $(\delta_{Rj} - \bar{x}_R)$ is the most negative, and the value $(\delta_{Rj} - \bar{x}_R)^3$ makes a decisive contribution to the sum $M_3(R)$.
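The moments of Example 2 can be checked with the Python sketch below. It assumes $M_3(R)$ is the average of the cubed deviations from the sample mean (division by $n_R$); `third_central_moment` is a hypothetical helper name.

```python
from statistics import fmean, pvariance


def third_central_moment(d):
    """M3(R): mean of cubed distances from the sample average (divides by n_R)."""
    m = fmean(d)
    return fmean((x - m) ** 3 for x in d)


G1 = [9, 4, 2]
G2 = [8, 6, 1]
print(fmean(G1), fmean(G2))          # equal sample means
print(pvariance(G1), pvariance(G2))  # equal variances
print(third_central_moment(G1), third_central_moment(G2))  # 12.0 -12.0

# The set with the smaller (negative) M3 is the one holding the single smallest deviation.
more_involved = min((G1, G2), key=third_central_moment)
print(more_involved)  # [8, 6, 1]
```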
In relation to the case under consideration, Statement 3 holds: with equal sample means and equal variances, the set of deviations with the smaller third central moment $M_3$ is more involved in accidents. A more difficult question about the involvement of two sets of deviations in an accident arises when one of the sets has both the smaller sample average and the smaller variance.
The central limit theorem is the fundamental basis for the analysis. According to this theorem, processes formed by the sum of a large number of independent small effects are described by the normal (Gaussian) law with the probability density

$f(x) = \frac{1}{\delta\sqrt{2\pi}}\, e^{-(x - a)^2 / (2\delta^2)}$.

For this distribution, the following properties hold [Gmurman].

Property 1.
The mathematical expectation $M[X]$ of a continuous random variable $X$ ($X \in (-\infty; +\infty)$) obeying the normal distribution law is equal to the parameter $a$.
Property 2. The variance $D[X]$ of a continuous random variable $X$ obeying the normal distribution law is equal to $\delta^2$. The parameter $\delta = \sqrt{D[X]}$ is called the "standard deviation".
Property 3. Laplace's theorem. The probability that a continuous random variable $X$ obeying the normal distribution law falls into the interval $(x_1; x_2)$ is calculated by the formula $P(x_1 < X < x_2) = \Phi(t_2) - \Phi(t_1)$, where $t_1 = (x_1 - a)/\delta$, $t_2 = (x_2 - a)/\delta$, and $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-z^2/2}\,dz$ is the Laplace function. Since this definite integral cannot be expressed in elementary functions, its values are tabulated in probability theory textbooks; in the table referred to as "Appendix 2", the argument $x$ is given in increments of 0.01 and the Laplace function with an accuracy of $10^{-4}$. The Laplace function $\Phi(x)$ is odd.
Property 4. Three sigma rule. For the normal distribution law, the probability that the random variable falls into the interval $(a - 3\delta; a + 3\delta)$ is 0.9973, so almost all values of the random variable are concentrated in a central interval of length $6\delta$.
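Properties 3 and 4 can be evaluated without a table: the Laplace function $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-z^2/2}dz$ equals $\tfrac12\,\mathrm{erf}(x/\sqrt{2})$. The Python sketch below relies on that identity and on `math.erf` from the standard library; the function names are hypothetical, and `sigma` plays the role of $\delta$ in the text.

```python
import math


def laplace_phi(x):
    """Laplace function Phi(x) = (1/sqrt(2*pi)) * integral_0^x exp(-t^2/2) dt.

    Computed through the error function: Phi(x) = erf(x / sqrt(2)) / 2.
    """
    return 0.5 * math.erf(x / math.sqrt(2.0))


def prob_in_interval(x1, x2, a, sigma):
    """P(x1 < X < x2) for a normal X with parameters a, sigma (Laplace's theorem)."""
    t1 = (x1 - a) / sigma
    t2 = (x2 - a) / sigma
    return laplace_phi(t2) - laplace_phi(t1)


# Three sigma rule: the result does not depend on a and sigma.
print(round(prob_in_interval(-3.0, 3.0, a=0.0, sigma=1.0), 4))  # 0.9973
# Phi is odd, and Phi(5) already rounds to 0.5 at four decimals.
print(math.isclose(laplace_phi(-1.2), -laplace_phi(1.2)), round(laplace_phi(5.0), 4))
```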
Next, we determine the value $x_0$ of the argument of the normal density for which the probability that the random variable falls into the interval $(-\infty; x_0)$ equals the value of the probability function at the point $x_0$ itself. To do this, we pass from the argument $x$ to the centred variable $z = x - a$ and carry out the search in the variable $z$. The condition is written as $\Phi(z_0) = -z_0 f(z_0)$; here $z_0$ is obviously negative, as can be seen from the graph of $f(z)$ [Gmurman]. The equality can be analysed by differentiating the expression $\Phi(z) = -z f(z)$ with respect to $z$, using the fact that the derivative of a definite integral with a variable upper limit equals the value of the integrand at that limit [Fichtenholtz].
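As a result, taking into account that $f'(z) = -\frac{z}{\delta^2} f(z)$, we get $f(z) = -f(z) + \frac{z^2}{\delta^2} f(z)$. Since the exponential function never vanishes, dividing by $f(z)$ gives $z^2 = 2\delta^2$, so that $z_0 = -\sqrt{2}\,\delta$ (recall that $z_0 < 0$) and $x_0 = a - \sqrt{2}\,\delta$.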
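The derivative condition obtained above can be checked numerically. The Python sketch below compares $f(z)$ with a central finite-difference derivative of $-z f(z)$ at $z_0 = -\sqrt{2}\,\delta$ and at another point. It verifies only this derivative step, not the original equality; the value of $\delta$ (named `sigma`) and all helper names are illustrative assumptions.

```python
import math


def normal_density(z, sigma):
    """Density of the centred normal law (z = x - a) with standard deviation sigma."""
    return math.exp(-z * z / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def rhs(z, sigma):
    # Right-hand side of the condition: -z * f(z)
    return -z * normal_density(z, sigma)


def derivative(fun, z, sigma, h=1e-6):
    # Central finite-difference approximation of d/dz fun(z, sigma)
    return (fun(z + h, sigma) - fun(z - h, sigma)) / (2.0 * h)


sigma = 2.0                    # arbitrary illustrative value
z0 = -math.sqrt(2.0) * sigma   # root from the text: z^2 = 2 * sigma^2
lhs_slope = normal_density(z0, sigma)   # derivative of Phi(z) is f(z)
rhs_slope = derivative(rhs, z0, sigma)  # numerical derivative of -z f(z)
print(math.isclose(lhs_slope, rhs_slope, rel_tol=1e-6))  # True

# At z = -sigma the slopes differ: d/dz[-z f(z)] = -f + f = 0 there.
print(normal_density(-sigma, sigma), derivative(rhs, -sigma, sigma))
```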
Thus, for the normal distribution law, the probability of falling into the interval $G_1 = (-\infty; a - \sqrt{2}\,\delta)$ coincides in modulus with the value of the probability function at the end point of $G_1$, and, symmetrically, the probability of falling into the interval $G_2 = (a + \sqrt{2}\,\delta; +\infty)$ coincides with the value of the probability function at the starting point of $G_2$.
In Appendix 2, the largest tabulated value of the increasing function $\Phi$, equal to 0.5, is reached at $x = 5.0$. In practice, therefore, the event that the random variable falls into the interval $(a - 5\delta; a + 5\delta)$ can be regarded as certain. Modeling the initial variation series (sets of deviations) by normal distribution laws greatly simplifies the task of determining priorities for the occurrence of emergency situations. In this case, the mathematical expectation $a_R$ is interpreted in mathematical statistics as the "sample average", in our terminology $L_{R1}$; the standard deviation, both in probability theory and in mathematical statistics, is interpreted as the square root of the variance $D_R$. Then, for two sets of deviations $G_1$ and $G_2$, the parameters $a_j = L_{j1}$ and $\delta_j = \sqrt{D_j}$ ($j = 1, 2$) are found, and on this basis normal distribution laws with density functions $f_j(x)$ are constructed, to which Property 5 is applied.
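Fitting a normal law to a set of deviations only needs its sample average and standard deviation. The Python sketch below (hypothetical helper `theoretical_interval`, population moments dividing by $n$) computes $a$, $\delta$ and the bounds $a \mp \sqrt{2}\,\delta$ for the set $\{7; 5; 3\}$, and compares the interval length with the sample range of this particular set.

```python
import math
from statistics import fmean, pstdev


def theoretical_interval(d):
    """Fit a = sample average and delta = sqrt(D), return (a - sqrt2*delta, a + sqrt2*delta)."""
    a = fmean(d)
    delta = pstdev(d)  # square root of the population variance D
    half = math.sqrt(2.0) * delta
    return a - half, a + half


G = [7, 5, 3]
lo, hi = theoretical_interval(G)
sample_range = max(G) - min(G)            # T
print(round(lo, 2), round(hi, 2))         # 2.69 7.31
print(round(hi - lo, 2), sample_range)    # 4.62 4
```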
In some cases, the calculated theoretical bound $b_1 = a - \sqrt{2}\,\delta$ turns out to be significantly smaller than the smallest element of the given initial set. This only means that the values of the set under study lie to the right of the left boundary of the theoretical series. While $(a - \sqrt{2}\,\delta)$ approximately characterizes the theoretical left boundary, $a + \sqrt{2}\,\delta$ likewise characterizes the theoretical right boundary of the data set under study. Thus, theoretically, the entire set of deviations $G_R$ lies within the interval $J_0 = (a - \sqrt{2}\,\delta; a + \sqrt{2}\,\delta)$. The length of this interval is $2\sqrt{2}\,\delta$, which is greater than the range of the set, denoted by $T$.
Indeed, even with the most unfavorable scatter, part of the values of $G$ (say $n_1$ of them) lies at the left edge of the interval $J_{00} = (a - \delta; a + \delta)$, and the remaining $n - n_1$ values lie at its right edge. With equal numbers of values at both edges, $D = \delta^2$, and the range of the data set is $T = 2\delta < 2\sqrt{2}\,\delta$. Examples 3-4 express the left theoretical boundary of the interval $J_0$, but there are also similar examples demonstrating its right boundary. This is the case, for instance, for the mirror image of $G_1 = \{\delta_{11}, \delta_{12}, \dots, \delta_{1n_1}\}$ relative to its center $a$: in the set $G_2 = \{\delta_{21}, \delta_{22}, \dots, \delta_{2n_1}\}$, each component $\delta_{2i}$ lies at exactly the same distance $|\delta_{1i} - a|$ from the point $a$, on the opposite side. For the set $G_2$, the main numerical characteristics, the mathematical expectation and the dispersion, are the same: $a_2 = a_1$, $D_2 = D_1$. The only difference lies in the third-order central theoretical moments: $M_3(2) = -M_3(1)$.

Let us look at supporting Example 5. Let the set $G_1 = \{10; 7; 4; 3\}$ be given. The calculations yield $a_1 = L_{11} = 6$ and $D_1 = 30/4 = 7.5$. Hence $\delta_1 = 2.74$, and $a - \sqrt{2}\,\delta = 6 - \sqrt{15} = 2.13$. This value, however, has nothing to do with the smallest value $\delta_{14} = 3$. At the same time, $a + \sqrt{2}\,\delta = 9.87$ is quite close to the largest value of the set, $\delta_{11} = 10$. Consequently, the set under study is "pressed" to the upper boundary of the interval $J_0$. Its lower bound is easy to find using the range of the data set: here $T = 10 - 3 = 7$, and the theoretical lower bound is $a + \sqrt{2}\,\delta - T = 2.87$. This theoretical smallest value corresponds to the smallest value $\delta_{1i} = 3$ of the data set $G_1$.
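The numbers of Example 5 can be reproduced as follows. This Python sketch assumes population moments (division by $n = 4$), which gives $D_1 = 7.5$ as in the text; the variable names are illustrative.

```python
import math
from statistics import fmean, pvariance

G1 = [10, 7, 4, 3]
a = fmean(G1)              # sample average L_11
D = pvariance(G1)          # variance D_1 (divides by n = 4)
delta = math.sqrt(D)       # standard deviation
T = max(G1) - min(G1)      # sample range
left = a - math.sqrt(2) * delta
right = a + math.sqrt(2) * delta
print(a, D, round(delta, 2))                                 # 6.0 7.5 2.74
print(round(left, 2), round(right, 2), round(right - T, 2))  # 2.13 9.87 2.87
```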
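The mirrored series here is $G_2 = \{12 - 10; 12 - 7; 12 - 4; 12 - 3\} = \{2; 5; 8; 9\}$, where also $a = 6$ and $D = 7.5$, but $M_3(2) = -7.5 < 0$, while $M_3(1) = 7.5 > 0$; its theoretical left boundary $a - \sqrt{2}\,\delta = 2.13$ lies close to its smallest element 2. Finally, we can formulate the main criterion for the relation to the accident rate of two sets of deviations $G_1$ and $G_2$, whose main characteristics up to the third order ($a_R$, $D_R$, $\delta_R$, $M_3(R)$, and the range $T_R$) determine the theoretical left boundaries

$b_R = a_R - \sqrt{2}\,\delta_R$ for $M_3(R) \le 0$, $\quad b_R = a_R + \sqrt{2}\,\delta_R - T_R$ for $M_3(R) > 0. \qquad (1.3)$

Statement 4. Of the two sets of deviations $G_1$ and $G_2$, the set with the smaller of the values $b_1$, $b_2$ has the greater relation to accident rates.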
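A sketch of the two-set criterion in Python. The code assumes formulas (1.3) take the form $b = a - \sqrt{2}\,\delta$ when $M_3 \le 0$ and $b = a + \sqrt{2}\,\delta - T$ when $M_3 > 0$, an assumption based on Example 5 and on the list of characteristics used for Statement 5. The helper `left_boundary` is hypothetical; the data are the set of Example 5 and its mirror image.

```python
import math
from statistics import fmean, pstdev


def left_boundary(d):
    """Theoretical left boundary b of one set of deviations.

    Assumed form of formulas (1.3), with population moments (divide by n):
    a - sqrt(2)*delta when M3 <= 0, a + sqrt(2)*delta - T when M3 > 0.
    """
    a = fmean(d)
    delta = pstdev(d)
    m3 = fmean((x - a) ** 3 for x in d)
    if m3 > 0:
        # Set "pressed" to the upper boundary: shift down by the range T.
        return a + math.sqrt(2) * delta - (max(d) - min(d))
    return a - math.sqrt(2) * delta


G1 = [10, 7, 4, 3]
G2 = [12 - x for x in G1]   # mirror image about a = 6: [2, 5, 8, 9]
b1, b2 = left_boundary(G1), left_boundary(G2)
print(round(b1, 2), round(b2, 2))   # 2.87 2.13
# Statement 4: the set with the smaller b is more involved in the accident.
print("G1" if b1 < b2 else "G2" if b2 < b1 else "equal")  # G2
```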
To demonstrate Statement 4, let us use Example 6.
Here $G_2$ shows the greater propensity for an accident. A simple comparison of the original sets, in which the smallest deviation is $\delta_{23} = 2$, leads by common sense to the same conclusion: the second data set tends toward accidents.
Statement 4 does not always give a strict preference. If $b_1$ and $b_2$ are equal, the involvement of the two sets in the accident should be considered equal, and the final determination of the cause of the accident no longer rests on numerical characteristics. Let us consider Example 7.
Let the sets $G_1 = \{7; 6; 2\}$ and $G_2 = \{8; 8; 2\}$ be given; they can be compared visually. The main numerical characteristics here are $a_1 = 5$, $D_1 = 14/3$, $M_3(1) = -6$ and $a_2 = 6$, $D_2 = 8$, $M_3(2) = -16$, so that $b_1 = 5 - \sqrt{28/3} \approx 1.945$ and $b_2 = 6 - 4 = 2$. The approximate values of $b_1$ and $b_2$ are almost the same; only at an accuracy of 0.05 is $b_1$ less than $b_2$. Therefore, the first set can be considered more relevant to accident rates. Generally speaking, both sets of deviations contain the same smallest deviation $\delta_{13} = \delta_{23} = 2$; in the set $G_2$, however, the remaining deviations are farther from zero than those of $G_1$.
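Example 7 can be checked in Python under the assumed form of formulas (1.3) ($b = a - \sqrt{2}\,\delta$ for $M_3 \le 0$, $b = a + \sqrt{2}\,\delta - T$ for $M_3 > 0$, population moments). Both sets here have $M_3 < 0$, so only the first branch is used. The helper `left_boundary` is hypothetical.

```python
import math
from statistics import fmean, pstdev


def left_boundary(d):
    """Assumed form of formulas (1.3) for the theoretical left boundary b."""
    a = fmean(d)
    delta = pstdev(d)
    m3 = fmean((x - a) ** 3 for x in d)
    if m3 > 0:
        return a + math.sqrt(2) * delta - (max(d) - min(d))
    return a - math.sqrt(2) * delta


G1 = [7, 6, 2]
G2 = [8, 8, 2]
b1, b2 = left_boundary(G1), left_boundary(G2)
print(round(b1, 3), round(b2, 3))  # 1.945 2.0
print(b1 < b2)                     # True: G1 is taken as more relevant, by a small margin
```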
In general, Statement 4 is consistent with the previous three. Using the same scheme, one can select, among the sets of deviations $G_1$, $G_2$, $G_3$, the one more involved in the accident. For each set, the main numerical characteristics are found: the sample average; the dispersion and the corresponding standard deviation; the third-order central theoretical moment; and, when $M_3 > 0$, the sample range. The quantities $b_1$, $b_2$, $b_3$ are calculated using formulas (1.3). Then the following holds: Statement 5 (the main criterion for selecting the set involved in the accident).
Of the sets of deviations $G_1$, $G_2$, $G_3$, the set corresponding to the smallest of the numbers $b_1$, $b_2$, $b_3$ has the greatest relation to the accident rate.
According to Statement 5, the smallest of the numbers 2.69, 4, 4 is $b_1$, which identifies the first set of deviations as the most accident-prone. Indeed, its smallest deviation $\delta_{13} = 3$ leads to the same conclusion.
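Statement 5 extends directly to any number of sets: compute $b$ for each set and take the smallest. The Python sketch below assumes formulas (1.3) take the form $b = a - \sqrt{2}\,\delta$ for $M_3 \le 0$ and $b = a + \sqrt{2}\,\delta - T$ for $M_3 > 0$, with population moments. The three sets of the example above are not listed in this section, so the input is a hypothetical grouping of the sets from Examples 1 and 2; `most_involved` and `left_boundary` are illustrative names.

```python
import math
from statistics import fmean, pstdev


def left_boundary(d):
    """Assumed form of formulas (1.3) for the theoretical left boundary b."""
    a = fmean(d)
    delta = pstdev(d)
    m3 = fmean((x - a) ** 3 for x in d)
    if m3 > 0:
        return a + math.sqrt(2) * delta - (max(d) - min(d))
    return a - math.sqrt(2) * delta


def most_involved(sets):
    """Statement 5: name of the set with the smallest b, plus all computed b values."""
    bounds = {name: left_boundary(d) for name, d in sets.items()}
    return min(bounds, key=bounds.get), bounds


# Hypothetical grouping of the sets from Examples 1 and 2.
sets = {"G1": [7, 5, 3], "G2": [9, 4, 2], "G3": [8, 6, 1]}
name, bounds = most_involved(sets)
print({k: round(v, 2) for k, v in bounds.items()})  # {'G1': 2.69, 'G2': 2.16, 'G3': 0.84}
print(name)                                         # G3
```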

Conclusion
Finally, it should be noted that in mathematical statistics the main numerical characteristics $a$, $D$, $\delta$, $M_3$, ... are determined the more accurately, the larger the sample size. This reflects a general property of random variables expressed by the law of large numbers. In sets of deviations, all quantities $\delta_{Ri}$ ($R \in \{1, 2, \dots, K\}$; $i \in \{1, 2, \dots, n_R\}$) do not depend on each other, and therefore the main numerical characteristics do not depend on the number of factors considered. The analysis methods, however, are similar.
Particular attention should be paid to the positive influence of the methods of probability theory and mathematical statistics on the analysis of numerical data.
The influence of dispersion on a greater propensity for an emergency situation has been established. For two or more sets of deviations, a criterion for the involvement of a given set in the accident has been identified. It is found by comparing the values that determine the theoretical left boundaries of the sets, calculated from the main numerical characteristics of the given sets of deviations. The main criterion for involvement in an accident has been formulated for two and three sets of deviations by Statements 4 and 5. The possibility of assessing involvement in an accident using basic numerical characteristics has been shown.