Modeling of accident rates involving trucks in order to improve road safety in the Russian Federation

A statistical analysis of road transport accidents involving trucks in the Russian Federation was carried out. The legal acts regulating road safety in general, and cargo transport in particular, were studied. The object of the study is the state of road safety. The use of regression to model and analyze the relationship between the number of road accidents and two factors, the number of cargo vehicles and the length of roads, is considered. These indicators are taken for 85 regions of the Russian Federation. The joint influence of the independent variables on the number of road accidents is obtained. A multiple regression equation is given and its parameters are interpreted. The constructed model is useful, and the two selected variables, "length of highways, km" and "number of cargo vehicles, units", make it possible to predict the level of road transport accidents.


Introduction
In the Russian Federation, road transport carries about 80% of the total volume of cargo moved by all modes of transport (including rail, air and sea). Most goods cannot reach the consumer without road transport. The growth of production volumes, changes in the specifics of products, and the needs and development of the economy drive the demand for road freight transport. The annual growth of the freight vehicle fleet (2-4%) reflects the need for these services [1]. The main industries that use road freight transport are related to retail. Road freight transport is economically feasible for short- and medium-distance transportation [2][3][4][5][6][7].
Every year, about 7% of accidents occur due to violations of traffic rules by truck drivers. Accidents involving trucks are the most severe of all accident types, at 10.1 deaths per 100 accidents. Reducing accident rates is a priority task of the state.
The works [8][9][10][11][12][13][14] are devoted to assessing the impact on the stability and safety of road traffic of such factors as population size, the number of traffic violations, the parameters of automatic detection systems, the commissioning of new roads, etc.

Methods and materials
In this paper, preference is given to multiple (multi-factor) regression, which establishes a linear relationship between a set of input independent variables and one output dependent variable [15].
The output dependent variable $y$ is "number of road accidents, units"; the input independent variables are $x_1$, "total length of highways, km", and $x_2$, "number of cargo vehicles, units".
The initial data for the study are statistical data on road transport accidents and the number of registered trucks for 2019 according to the State Road Safety Inspectorate, and the length of public roads in the constituent entities of the Russian Federation for 2019 according to the Federal State Statistics Service. The study covers 85 constituent entities (regions) of the Russian Federation.

Results and discussion
Thus, the model is built on 85 observations with 2 independent variables. Counting the unit vector (the intercept column), the number of regressors equals the number of unknown coefficients, i.e. 3.
Using the least squares method and the matrix approach, the regression coefficients of the equation $y = b_0 + b_1 x_1 + b_2 x_2$ are determined.
The vector of regression coefficient estimates takes the form $B = (X^T X)^{-1} X^T Y$. A matrix scatter plot, which shows a top view of the regression plane and two views along the plane, indicates that the spread of points relative to the regression plane lies within ±3.
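The least-squares step with the matrix approach can be sketched as follows. This is a minimal illustration, not the authors' computation: the data are synthetic stand-ins (in rescaled units), and the names `ols_fit`, `x1`, `x2`, `y` are hypothetical.

```python
import numpy as np

def ols_fit(x1, x2, y):
    """Least-squares estimates for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: unit column (intercept) plus the two factors.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    # Solve (X'X) b = X'y rather than forming the inverse explicitly.
    b = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ b
    return b, residuals

# Synthetic stand-in data (NOT the regional statistics used in the paper).
rng = np.random.default_rng(0)
n = 85
x1 = rng.uniform(1, 40, n)       # hypothetical road length, thousand km
x2 = rng.uniform(5, 300, n)      # hypothetical truck count, thousand units
y = 50 + 10 * x1 + 200 * x2 + rng.normal(0, 500, n)

b, residuals = ols_fit(x1, x2, y)
print("b0, b1, b2 =", b)
```

Solving the normal equations with `np.linalg.solve` gives the same estimates as the explicit inverse in exact arithmetic; for poorly scaled data, `np.linalg.lstsq` is usually the safer choice.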
The matrix $X$ of factor values, the vector $Y$ of observed results and the transposed matrix $X^T$ are composed. From the available data, the paired correlation coefficients are computed and collected in the matrix of paired correlation coefficients. The analysis of multicollinearity for $y$, $x_1$ and $x_2$ showed that the results of multiple regression are reliable. In the case under study, with the available initial data, the following correlation coefficients all satisfy $|r| < 0.7$, which indicates that there is no multicollinearity of the factors: $r_{yx_1/x_2} = 0.0797$, $r_{yx_2/x_1} = 0.6750$, $r_{x_1x_2/y} = 0.4420$. Matrix analysis makes it possible to select the factors that can be included in the multiple correlation model.
Where the results lie in the range $0.3 \le |r| \le 0.7$, the relationship between the factors is moderate.
The factor $x_2$, "number of cargo vehicles, units" ($r_{yx_2} = 0.7946$), has the greatest influence on the result $y$. This means that it enters the regression equation first when the model is built.
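The matrix of paired correlation coefficients and the 0.7 multicollinearity threshold mentioned above can be illustrated with a short sketch. The data are synthetic, and the helper names `paired_correlations` and `is_collinear` are hypothetical; only the threshold is taken from the text.

```python
import numpy as np

def paired_correlations(y, x1, x2):
    """3x3 matrix of paired (Pearson) correlations, in the order y, x1, x2."""
    return np.corrcoef(np.vstack([y, x1, x2]))

def is_collinear(r_matrix, threshold=0.7):
    """Flag multicollinearity when the inter-factor |r(x1, x2)| reaches the threshold."""
    return bool(abs(r_matrix[1][2]) >= threshold)

# Synthetic illustration only; these are not the paper's regional data.
rng = np.random.default_rng(1)
x1 = rng.normal(size=85)
x2 = 0.4 * x1 + rng.normal(size=85)
y = 0.1 * x1 + 0.8 * x2 + rng.normal(scale=0.6, size=85)

r = paired_correlations(y, x1, x2)
print(np.round(r, 4))
print("multicollinearity suspected:", is_collinear(r))
```

Only the inter-factor entry `r[1][2]` matters for the multicollinearity check; the first row shows how strongly each factor is related to the result on its own.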
A more objective description of the tightness of the relationship is given by partial correlation coefficients, which measure the influence of the factor $x_i$ on the result with the other factors held at a fixed level.
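For two factors, a first-order partial correlation coefficient can be obtained from the paired coefficients alone. The sketch below uses the standard textbook formula; the input values are hypothetical and chosen only for illustration, and `partial_corr` is not a name from the paper.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held fixed (first order)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Hypothetical paired coefficients, for illustration only.
r_y1, r_y2, r_12 = 0.45, 0.80, 0.50

r_y1_fixed_x2 = partial_corr(r_y1, r_y2, r_12)   # y and x1, x2 fixed
r_y2_fixed_x1 = partial_corr(r_y2, r_y1, r_12)   # y and x2, x1 fixed
print(round(r_y1_fixed_x2, 4), round(r_y2_fixed_x1, 4))
```

In this illustration the paired coefficient of 0.45 shrinks to roughly 0.10 once the second factor is fixed, which is the kind of effect the partial coefficients are meant to expose.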
The multiple correlation index evaluates the tightness of the joint influence of the factors on the result. The closer its value is to 1, the better the regression equation describes the actual data and the stronger the influence of the factors on the result.
The multiple correlation coefficient (1) can be determined from the matrix of paired correlation coefficients:

$$R_{yx_1x_2} = \sqrt{1 - \frac{\Delta}{\Delta_{11}}} \qquad (1)$$

where $\Delta$ is the determinant of the matrix of paired correlation coefficients and $\Delta_{11}$ is the determinant of the interfactor correlation matrix. The multiple correlation coefficient is 0.7961, i.e. the relationship between the result $y$ and the factors $x_1$, $x_2$ is strong.
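The determinant-based calculation can be sketched as below. The correlation matrix is hypothetical (the paper's full matrix is not reproduced here), the ordering y, x1, x2 is an assumption of this sketch, and `multiple_r` is an illustrative name.

```python
import numpy as np

def multiple_r(r_matrix):
    """R = sqrt(1 - det(full matrix) / det(inter-factor block)); row/column 0 is y."""
    r_matrix = np.asarray(r_matrix, dtype=float)
    det_full = np.linalg.det(r_matrix)
    # Deleting the row and column of y leaves the inter-factor matrix.
    det_factors = np.linalg.det(r_matrix[1:, 1:])
    return float(np.sqrt(1.0 - det_full / det_factors))

# Hypothetical paired correlations (order: y, x1, x2), for illustration only.
r = [[1.00, 0.45, 0.80],
     [0.45, 1.00, 0.50],
     [0.80, 0.50, 1.00]]
print(round(multiple_r(r), 4))
```

When the two factors are uncorrelated, the formula reduces to the square root of the sum of the squared paired coefficients with y, which is a convenient sanity check.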
A more objective estimate is the adjusted coefficient of determination (2):

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - m - 1} \qquad (2)$$

where $n$ is the number of observations and $m$ is the number of factors.
We then evaluate the significance of the multiple regression equation by testing the hypothesis of overall significance, i.e. the hypothesis that all regression coefficients of the explanatory variables are simultaneously equal to zero. Since the observed value $F > F_{crit} = 3.07$ for $F_{crit}(2; 82)$, this hypothesis is rejected: the coefficient of determination is statistically significant and the regression equation is statistically reliable (i.e., the coefficients are jointly significant).
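As a rough check, both quantities can be recomputed from the rounded figures quoted in the text (multiple correlation coefficient 0.7961, 85 observations, 2 factors). The sketch assumes the usual forms of the adjusted coefficient of determination and of the overall F-statistic; the results are recomputations from rounded inputs, not values reported by the authors.

```python
def adjusted_r2(r2, n, m):
    """Adjusted coefficient of determination for n observations and m factors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

def f_statistic(r2, n, m):
    """Overall F-statistic with (m, n - m - 1) degrees of freedom."""
    return (r2 / (1.0 - r2)) * (n - m - 1) / m

n, m = 85, 2
R = 0.7961            # multiple correlation coefficient quoted in the text
r2 = R ** 2

r2_adj = adjusted_r2(r2, n, m)
f_obs = f_statistic(r2, n, m)
f_crit = 3.07         # critical value quoted in the text for (2; 82)

print(round(r2, 4), round(r2_adj, 4), round(f_obs, 2))
print("equation significant:", f_obs > f_crit)
```

With these inputs the recomputed F-statistic is far above the quoted critical value, which agrees with the conclusion that the equation as a whole is significant.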
The significance of the additional inclusion of a factor has to be assessed because not every factor included in the model significantly increases the proportion of explained variation of the result. This may depend on the order in which the factors are introduced, since the factors are correlated with each other.
The significance of the improvement in model quality after a factor is included is evaluated by the partial F-criterion (4):

$$F_{x_j} = \frac{R^2 - R^2_{(j)}}{1 - R^2}\,(n - m - 1) \qquad (4)$$

where $R^2$ is the coefficient of determination of the model with all factors, $R^2_{(j)}$ is that of the model without the factor $x_j$, and $R^2 - R^2_{(j)}$ is the increase in the proportion of explained variation due to the additional factor included in the model.
If the observed value $F_{x_j}$ is greater than $F_{crit}$, then the additional introduction of the factor $x_j$ into the model is statistically justified. The partial F-criterion also evaluates the significance of the corresponding regression coefficient.
We evaluate the feasibility of including the factor $x_1$ in the regression model after the introduction of $x_2$ ($F_{x_1}$) using the partial F-criterion.
The observed value of the partial F-criterion is $F_{x_1} = 0.519$.
Comparing the observed value of the partial F-criterion for $x_2$ with the critical one gives $F_{x_2} > F_{crit}$, so it is advisable to include the factor $x_2$ in the model after introducing the factor $x_1$.
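The partial F-criterion for the length of highways can be approximately reproduced from figures quoted elsewhere in the text. This sketch rests on two assumptions: that 0.7946 is the paired correlation between the number of accidents and the number of cargo vehicles, and that the partial criterion has the usual form (increase in explained variation divided by the unexplained share, times the residual degrees of freedom). The function name `partial_f` is hypothetical.

```python
def partial_f(r2_full, r2_reduced, n, m):
    """Partial F for one added factor; m is the number of factors in the full model."""
    return (r2_full - r2_reduced) / (1.0 - r2_full) * (n - m - 1)

n, m = 85, 2
r2_full = 0.6337            # share of explained variation quoted in the text
r2_only_x2 = 0.7946 ** 2    # assumption: model with cargo vehicles as the only factor

f_x1 = partial_f(r2_full, r2_only_x2, n, m)
print(round(f_x1, 3))
```

The result is about 0.52, close to the reported 0.519; the small difference comes from rounding of the inputs, to which this quantity is very sensitive because the two coefficients of determination are nearly equal.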

Conclusion
The model parameters can be interpreted as follows: with the other factor held constant, an increase in $x_1$, "length of highways, km", by 1 unit leads to an increase in $y$, "number of accidents, units", by 0.00921 units on average; an increase in $x_2$, "number of cargo vehicles, units", by 1 unit leads to an increase in $y$ by 0.192 units on average. From the largest coefficient, $\beta_2 = 0.75$, we conclude that the factor $x_2$ has the greatest influence on the result $y$. The statistical significance of the equation was verified using the coefficient of determination and the Fisher criterion. In the studied situation, 63.37% of the total variability of $y$ is explained by changes in the factors.
Thus, the constructed model is useful, and the two selected variables, "length of highways, km" and "number of cargo vehicles, units", make it possible to predict the level of road transport accidents.
The work was performed under the RGNF grant 21-19-00240.