Method of autoregression in application of singular-spectral analysis of time series for forecasting production of oil and gas industry products

Time series data are being produced in ever greater volumes across many fields. This supplies material for research on time series analysis methods and drives the development of the field. Because the resulting data are highly complex and large in scale, constructing forecasting models for them is increasingly challenging. This paper considers the theoretical aspects of using a singular-spectral analysis model of time series combined with autoregression and justifies the expediency of this model for forecasting the output of both oil and gas industry products and dual-use products. The autoregressive model and the decision tree model can both be applied with the same degree of reliability for forecasting aggregate production values.


Introduction
The basis of any automated time series forecasting is the construction of a model to calculate future events based on known events of the past.
Of the time series models available in econometric theory, two were chosen because they best meet the requirements of medium- and long-term forecasting of socio-economic macro-indicators for industries:
• a model that transforms a non-stationary series into time components by the singular value decomposition method and then applies decision trees;
• a model that transforms a non-stationary series into time components by the singular value decomposition method and then applies an autoregressive model.
The mechanism of singular-spectral analysis of time series for forecasting the production of dual-use products, as well as the training of the "noise" component with decision trees, is described most fully in [1][2][3][4][5].
Machine learning offers a large number of models and approaches to building them, but all of them share the need to compute the initial parameters of the model. Computing these parameters is time-consuming, so reducing this cost is one of the most important problems, and various methods address it. One such approach is the use of parallel algorithms [6]. This article discusses the mechanism and analyzes the application of the autoregressive model or, more precisely, its variant, the autoregressive integrated moving average (ARIMA) model, which is based on the parallel execution of two independent processes.

Theoretical aspects of using autoregressive models
An autoregressive model represents each subsequent element of a series as a linear function of the previous ones. In its pure form the autoregressive model is rarely used in practice, as it is suitable only for time series without trend and seasonal components.
This study analyzes the application of the ARIMA model (Box-Jenkins model).
Historical data show no distinct regular components in the time series of dual-use products. In addition, individual observations may contain significant errors, which hinders both the identification of regular components and the construction of the forecast. The autoregressive integrated moving average (ARIMA) model allows regular components to be identified [7][8][9][10][11][12][13][14][15].
The ARIMA model is a model for analyzing non-stationary time series that combines two parallel processes: autoregression and moving average. Accordingly, the mechanism of the ARIMA model can be divided into two processes.
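Autoregressive (AR) process. Provided that the elements of the time series depend consistently on one another, the series can be written as the function
$$X_t = \xi + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \dots + \varepsilon_t, \tag{1}$$
where $\xi$ is a constant, $\phi_1$, $\phi_2$, $\phi_3$ are autoregressive parameters, and $\varepsilon_t$ is the random component.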
Each element of the series is the sum of a random component as well as a linear combination of previous observations [16][17][18][19].
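The AR process described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in the study: the function names `simulate_ar` and `forecast_ar`, the coefficient values and the zero start-up values are all hypothetical.

```python
import random


def simulate_ar(phi, const, n, sigma=1.0, seed=0):
    """Simulate x_t = const + sum_i phi[i] * x_{t-1-i} + eps_t."""
    rng = random.Random(seed)
    p = len(phi)
    x = [0.0] * p  # assumed zero start-up values, dropped before returning
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # random component
        lagged = sum(phi[i] * x[-1 - i] for i in range(p))
        x.append(const + lagged + eps)
    return x[p:]


def forecast_ar(history, phi, const, steps):
    """Iterate the AR recursion forward, replacing future errors by their mean (0)."""
    values = list(history)
    for _ in range(steps):
        lagged = sum(phi[i] * values[-1 - i] for i in range(len(phi)))
        values.append(const + lagged)
    return values[len(history):]


# Hypothetical AR(2) coefficients, chosen only for illustration.
series = simulate_ar(phi=[0.6, 0.2], const=1.0, n=200)
print(forecast_ar(series, phi=[0.6, 0.2], const=1.0, steps=3))
```

The forecast reuses its own earlier predictions as inputs, which is why a pure AR model cannot follow a trend or a seasonal pattern that is absent from its coefficients.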
Moving average (MA) process. Unlike the autoregressive process, in the moving average process each observation is subject to the cumulative effect of previous errors. The MA process takes the following form:
$$X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots, \tag{2}$$
where $\mu$ is a constant and $\theta_1$, $\theta_2$, $\theta_3$ are moving average parameters. That is, the series model is the sum of the random component at the current moment of time and a linear combination of the random components at previous moments of time.
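A short Python sketch may make the MA process concrete. It assumes the convention in which past errors enter with a plus sign (some texts write these terms with a minus sign); the names `ma_series` and `simulate_ma` and the parameter values are hypothetical.

```python
import random


def ma_series(eps, theta, mu):
    """Build x_t = mu + eps_t + sum_j theta[j] * eps_{t-1-j} from a list of errors."""
    q = len(theta)
    return [
        mu + eps[t] + sum(theta[j] * eps[t - 1 - j] for j in range(q))
        for t in range(q, len(eps))  # the first q errors serve only as history
    ]


def simulate_ma(theta, mu, n, sigma=1.0, seed=0):
    """Draw Gaussian errors and pass them through the MA(q) filter."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + len(theta))]
    return ma_series(eps, theta, mu)


# Hypothetical MA(2) parameters, chosen only for illustration.
sample = simulate_ma(theta=[0.5, 0.3], mu=10.0, n=100)
print(sample[:5])
```

Each value depends on only the last q errors, so a shock leaves the series completely after q steps, whereas in an AR process it decays gradually.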
As a rule, ARIMA(p, d, q) models are used in which none of the parameters exceeds 2. Here the parameter p determines the order of the autoregressive component, the parameter q determines the order of the moving average, and the parameter d determines the order of the difference (discrete derivative).
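The role of the parameter d can be shown with a small Python sketch of differencing and its inverse. The helper names `difference` and `integrate` are hypothetical, and the quadratic example series is made up for illustration.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def integrate(diffed, first_values):
    """Invert differencing.

    first_values[k] must be the first element of the series after k
    differences (k = 0 .. d-1), i.e. the values lost at each differencing step.
    """
    out = list(diffed)
    for head in reversed(first_values):
        restored = [head]
        for step in out:
            restored.append(restored[-1] + step)  # cumulative sum from the head
        out = restored
    return out


# A made-up series with a quadratic trend: two differences make it constant.
series = [1, 4, 9, 16, 25]
print(difference(series, d=1))
print(difference(series, d=2))
print(integrate(difference(series, d=2), first_values=[1, 3]))
```

A series with a linear trend becomes stationary after one difference and one with a quadratic trend after two, which is consistent with d rarely needing to exceed 2.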
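Thus, the ARIMA model combines the two processes (AR and MA) and has the following form:
$$\Delta^d X_t = c + \sum_{i=1}^{p} \phi_i \Delta^d X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t, \tag{3}$$
where $\varepsilon_t$ is a stationary time series (the random component), $c$, $\phi_i$, $\theta_j$ are the model parameters, and $\Delta^d$ is the difference operator of order $d$.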
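How the AR part, the MA part and the differencing interact in a forecast can be sketched for the simplest combined case, ARIMA(1, 1, 1). This is a sketch under explicit assumptions: the coefficients c, a (autoregressive) and b (moving average) are supplied by hand rather than estimated, the first residual is assumed to be zero, and the function name `arima_111_forecast` is hypothetical. In practice the coefficients would be estimated from the data.

```python
def arima_111_forecast(series, c, a, b, steps):
    """Forecast ARIMA(1, 1, 1) with given (not estimated) coefficients.

    Model on first differences w_t = X_t - X_{t-1}:
        w_t = c + a * w_{t-1} + b * eps_{t-1} + eps_t
    """
    # d = 1: switch to first differences.
    w = [y - x for x, y in zip(series, series[1:])]

    # Recover the residuals recursively; the first one is assumed to be 0.
    resid = [0.0]
    for t in range(1, len(w)):
        predicted = c + a * w[t - 1] + b * resid[t - 1]
        resid.append(w[t] - predicted)

    # Forecast the differences; unknown future errors are replaced by 0.
    last_w, last_e, level = w[-1], resid[-1], series[-1]
    forecasts = []
    for _ in range(steps):
        next_w = c + a * last_w + b * last_e
        level += next_w  # undo the differencing by accumulating
        forecasts.append(level)
        last_w, last_e = next_w, 0.0
    return forecasts


# Made-up yearly output values and hypothetical coefficients.
history = [100.0, 104.0, 109.0, 113.0, 118.0, 124.0]
print(arima_111_forecast(history, c=1.0, a=0.5, b=0.3, steps=3))
```

The moving average term affects only the first forecast step, because from the second step onward the last error is unknown and set to its mean; later steps are driven by the autoregressive term and the constant.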
The main advantages of ARIMA models are:
• the greatest scientific validity and a clear statistical and mathematical justification compared with other time series forecasting models;
• simplicity of application to time series forecasting and the availability of formalized instructions for selecting the most appropriate model.
Among the disadvantages of the ARIMA model are strict requirements for the number of observations, non-adaptability and the need for re-estimation when the data change, and large time costs.

Model comparison
To assess the adequacy of the singular-spectral analysis model with autoregression and with decision trees for forecasting the output of products, including dual-use products, the forecasts made by organizations in each industry were compared with the artificial intelligence forecasts produced by the two models, and the corresponding deviations were identified.
For the study, a sample was made of 46 nomenclature items of products manufactured by organizations; these items most fully reflect the general picture of production, including dual-use production of the defense industries. Of these:
• 15 nomenclature items are produced by organizations of the aviation industry, including navigation, meteorological and geophysical equipment, systems and complexes of on-board radio-electronic equipment (hereinafter BREE), autopilot equipment, fire alarm systems and other high-tech products;
• 21 nomenclature items are produced by organizations of the radio-electronic industry, including radio measuring equipment, microwave electronics and related components, components of electrical machines, vacuum devices, complexes of devices for pre-flight inspection of avionics and other high-tech products;
• 10 nomenclature items are produced by oil and gas industry organizations, including marine instrumentation products (engines, communication equipment, electromagnets, electromagnetic contactors and starters, navigation, meteorological and geophysical instruments and similar tools), related products (lighting fixtures) and other high-tech products.
The input data for the forecast were the actual output data for products, including dual-use products, for 2018-2022, together with the estimated value for 2023. The forecast values were built for the period up to 2033. To summarize the situation, we present a comparison of aggregated data by industry.
Aviation industry. Based on the comparison of forecast data for 17 key products of the aviation industry to which the artificial intelligence forecasting mechanism was applied, the following conclusions can be drawn (Figure 1):
• the linear trends of the forecasts for the period up to 2033 have the same direction and do not diverge in the long term;
• in the short term, the model forecasts may deviate significantly from the expert forecasts of the defense industry organizations, because the model does not take into account the existing plans of those organizations or political and economic risks;
• the expert forecast of the defense industry organizations is more conservative than the model forecasts, which may be due to intentional underestimation of values in order to meet (or exceed) the indicators in the future;
• on average, the forecast using decision trees deviates by 67%, and the forecast using autoregression by 57%, upward from the expert forecast of the defense industry organizations;
• the average growth rate for 2018-2033 was 104.2% according to the expert forecast of the defense industry organizations, 107.3% according to the model using decision trees, and 107.2% according to the model using autoregression.
Radio-electronic industry. Based on the comparison of forecast data for 21 key products of the radio-electronic industry to which the artificial intelligence forecasting mechanism was applied, the following conclusions can be drawn (Figure 2):
• the linear trends of the forecasts have the same direction in the short and long term and do not diverge;
• on average, the forecast using decision trees deviates by 16%, and the forecast using the autoregressive model by 17%, downward from the expert forecasts of the defense industry organizations;
• the average growth rate for 2018-2033 was 105.7% according to the expert forecasts of the defense industry organizations, 103.5% according to the model using decision trees, and 103.4% according to the model using autoregression.
Oil and gas industry. Based on the comparison of forecast data for 10 key products of the oil and gas industry to which the artificial intelligence forecasting mechanism was applied, the following conclusions can be drawn (Figure 3):
• in general, the linear trends of the forecasts have the same direction and do not diverge in the long term;
• in the short term, the model forecasts may deviate significantly from the expert forecasts of the defense industry organizations, which is due to the cyclical nature of shipbuilding production;
• the expert forecasts of the defense industry organizations are more optimistic than the model forecasts;
• on average over the entire period, the forecast using decision trees deviates by 6.6%, and the forecast using autoregression by 6.3%, downward from the expert forecasts of the defense industry organizations;
• the average growth rate for 2018-2033 was 123.8% according to the expert forecast of the defense industry organizations, 119.8% according to the model using decision trees, and 120.1% according to the model using autoregression.
In addition, the root mean square error (RMSE) metric was calculated to select the most appropriate model. RMSE is calculated using the standard formula:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \tag{4}$$
where $y_i$ are the actual values, $\hat{y}_i$ are the forecast values, and $n$ is the number of observations. As a rule, to make the RMSE values easier to interpret, the metric is normalized by the following formula:
$$NRMSE = \frac{RMSE}{y_{\max} - y_{\min}}, \tag{5}$$
where $y_{\max}$ and $y_{\min}$ are the maximum and minimum values. Normalization typically brings the values into the range between 0 and 1, so that models whose normalized RMSE is closer to zero can be interpreted as fitting better.
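The comparison metrics discussed above can be sketched in Python as follows. This is an illustration under assumptions, not the calculation procedure of the study: the function names are hypothetical, the normalization divides by the range of the reference values, the average deviation is taken as the mean of the relative differences, and the sample numbers are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def normalized_rmse(actual, predicted):
    """RMSE divided by the range (max - min) of the reference values."""
    return rmse(actual, predicted) / (max(actual) - min(actual))


def mean_pct_deviation(model, expert):
    """Average signed deviation of the model forecast from the expert one, in %."""
    return 100.0 * sum((m - e) / e for m, e in zip(model, expert)) / len(expert)


# Made-up aggregate values, for illustration only.
expert = [100.0, 110.0, 120.0, 130.0]
model = [102.0, 108.0, 122.0, 128.0]
print(rmse(expert, model))
print(normalized_rmse(expert, model))
print(mean_pct_deviation(model, expert))
```

A positive mean deviation means the model forecast lies above the expert forecast on average, and a negative one means it lies below.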

Conclusion
The singular-spectral analysis model of time series with decision trees and the singular-spectral analysis model of time series with autoregression can both be used for forecasting the output of dual-use products by defense industry organizations: they give sufficiently accurate and adequate results in the long term and do not contradict the view of product output development held by the experts of the defense industry organizations.

Fig. 1. Comparison of aggregated forecasts for dual-use products of aviation industry organizations.

Fig. 2. Comparison of aggregated forecasts for dual-use products of radioelectronic industry organizations.

Fig. 3. Comparison of aggregate forecasts for products of oil and gas industry organizations.