Singular-spectrum time series analysis for oil and gas industry production forecasting

The article considers theoretical aspects of singular-spectral analysis of time series combined with decision trees, and justifies the feasibility of using this model to forecast the output of both oil and gas industry products and dual-use products. To reduce the risk of distorting aggregated forecast values when working with large data sets, an expert should manually pre-select and exclude products whose production has been completed or suspended.


Introduction
Long-term and medium-term planning of the development of the defense-industrial complex as a set of individual enterprises and organizations is currently based not only on mobilization plans and the state armament program, but also on a systemic complex of all interconnected works necessary for the production of dual-use and civilian products.
Dual-use products play an increasing role in the structure of output of defense industry organizations. Firstly, a significant part of defense industry production consists of components, raw materials, and materials that can be used in finished products for various purposes. For example, the same products of the radio-electronic industry are used in both military and civilian airplanes, ships, missiles, and wheeled and tracked vehicles. Secondly, after the imposition of sanctions by foreign countries, the need for domestic components for finished products has increased even more. The transfer of technologies successfully used in the production of military equipment to civilian products can partially solve the problem of component shortage. In this regard, there is a need for a more detailed analysis of dual-use output, as well as long-term forecasting of its production, in order to make balanced management decisions. At the same time, there is a need to create mechanisms for automated forecasting to minimize the negative impact of the human factor.

Theoretical aspects of using the model of singular-spectral analysis of time series with application of decision trees
A time series (dynamic series) is understood as a sequence of observations of a random variable (attribute) ordered in time.
Let us present two main types of time series:
- Stationary time series describe random processes that proceed approximately homogeneously in time and have the form of continuous random fluctuations around some average value, where neither the average amplitude nor the character of these fluctuations changes significantly over time. In most cases, however, the real economic situation can be represented as a set of non-stationary (random) processes with a definite directional tendency of development.
- Non-stationary (random) time series are time series that may have trend, seasonality and cyclicality.
Currently, there are several methods for analyzing non-stationary time series. The main ones are:
- the wavelet transform;
- the Hilbert-Huang transform;
- the singular-spectral analysis method.
The singular-spectral method was chosen for the analysis of dual-use production by defense industry organizations because it not only allows non-stationary time series to be analyzed, but also solves the most urgent problem, data forecasting.
The singular-spectral method of analysis is a method of analyzing non-stationary time series that is a variant of the principal component method and is based on the transformation of a univariate time series into a multivariate one. That is, the method decomposes a time series into groups of components and then, by combining them in a special way, yields a new time series free of non-deterministic components.
Here is a description of the mechanism of the singular-spectral method of time series analysis.
Let $F = (f_0, \ldots, f_{N-1})$ be a non-zero time series of length $N > 2$ (non-zero means that $f_i \neq 0$ for at least one $i$). A typical analysis mechanism consists of two stages [4].
Decomposition.
Step 1: Embedding. Embedding translates the original time series into a sequence of multivariate vectors. Let $L$ be the window length, an integer with $1 < L < N$. The embedding procedure forms $K = N - L + 1$ embedding vectors of dimension $L$:
$$X_i = (f_{i-1}, \ldots, f_{i+L-2})^{T}, \quad 1 \le i \le K. \tag{1}$$
The trajectory matrix $\mathbf{X}$ of the time series $F$ consists of the embedding vectors as columns:
$$\mathbf{X} = [X_1 : \ldots : X_K]. \tag{2}$$
In other words, the trajectory matrix has the form
$$\mathbf{X} = (x_{ij})_{i,j=1}^{L,K} = \begin{pmatrix} f_0 & f_1 & \cdots & f_{K-1} \\ f_1 & f_2 & \cdots & f_K \\ \vdots & \vdots & \ddots & \vdots \\ f_{L-1} & f_L & \cdots & f_{N-1} \end{pmatrix}. \tag{3}$$
So $x_{ij} = f_{i+j-2}$, and the matrix $\mathbf{X}$ has equal elements on the "diagonals" $i + j = \mathrm{const}$. Thus, the trajectory matrix $\mathbf{X}$ is a Hankel matrix (i.e., a matrix with equal elements on all diagonals perpendicular to the main diagonal). There is a one-to-one correspondence between Hankel matrices of dimension $L \times K$ and time series of length $N = L + K - 1$.
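The embedding step can be illustrated with a minimal sketch. The function name `trajectory_matrix` and the sample values are hypothetical and serve only as an illustration; Python indices are 0-based, so element (i, j) of the list of rows holds the series value at position i + j.

```python
def trajectory_matrix(series, window):
    """Return the L x K trajectory matrix of `series` as a list of rows."""
    n = len(series)
    if not 1 < window < n:
        raise ValueError("window length must satisfy 1 < L < N")
    k = n - window + 1  # number of embedding vectors (columns)
    # Row i, column j (0-based) holds series[i + j], so anti-diagonals are constant.
    return [[series[i + j] for j in range(k)] for i in range(window)]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical data, N = 6
X = trajectory_matrix(series, window=3)   # L = 3, K = 4
for row in X:
    print(row)
```

Each column of the printed matrix is one embedding vector, and moving one row down and one column left always lands on the same value, which is the Hankel property described above.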
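Step 2: Singular value decomposition. The result of this step is the singular value decomposition of the trajectory matrix of the time series.
Let $S = \mathbf{X}\mathbf{X}^{T}$. Let $\lambda_1, \ldots, \lambda_L$ denote the eigenvalues of the matrix $S$ taken in non-increasing order ($\lambda_1 \ge \ldots \ge \lambda_L \ge 0$), and let $U_1, \ldots, U_L$ denote the orthonormal system of eigenvectors of the matrix $S$ corresponding to these eigenvalues.
Let $d = \max\{i : \lambda_i > 0\}$. If we denote $V_i = \mathbf{X}^{T} U_i / \sqrt{\lambda_i}$, $i = 1, \ldots, d$, then the singular value decomposition of the matrix $\mathbf{X}$ can be written as
$$\mathbf{X} = \mathbf{X}_1 + \ldots + \mathbf{X}_d, \tag{4}$$
where $\mathbf{X}_i = \sqrt{\lambda_i}\, U_i V_i^{T}$. Each of the matrices $\mathbf{X}_i$ has rank 1, hence they can be called elementary matrices. The collection $(\sqrt{\lambda_i}, U_i, V_i)$ is called the $i$-th singular triple of the decomposition.
Recovery.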
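Step 3: Grouping. Based on the decomposition obtained in formula (4), the grouping procedure divides the entire set of indices $\{1, \ldots, d\}$ into $m$ disjoint subsets $I_1, \ldots, I_m$. For a subset $I = \{i_1, \ldots, i_p\}$, the resulting matrix is
$$\mathbf{X}_I = \mathbf{X}_{i_1} + \ldots + \mathbf{X}_{i_p}, \tag{5}$$
so that decomposition (4) takes the grouped form
$$\mathbf{X} = \mathbf{X}_{I_1} + \ldots + \mathbf{X}_{I_m}. \tag{6}$$
Step 4: Diagonal averaging. Each matrix of the grouped decomposition (6) is converted into a time series of length $N$. For an $L \times K$ matrix $Y = (y_{ij})$, the series $g_0, \ldots, g_{N-1}$ is given by
$$g_k = \frac{1}{|A_k|} \sum_{(i,j) \in A_k} y_{ij}, \quad A_k = \{(i, j) : i + j = k + 2,\ 1 \le i \le L,\ 1 \le j \le K\}. \tag{7}$$
Formula (7) represents the averaging of matrix elements along the "diagonals" $i + j = k + 2$: choosing $k = 0$ gives $g_0 = y_{11}$, for $k = 1$ we obtain $g_1 = (y_{12} + y_{21})/2$, etc. Applying diagonal averaging (7) to the resulting matrices $\mathbf{X}_{I_k}$, we obtain the time series $\tilde{F}^{(k)} = (\tilde{f}_0^{(k)}, \ldots, \tilde{f}_{N-1}^{(k)})$, and, as a consequence, the original time series $f_0, \ldots, f_{N-1}$ decomposes into the sum of $m$ time series:
$$f_n = \sum_{k=1}^{m} \tilde{f}_n^{(k)}, \quad n = 0, \ldots, N-1.$$
The main advantages of applying the singular-spectral method to time series analysis are:
- absence of a strict requirement for stationarity of the time series;
- applicability to noisy and short series;
- the ability to identify both periodic and complex non-stationary components of time series.
The weakness of the singular-spectral method is the absence of an analytical model representation of the series, for example, in the form of a sum of simple functions. Such a compact analytical representation could be clearer and more accessible for interpretation than an aggregate of a large number of components [1].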
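The singular value decomposition, grouping, and diagonal averaging steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the sample series, the window length and the choice of groups (first singular triple as "trend", the rest as "noise") are all hypothetical, and group indices are 0-based.

```python
import numpy as np


def diagonal_average(Y):
    """Average an L x K matrix over its anti-diagonals i + j = const."""
    L, K = Y.shape
    flipped = Y[:, ::-1]  # anti-diagonals of Y become ordinary diagonals
    return np.array([flipped.diagonal(K - 1 - k).mean() for k in range(L + K - 1)])


def ssa_components(series, L, groups):
    """Decompose `series` into one reconstructed series per index group."""
    series = np.asarray(series, dtype=float)
    K = len(series) - L + 1
    # Trajectory matrix: column i is the embedding vector starting at position i.
    X = np.column_stack([series[i:i + L] for i in range(K)])
    # s[i] equals sqrt(lambda_i); s[i] * outer(U_i, V_i) is the i-th elementary matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = []
    for group in groups:
        X_group = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in group)
        components.append(diagonal_average(X_group))
    return components


# Hypothetical data: N = 10, window L = 4, so there are at most 4 singular triples.
series = [3.0, 4.0, 6.0, 5.0, 7.0, 9.0, 8.0, 10.0, 12.0, 11.0]
trend, noise = ssa_components(series, L=4, groups=[[0], [1, 2, 3]])
print(np.allclose(trend + noise, series))  # the components sum back to the series
```

Because the groups cover every singular triple and diagonal averaging is linear, the reconstructed components add up to the original series, which mirrors the decomposition into a sum of $m$ time series.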
The singular-spectral method of analysis extracts the trend of the initial data series even if the trend is not linear. For the purpose of further training, the time series of deviations of the real values from the trend, commonly referred to as "noise", is also identified. Focusing on trend and noise improves the accuracy and generalizability of models, because fewer factors and data are needed to identify patterns. The noise component can be modeled by various methods, including a decision tree model.
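One possible way to prepare the noise series for training is sketched below. The article does not specify which features the tree receives, so the use of the previous `n_lags` noise values as features is an assumption made only for illustration; the function name and all numbers are hypothetical.

```python
def lagged_pairs(noise, n_lags):
    """Build (features, target) pairs: the previous n_lags values predict the next one."""
    pairs = []
    for t in range(n_lags, len(noise)):
        pairs.append((noise[t - n_lags:t], noise[t]))
    return pairs


actual = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]  # hypothetical output values
trend = [10.0, 11.5, 12.5, 14.0, 15.5, 17.0]   # hypothetical trend from the decomposition
noise = [a - t for a, t in zip(actual, trend)]  # deviations of real values from the trend
pairs = lagged_pairs(noise, n_lags=2)
for features, target in pairs:
    print(features, "->", target)
```

Each pair can then be passed to any regression model; the final forecast would be the extrapolated trend plus the predicted noise.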
A decision tree is a data analysis tool with a hierarchical tree structure consisting of a set of rules of the form "If ..., then ...".
The general scheme of operation and the main elements of a decision tree are presented in Figure 1.
The method of decision trees works as follows: the set of feature vector values is divided into non-overlapping subsets, after which a separate model is built for each such subset. It is necessary to construct a function $f: X \to Y$ from the set $\{(x_i, y_i)\}_{i=1}^{l}$, called the training sample, with unknown distribution $P(x, y) = P(x)P(y \mid x)$, where $x_i$, $i = 1, 2, \ldots, l$, are feature vectors, or predictors [3]. Features can be measured on several scales: numerical, ordinal, and nominal. If $y$ (the dependent variable) can take only a finite number of values, i.e., $y \in \{\omega_1, \omega_2, \ldots, \omega_c\}$, $c \ge 2$, then a classification problem must be solved; $y$ is called a class label and determines which of the $c$ classes the corresponding object belongs to. If $y$ is measured on a numerical scale, then a regression problem must be solved, and $y$ is called the response.
Among the advantages of using this method of data analysis are the following:
- decision tree algorithms are similar to human decision making;
- the decision tree mechanism is simple to understand and does not require special knowledge;
- a relatively small amount of computation and fast training;
- the ability to work with discrete data.
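The partitioning idea can be shown with the smallest possible regression tree, a single split on one numeric feature with a constant model (the mean response) in each of the two subsets. This is a minimal sketch; the function names and data are hypothetical, and candidate thresholds are taken at observed feature values for brevity.

```python
def fit_stump(xs, ys):
    """Fit a depth-one regression tree on a single numeric feature."""
    best = None
    # Every observed value except the smallest is a candidate threshold,
    # so both subsets are always non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors of the two constant models.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Apply the rule: if x < threshold, then left mean, else right mean."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical feature values
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]  # hypothetical responses
stump = fit_stump(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 50.0))
```

A full decision tree repeats the same split search recursively inside each subset until a stopping criterion is met.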

Analysis of the application of singular-spectral analysis of time series with the use of decision trees for forecasting the production of products, including dual-use products
To assess the adequacy of applying singular-spectral analysis of time series with decision trees to forecasting the production of dual-use products, the expert forecasts of defense industry organizations were compared, industry by industry, with the artificial intelligence forecast produced by the model, and the corresponding deviations were identified.
To conduct the study, a sample of 46 nomenclature items of dual-use products was made, which most fully reflect the general picture of dual-use production by defense industries, of which:
- 15 nomenclature items produced by aviation industry organizations, including: navigation, meteorological and geophysical equipment, systems and complexes of airborne radio electronic equipment, autopilot equipment, fire alarm systems and other high-tech products;
- 21 nomenclature items produced by organizations of the radio electronic industry, including: radio measuring equipment, microwave electronics and related components, components of electrical machines, vacuum devices, complexes of devices for pre-flight inspection of onboard radio electronic equipment and other high-tech products;
- 10 nomenclature items produced by oil and gas industry organizations, including: oil and gas instrumentation products (engines, communication equipment, electromagnets, electromagnetic contactors and starters, navigation, meteorological, geophysical devices and similar instruments), related products (lighting fixtures) and other high-tech products.
The inputs for the forecast were the actual dual-use output data for 2018-2022, as well as the estimated value for 2023. The forecast values were constructed for the period up to 2033.
To summarize the situation, we present a comparison of aggregated data by industry.

Aviation industry
Based on the comparison of forecast data for 15 key dual-use products of the aviation industry to which the artificial intelligence forecasting mechanism was applied, the following conclusions can be drawn (Figure 2):
- the linear trends of the forecasts for the period up to 2033 have the same direction and do not diverge in the long-term period;
- in the short-term period, the model forecast may deviate significantly from the expert forecast of defense industry organizations, because the model does not take into account the existing plans of the organization or political and economic risks;
- the expert forecast of defense industry organizations is more conservative than the model forecast, which may be due to intentional underestimation of values in order to achieve (exceed) the indicators in the future;
- on average, the model forecast deviates by 67% upwards from the expert forecast of organizations;
- the average growth rate for the period 2018-2033 was 104.2% according to the expert forecast of organizations and 107.3% according to the model forecast.
The divergence of 3.1 p.p. is not significant over the fifteen-year interval.
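The two comparison indicators used here and in the following subsections can be computed as in the sketch below. The article does not give the exact formulas, so two assumptions are made: the average growth rate is taken as the geometric mean of year-on-year growth, and the average deviation as the mean relative difference between the model and expert forecasts. The function names and all values are hypothetical and are not the study data.

```python
def average_growth_rate(values):
    """Geometric-mean year-on-year growth in percent (100 means no change)."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) * 100


def mean_relative_deviation(model, expert):
    """Mean of (model - expert) / expert over all years, in percent."""
    return sum((m - e) / e for m, e in zip(model, expert)) / len(expert) * 100


expert = [100.0, 110.0, 121.0]  # hypothetical expert forecast by year
model = [100.0, 120.0, 144.0]   # hypothetical model forecast by year

gap = average_growth_rate(model) - average_growth_rate(expert)
print(round(average_growth_rate(expert), 1), round(average_growth_rate(model), 1))
print(round(gap, 1), "p.p.")
print(round(mean_relative_deviation(model, expert), 1), "%")
```

A positive deviation means the model forecast lies above the expert forecast on average, and a negative one means it lies below.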

Radioelectronic industry
Based on the comparison of forecast data for 21 key dual-use products of the radio electronics industry to which the artificial intelligence forecasting mechanism was applied, the following conclusions can be drawn (Figure 3):
- the linear trends of the forecasts in the short-term and long-term periods have the same direction and do not diverge;
- the expert forecast of defense industry organizations is more optimistic than the model forecast;
- on average, the model forecast deviates by 16% downward from the expert forecast of defense industry organizations;
- the average growth rate for the period 2018-2033 was 105.7% according to the expert forecast of defense industry organizations and 103.5% according to the model forecast.
The discrepancy of 2.2 p.p. can be considered insignificant.

Oil and gas industry
Based on the comparison of forecast data for 10 key dual-use products of the oil and gas industry to which the artificial intelligence forecasting mechanism was applied, the following conclusions can be drawn (Figure 4):
- in general, the linear trends of the forecasts have the same direction and do not diverge over the long-term period;
- in the short-term period, the model forecast may deviate significantly from the expert forecasts of defense industry organizations, which may be due to the cyclical nature of shipbuilding production;
- the expert forecast of defense industry organizations is more optimistic than the model forecast;
- on average, over the entire period, the model forecast deviates by 6% downward from the expert forecast of defense industry organizations;
- the average growth rate for the period 2018-2033 was 123.8% according to the expert forecast of defense industry organizations and 119.8% according to the model forecast.
The discrepancy of 4 p.p. in the long-term period can be considered insignificant.
It is important to note that forecasting based on singular-spectral analysis of time series with decision trees cannot be used for products that are in the phase-out stage, because the model cannot predict the suspension or termination of production, which in turn can be caused both by random risks and by the plans of the organization. Let us illustrate this shortcoming with an example.
Suppose an organization produced some product from 2018 to 2022, and in 2023 production ended (the production volume amounted to 0 rubles). The forecast model cannot interpret the zero value as the end of production and therefore builds a forecast with increasing values (Figure 5). To reduce the risk of artificial intelligence distorting the forecast when working with large data sets, the expert should manually pre-select and exclude products whose production has been completed or suspended. For the above analysis of 46 dual-use products in three industries, such data were excluded.
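A simple automated pre-screen can support this manual step by flagging products whose latest reported output is zero before the series are aggregated. The sketch below is an illustration only and does not replace expert review; the function names, product names and values are hypothetical.

```python
def exclude_discontinued(products):
    """Keep only products whose latest reported output is positive."""
    return {name: values for name, values in products.items() if values[-1] > 0}


def aggregate(products):
    """Sum output across products, year by year."""
    return [sum(year) for year in zip(*products.values())]


# Hypothetical output by year, 2018-2023.
products = {
    "item A": [120.0, 130.0, 125.0, 140.0, 150.0, 155.0],
    "item B": [80.0, 85.0, 90.0, 70.0, 60.0, 0.0],  # production ended in the last year
}
active = exclude_discontinued(products)
totals = aggregate(active)
print(list(active), totals)
```

Products removed by such a filter still need to be checked by an expert, since a zero value may reflect a temporary suspension rather than the end of production.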

Conclusion
The study, based on the analysis of a sample of 46 key products to which the forecasting mechanism with the help of artificial intelligence was applied, allowed us to draw the following conclusions.
Forecasting based on the model of singular-spectral analysis of time series with the use of decision trees shows adequate results and can be used to forecast aggregate values of dual-use output by defense industry organizations in the long-term period, both by industry and for the defense industry as a whole.
Forecasting based on the model of singular-spectral analysis of time series with the use of decision trees does not take into account possible political, economic and other risks, so expert adjustments should be made (if necessary) to the final machine forecast.
The work was carried out within the framework of the state assignment of the Ministry of Education and Science of Russia on the topic "Development of methodology of production of dual-use products by high-tech companies of Russia using elements of artificial intelligence in the conditions of digitalization of the economy and sanctions pressure" № 123011600034-3.

Fig. 1. General scheme of operation of the decision tree.

Fig. 2. Comparison of aggregated forecasts for dual-use products of aviation industry organizations.

Fig. 3. Comparison of aggregated forecasts for dual-use products of radioelectronic industry organizations.

Fig. 4. Comparison of aggregated forecasts for dual-use products of oil and gas industry organizations.