Similarity analysis of the patterns of the monthly electric energy demand time series

A similarity analysis of the sequence patterns of the monthly electric energy demand time series is presented. Similarity-based forecasting models can be created because a strong relationship exists between input and output patterns. The chi-square test and correlation tables were calculated for several definitions of patterns.


Introduction
The monthly load time series for the national power systems are characterised by annual periodic variations. The annual variations are mainly related to the variable length of the day during the year and differences in air temperature depending on the season [1]. The analogies between time-series sequences with periodic variations are used successfully by the pattern similarity-based forecasting models. It is possible to deduce the course of the time series in any period based on its past behaviour [2].
The procedure for creating a forecasting model using the similarity of time-series sequences is as follows. First, the time series is divided into patterns, which usually cover a period of a few to a dozen or so points. In the case studied in this paper, the patterns are sequences of the monthly loads of the Polish power system (1998-2014) processed using specific functions. The pattern similarity-based forecasting methodology relies on the assumption that if the input patterns x_a and x_b are similar, then the output patterns y_a and y_b, which represent the time-series fragments following the fragments represented by x_a and x_b, are similar as well.
In the literature, medium-term load forecasting (MTLF) methods are categorised into two groups [4]. The first group follows the conditional modelling approach and uses management, economic analysis and long-term planning [5]. An MTLF model of this group is described in [6], where a few macroeconomic indicators, e.g. the consumer price index, are used as inputs [5].
The second group follows the autonomous modelling approach. This type requires a smaller set of inputs: primarily past loads and, optionally, e.g. weather variables. Models from this category are better suited for stable economies [5]. This group is represented by classical forecasting methods, e.g. ARIMA or linear regression [7], and by computational intelligence methods, such as neural networks [8].

Pattern-based representation of time series
In the case of forecasting m points of a series, the forecast fragment is denoted by Y_i = {E_{i+1}, E_{i+2}, ..., E_{i+m}}. Let us denote the fragment of the series preceding Y_i, with a length of n, as X_i = {E_{i-n+1}, E_{i-n+2}, ..., E_i}. In this paper, four different definitions of the points of the sequence X_i, given by formulas (1)-(4) in [1], are applied; they express each point using the mean Ē_i and the dispersion D_i determined from sequence X_i. These quantities allow the forecast of demand to be determined from the forecast pattern ŷ_i returned by the forecasting model; for this purpose, the transformed formulas (5)-(8) are used. For example, if y is defined using (8), the forecast of demand is determined using the equation [1]: Ê_{i+t} = ŷ_{i,t}·D_i + Ē_i.
The patterns x_i and y_i, which represent the preceding sequence and the forecast sequence, are combined in pairs (x_i, y_i). A set of these pairs for a historical time series is used to create a forecasting model (for parameter estimation and training).
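The construction of pattern pairs described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name is hypothetical, and it assumes the standardisation-style pattern definition (centring on the mean Ē_i of X_i and scaling by its dispersion D_i), which is only one of the four definitions considered.

```python
import numpy as np

def make_patterns(series, n=12, m=12):
    """Split a load series into (x, y) pattern pairs.

    Assumes a standardisation-style definition: each point of the
    preceding sequence X_i and the forecast sequence Y_i is centred on
    the mean E_bar of X_i and scaled by its dispersion D.
    """
    series = np.asarray(series, dtype=float)
    pairs = []
    for i in range(n, len(series) - m + 1):
        X = series[i - n:i]                      # preceding sequence X_i
        Y = series[i:i + m]                      # forecast sequence Y_i
        E_bar = X.mean()                         # mean load of X_i
        D = np.sqrt(np.sum((X - E_bar) ** 2))    # dispersion of X_i
        x = (X - E_bar) / D                      # input pattern x_i
        y = (Y - E_bar) / D                      # output pattern y_i
        pairs.append((x, y))
    return pairs

# The demand forecast is recovered from a forecast pattern y_hat by the
# inverse transformation: E_hat = y_hat * D + E_bar.
```

Because Ē_i and D_i are computed only from the preceding sequence X_i, the inverse transformation can be applied at forecast time without knowing the future loads.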

Similarity analysis of the patterns
In order to confirm the validity of the assumption that if input patterns are similar, then output patterns are similar too, the analysis of the dependence between input patterns x and output patterns y is carried out. Similarity is measured using the Euclidean metric.
The units of the statistical population to be analysed are pairs of pattern pairs [1] ((x_i, y_i), (x_j, y_j)), where i = 1, 2, ..., N; j = 1, 2, ..., N; i ≠ j; and N is the number of pattern pairs. The pattern lengths n = m = 12 were adopted. The size of the population is M = N(N-1). The feature D_x, defined as the distance between two patterns x, and the feature D_y, defined as the distance between the paired patterns y, are considered. The distances between all pairs of patterns form the vector of pairs of realisations of the analysed random variables [2]: [(d(x_i, x_j), d(y_i, y_j))], where d(x_i, x_j) is the distance between the patterns x_i and x_j, and d(y_i, y_j) is the distance between the patterns y_i and y_j.
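The population of distance pairs can be built directly from the pattern pairs. A minimal sketch (function name hypothetical); since the Euclidean distance is symmetric, each unordered pair is generated once here, whereas the text counts ordered pairs (M = N(N-1)):

```python
import numpy as np
from itertools import combinations

def distance_pairs(pairs):
    """For every pair of pattern pairs ((x_i, y_i), (x_j, y_j)), i != j,
    return the realisations (d(x_i, x_j), d(y_i, y_j)) of the random
    variables D_x and D_y, using the Euclidean metric.
    Unordered pairs only: each symmetric realisation appears once."""
    d = []
    for (xi, yi), (xj, yj) in combinations(pairs, 2):
        d.append((np.linalg.norm(xi - xj), np.linalg.norm(yi - yj)))
    return np.array(d)
```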
To demonstrate the stochastic dependence between the random variables D_x and D_y, the null hypothesis (H0) is formulated: the differences in the numbers of population units in the defined categories of values of the features D_x and D_y are caused by the random nature of the sample [2]. The null hypothesis can be verified using the χ² test. For this purpose, a correlation table (Table 1) is created, illustrating the empirical joint distribution of the features D_x and D_y. The number of categories for feature D_x in the table is g, and for feature D_y it is h. Quantiles of order 0, 1/g, 2/g, ..., 1 were adopted as the category limits for D_x, and quantiles of order 0, 1/h, 2/h, ..., 1 as the category limits for D_y. The critical value for 64 degrees of freedom (g = h = 9 categories for features D_x and D_y) is 83.68. The calculated χ² values fall within the critical area, which entitles us to reject the null hypothesis and adopt the alternative one. High values of the Cramér contingency coefficient V = sqrt(χ² / (M·min(g-1, h-1))) and of the Pearson correlation coefficient ρ = Σ(D_x,i - D̄_x)(D_y,i - D̄_y) / (M·S_Dx·S_Dy) prove a strong relationship between D_x and D_y [2], where D̄_x, D̄_y are the mean values of D_x and D_y, and S_Dx, S_Dy are the standard deviations of these features. Table 2 shows the χ², V and ρ statistics for the different pattern definitions. In the pattern similarity-based methods, the nearest neighbours of the input pattern x are found in the data set, and the patterns y paired with them are used to construct the forecast pattern. Next, it is demonstrated that there is a statistically significant relationship between the distance of pattern x_i to its k-th nearest neighbour x_{i*,k}, k = 1, 2, ..., K, and the distance between the patterns y_i and y_{i*,k} paired with them. In this work K = 5, i.e. up to the fifth nearest neighbour is considered.
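The χ² statistic and Cramér's V for the correlation table can be computed as below. This is a sketch under stated assumptions: the function name is hypothetical, equal-frequency (quantile) category limits are used as described in the text, and the expected frequencies are taken from the product of the marginal distributions.

```python
import numpy as np

def chi2_and_cramers_v(dx, dy, g=9, h=9):
    """Chi-square statistic and Cramer's V for the empirical joint
    distribution of D_x and D_y, with category limits placed at the
    quantiles of order 0, 1/g, ..., 1 (and 0, 1/h, ..., 1)."""
    dx, dy = np.asarray(dx, float), np.asarray(dy, float)
    # bin each feature into g (resp. h) equal-frequency categories
    bx = np.clip(np.searchsorted(np.quantile(dx, np.arange(1, g) / g),
                                 dx, side='right'), 0, g - 1)
    by = np.clip(np.searchsorted(np.quantile(dy, np.arange(1, h) / h),
                                 dy, side='right'), 0, h - 1)
    table = np.zeros((g, h))
    np.add.at(table, (bx, by), 1)          # correlation table (Table 1)
    M = table.sum()
    # expected counts under independence of D_x and D_y
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / M
    chi2 = np.sum((table - expected) ** 2 / expected)
    v = np.sqrt(chi2 / (M * min(g - 1, h - 1)))   # Cramer's V
    return chi2, v
```

For perfectly associated features the table is diagonal, χ² reaches M·min(g-1, h-1) and V equals 1; for independent features χ² stays near the (g-1)(h-1) degrees of freedom.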
The units of the statistical population to be analysed are pairs of pattern pairs [1] ((x_i, y_i), (x_{i*,k}, y_{i*,k})), where i = 1, 2, ..., N; k = 1, 2, ..., K; x_{i*,k} is the k-th nearest neighbour of pattern x_i and y_{i*,k} is the pattern y paired with it. The population size is N·K. The distances between all pairs of patterns determined by expression (14) form the vector of pairs of realisations of the analysed random variables [2]:
[(d(x_i, x_{i*,k}), d(y_i, y_{i*,k}))] = [(d(x_1, x_{1*,1}), d(y_1, y_{1*,1})), (d(x_1, x_{1*,2}), d(y_1, y_{1*,2})), ..., (d(x_N, x_{N*,K}), d(y_N, y_{N*,K}))],   (15)
where d(x_i, x_{i*,k}) is the distance between pattern x_i and its k-th nearest neighbour x_{i*,k}, and d(y_i, y_{i*,k}) is the distance between the patterns y_i and y_{i*,k}. To demonstrate the stochastic dependence between the random variables D_x and D_y, the null hypothesis (H0), identical to that for the population determined by expression (10), is formulated. Table 3 shows the χ², V and ρ statistics for the different pattern definitions and K = 5. The low values of ρ arise from a large dispersion of values among the intervals in the correlation table; they are not concentrated around the diagonal. This results from the fact that all input patterns are considered, while there are only 5 output patterns for each input pattern. Therefore, the maximum distance D_y is shorter and the dispersion of the patterns y within these intervals is higher.
The pattern y_{i*,1}, associated with the nearest neighbour of the input pattern x_i, may not be the nearest neighbour of the forecast pattern y_i. Therefore, the ratio of the distance between the patterns y_i and y_{i*,k} (the pattern associated with the k-th successive neighbour of pattern x_i) to the distance between the patterns y_i and y_{i',k} (the k-th successive nearest neighbour of pattern y_i itself) is calculated [2]: w_d = d(y_i, y_{i*,k}) / d(y_i, y_{i',k}) - 1. The best accuracy of the forecasting model is achieved when the patterns y_{i',k} and y_{i*,k} are identical, i.e. when w_d takes the value 0. If these patterns are very similar, it means a strong link between the data and a good quality of the model; if they vary considerably, it means a weak link between the random variables and a poor quality of the model. Figure 1 shows the relationship between the distances D_x and D_y, drawn in two variants: depending on y* (red) and y' (blue). As can be seen in Figure 1, the two sets of distances are very similar. In Figure 2, the monthly loads of the Polish power system (1998-2014) are visualised. For the analysed time series, the patterns are characterised by a high degree of similarity. Figure 3 shows the mean values of w_d over the successive neighbours of each pattern. It can be observed that these values are highest for the first neighbours and fall below 0 in the vicinity of the 15th neighbour, which means that the patterns y associated with those neighbours of x are, on average, even nearer to y_i than its own k-th nearest neighbours. This suggests the use of models that consider several neighbours of pattern x to improve the forecasting accuracy.
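A minimal sketch of the nearest-neighbour forecasting idea discussed above: the forecast pattern is built from the y patterns paired with the k nearest neighbours of the query input pattern x. The function name is hypothetical, and averaging the neighbours' y patterns is one simple aggregation choice, not necessarily the one used in [1].

```python
import numpy as np

def knn_forecast(x_query, x_train, y_train, k=5):
    """Forecast pattern as the mean of the y patterns paired with the
    k nearest neighbours (Euclidean metric) of the input pattern x."""
    d = np.linalg.norm(x_train - x_query, axis=1)  # distances to all x
    idx = np.argsort(d)[:k]                        # k nearest neighbours
    return y_train[idx].mean(axis=0)               # aggregate their y
```

The resulting forecast pattern ŷ is then mapped back to demand values with the inverse pattern transformation.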

Decomposition of series
Decomposition of a time series can be described by the formula [3]: E_t = f(m_t, s_t, z_t), where f() represents the decomposition function, s_t is a seasonal component, m_t is a long-term trend and z_t is a random disturbance. The two most popular decomposition models are [3]: the additive decomposition E_t = m_t + s_t + z_t and the multiplicative decomposition E_t = m_t·s_t·z_t. If the variance around the trend or the size of the seasonal variations does not change with the series level, the additive decomposition should be used. When the variance or the amplitude of the seasonal variations is proportional to the series level, the multiplicative decomposition is more suitable.
To isolate the components of a series (trend, seasonal and random variations), different methods are used. Two groups of methods can be distinguished:
- parametric methods, in which a model of the regular components of the series is adopted (e.g. a linear or quadratic model); in this case, estimators (estimates) of the unknown parameters need to be found,
- non-parametric methods, in which the trend is not expressed by an analytical formula; the advantage of this group is greater flexibility.
A popular trend determination method is the moving average, defined as follows [3]: m̂_t = (1/(2q+1))·Σ_{j=-q..q} E_{t+j}, where q is the parameter (the order of the moving average) used to control the degree of data smoothing [3]. The graph illustrating smoothing of the analysed time series using the moving average is shown in Figure 4, where three orders of the moving average are compared. As can be seen, higher values of the q parameter give a smoother moving-average function.
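The centred moving average above can be sketched as follows. The function name is hypothetical; leaving the first and last q points undefined (NaN), where the full window does not fit, is one common convention and an assumption here.

```python
import numpy as np

def moving_average(series, q):
    """Centred moving average of order q: the trend estimate at time t
    is the mean of the 2q+1 points E_{t-q}, ..., E_{t+q}. Endpoints
    where the full window does not fit are left as NaN."""
    E = np.asarray(series, dtype=float)
    m = np.full(len(E), np.nan)
    for t in range(q, len(E) - q):
        m[t] = E[t - q:t + q + 1].mean()
    return m
```

For monthly data with an annual cycle, a window spanning twelve months removes most of the seasonal component along with the noise.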
If we want to assign different weights to the points of the series depending on their time distance from moment t, we use the weighted moving average, which takes the form [3]: m̂_t = Σ_{j=-q..q} w_j·E_{t+j}, where w_{-q} + w_{-q+1} + ... + w_{q-1} + w_q = 1 and w_j = w_{-j} (for a symmetric moving average). As a result of applying one of the decomposition methods, estimates of the individual components of the series, i.e. the trend and seasonality (m̂_t and ŝ_t), are obtained. It is then possible to remove them and determine the random series of residuals (in the case of additive decomposition) [3]: ẑ_t = E_t - m̂_t - ŝ_t.
If multiplicative relationships can be observed in the analysed series (the amplitude of seasonal variations or the variance is proportional to the level of the data), the transformation of the multiplicative model (19) is used instead of (22): ẑ_t = E_t / (m̂_t·ŝ_t). It is necessary to eliminate the trend and seasonality in order to fit a stationary model [3]. Figure 8 shows the annual variations within the series. A considerable similarity of the values in individual months over the subsequent years can be observed in the graphs, taking into account the rising trend, which shows a strong upward tendency in the summer months. Strong annual cycles with a higher demand in the winter months are observed. It is worth emphasising the reduction of the variance of the series in the final period, where the annual cycles have a lower amplitude.
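The residual extraction for both decomposition models can be sketched as below. The function names are hypothetical; the trend and seasonal estimates m̂_t and ŝ_t are assumed to have been obtained beforehand, e.g. by the moving average and monthly averaging.

```python
import numpy as np

def residuals_additive(E, m_hat, s_hat):
    """Residuals of the additive model E_t = m_t + s_t + z_t:
    z_hat_t = E_t - m_hat_t - s_hat_t."""
    return np.asarray(E, dtype=float) - m_hat - s_hat

def residuals_multiplicative(E, m_hat, s_hat):
    """Residuals of the multiplicative model E_t = m_t * s_t * z_t:
    z_hat_t = E_t / (m_hat_t * s_hat_t)."""
    return np.asarray(E, dtype=float) / (m_hat * s_hat)
```

If the detrended, deseasonalised residuals still show level-dependent variance, this is a hint that the multiplicative form should have been used instead of the additive one.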

Summary
The analysis carried out in this paper for the monthly load time series of the Polish power system confirms that there is a strong relationship between the similarity of patterns x and the similarity of the patterns y paired with them, and that this relationship does not result from the random nature of the sample. Such a conclusion for the specific time series justifies constructing and applying forecasting models that use the similarity of the patterns of seasonal cycles. In further work, it is planned to examine the similarity of patterns of the monthly load time series of power systems in other European countries.
E3S Web of Conferences 84, 01008 (2019), PE 2018, https://doi.org/10.1051/e3sconf/20198401008
In several articles [5, 9-12], the authors demonstrate the effectiveness of the similarity-based approach on real-world data. Compared with commonly used methods, e.g. ARIMA and exponential smoothing, the similarity-based models achieve comparable errors on average [10]. Better performance of this kind of model is observed for more regular time series, with a lower noise component and a stable relationship between the input and output patterns. The factors which decrease this stability are heteroscedasticity of the time series and a nonlinear trend [10].