Very short-term forecasting of power demand of big dynamics objects

The issue of very short-term forecasting is gaining more and more importance. It covers both power demand forecasting and forecasting of power generated by renewable energy sources. It is particularly important in small energy micro-systems, commonly called microgrids, where reliable electricity supply to consumers must be ensured. This paper presents a statistical analysis of data for a sample low voltage object with big dynamics. In the authors' opinion, the object belongs to a class of objects that are difficult to forecast over a very short-term horizon. Forecasting methods that can be applied to this type of forecast are then briefly characterized. Next, the results of sample very short-term ex post forecasts of power demand, provided by several selected forecasting methods, are presented together with a qualitative analysis of the obtained forecasts. The paper ends with observations and conclusions concerning the analyzed subject, i.e. very short-term forecasting of power demand of big dynamics objects.


Introduction
The issue of very short-term power demand forecasting is gaining more and more importance. It is particularly important in small energy micro-systems, commonly called microgrids, where reliable electricity supply to consumers must be ensured.
The problem of very short-term forecasting of power demand has been discussed in several publications, e.g. [1][2][3][4][5][6]. A very comprehensive overview of forecasting methods covering ultra- and short-term time horizons is included in [7]. Among others, the overview presents papers dealing with the time horizons considered here, as well as with different areas and location types (small city, microgrid, smart building). The methods used for forecasting in those papers were mainly neural networks and related machine learning models, such as Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Support Vector Regression (SVR) and Self-Organizing Map (SOM).
In the analyzed publications, objects with big dynamics in the power demand level are very rarely the subject of very short-term forecasting. This is the key aspect of this paper. The very short-term forecasts concern a component of a microgrid [6] with significant dynamics in the level of received power. The object showing considerable dynamics in the level of power demand is one of the main technological components of a sewage treatment plant. The microgrid also includes some controllable and non-controllable loads, together with some generating units.
The further part of the paper gives the results of sample very short-term forecasts for the presented microgrid component, obtained with the following methods (models): the naive model, weighted moving average models, auto-regression models, multiple linear regression models, MLP type neural networks and Radial Basis Function (RBF) type neural networks. A short description of these methods is given below.

The naive method. It is most often assumed that the value of the explained (forecast) variable at time $t$ will be the same as at time $t-1$ [8]. In most cases the naive method and its results serve as a reference point for evaluating other forecasting methods.

Weighted moving average models. The approach using these models is based on the following equation [8]:

$$y_t^* = \sum_{i=t-k}^{t-1} w_{i-t+k+1}\, y_i$$

where: $y_t^*$ - value of the forecast of the variable for time $t$; $y_i$ - actual value of the forecast variable at time $i$; $k$ - smoothing factor (the number of past periods used in the model); $w_{i-t+k+1}$ - weight of the value of the forecast variable at time $i$.

Auto-regression model (AR). This model is based on the assumption that successive values of the forecast variable are correlated, that is [8]:

$$y_t^* = \varphi_0 + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \dots + \varphi_p y_{t-p} + e_t$$

where: $y_t^*$ - value of the forecast of the variable for time $t$; $y_{t-1}, y_{t-2}, \dots, y_{t-p}$ - values of the forecast variable at times $t-1, t-2, \dots, t-p$; $\varphi_0, \varphi_1, \varphi_2, \dots, \varphi_p$ - parameters of the model; $p$ - delay; $e_t$ - error at time $t$.

Multiple linear regression (MLR). In methods based on MLR, different kinds of explanatory variables are used to determine the values of the explained variable, including past values of both the explained variable and the explanatory variables:

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \dots + \alpha_n y_{t-n} + \alpha_{n+1} x_{1t} + \alpha_{n+2} x_{2t} + \dots + \alpha_{n+m} x_{mt} + \varepsilon_t$$

where: $y_t$ - explained variable at time $t$; $y_{t-1}, \dots, y_{t-n}$ - past values of the explained variable; $x_{1t}, x_{2t}, \dots, x_{mt}$ - explanatory variables at time $t$; $\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_{n+m}$ - parameters of the model; $n+m$ - the number of explanatory variables; $\varepsilon_t$ - error at time $t$.
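As an illustration of the naive, weighted moving average and auto-regression models described above, the following minimal Python sketch computes one-step-ahead forecasts from a short history. The function names, the sample demand values and the parameter values are assumptions made for this example only; they are not the parameters estimated in the paper.

```python
def naive_forecast(history):
    """Naive model: the forecast for time t equals the value observed at t-1."""
    return history[-1]


def weighted_moving_average_forecast(history, weights):
    """Weighted moving average over the last k = len(weights) values.

    weights[0] applies to the oldest value (t-k), weights[-1] to t-1.
    """
    k = len(weights)
    if len(history) < k:
        raise ValueError("history is shorter than the number of weights")
    window = history[-k:]
    return sum(w * y for w, y in zip(weights, window))


def ar_forecast(history, phi0, phis):
    """AR(p) forecast: phi0 + phi_1*y[t-1] + ... + phi_p*y[t-p] (error term omitted)."""
    if len(history) < len(phis):
        raise ValueError("history is shorter than the model order")
    return phi0 + sum(phi * history[-lag] for lag, phi in enumerate(phis, start=1))


# Hypothetical 10-second demand values in relative units (not from the paper).
demand = [0.2, 0.2, 1.8, 0.3]

print(naive_forecast(demand))
print(weighted_moving_average_forecast(demand, [0.1, 0.2, 0.7]))
print(ar_forecast(demand, 0.0, [0.9, 0.1]))
```

With a single weight equal to one (or an AR(1) model with a zero constant and a unit coefficient), both parametric models reduce to the naive model, which is why the naive forecast is a natural reference point.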
Multi-layer Perceptron. It is a feed-forward (unidirectional) neural network, trained with supervision, consisting of [9]: an input layer, several hidden layers (usually not more than two) and an output layer. In our research, neurons in the hidden layer had a non-linear (hyperbolic tangent) activation function, while neurons in the output layer had a linear activation function.

RBF type network. It is also a feed-forward (unidirectional) neural network, consisting of an input layer, a hidden (radial) layer and an output layer [9]. Neurons in the hidden layer do not have any weights assigned, but radial basis functions instead (in our research Gaussian type functions were used). Neurons in the output layer had a linear activation function.
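The two network types can be illustrated with a minimal forward-pass sketch in Python. It assumes a single hidden layer and a single output neuron; the parameter names (hidden_weights, centers, widths and so on) and the Gaussian form exp(-d^2 / (2*width^2)) are assumptions made for this example, not details taken from the networks trained in the paper.

```python
import math


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with tanh activation, followed by one linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def rbf_forward(inputs, centers, widths, output_weights, output_bias):
    """Gaussian radial hidden layer, followed by one linear output neuron."""
    hidden = []
    for center, width in zip(centers, widths):
        squared_distance = sum((x - c) ** 2 for x, c in zip(inputs, center))
        hidden.append(math.exp(-squared_distance / (2.0 * width ** 2)))
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical inputs: power demand from periods t-1 and t-2 in relative units.
x = [0.3, 1.8]
print(mlp_forward(x, [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], [0.7, 0.3], 0.05))
print(rbf_forward(x, [[0.2, 0.2], [1.8, 1.9]], [0.5, 0.5], [0.2, 1.7], 0.0))
```

The difference is in the hidden layer: an MLP neuron responds to a weighted sum of its inputs, while a radial neuron responds to the distance between the input vector and its center.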
The parameters (weights) of individual models were selected using optimization methods (BFGS algorithm for MLP networks, RBFT algorithm for RBF networks and Newton's algorithm for other models for which the values of parameters need to be chosen).

Statistical analysis of data
The statistical analysis is based on data from one day in December and one day in June (the only available data). Each daily time series includes 8640 periods of 10 seconds. In the data from the December day 169 values are missing. They were filled in using the values neighboring the missing ones. In the data from the June day 578 values are missing. Missing data occur in quite random places of the time series. In a few cases a longer fragment of the time series (several minutes) is missing. For anonymisation, data from the December day and the June day were jointly normalized to relative units (1 relative unit is equal to the average value of the time series).

Table 1 shows selected statistical measures of the two power demand time series. The characteristics of the December day time series differ significantly from those of the June day. The June day time series has a smaller mean value but a greater variance (by over 25%) than the December day time series. Forecast errors for the June day will therefore probably be higher than for the December day, assuming that the random component is the dominant one. The minimum values of both time series are almost identical, but the maximum value of the power demand is higher for the December day. The Kolmogorov-Smirnov and Lilliefors tests show that neither time series of power demand has a normal distribution. Figure 1 shows the percentage distribution of cases in both time series depending on the power demand. Figure 3 in turn shows two parts of the December day (the period from 3:30 to 4:00 am and the period from 9:30 to 10:00 am). The big dynamics of changes (rapid jumps) in the power demand is clearly visible.
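The data preparation steps described above can be sketched in Python as follows. The paper does not state exactly how neighboring values were used to fill the gaps, so linear interpolation between the nearest known values is an assumption of this sketch, as is the interpretation of joint normalization as division by the mean of the pooled values of both days. All function names are hypothetical.

```python
# A day sampled every 10 seconds: 24 h * 3600 s / 10 s.
PERIODS_PER_DAY = 24 * 3600 // 10


def fill_missing(series):
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps at the start or end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no known values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            share = (i - left) / (right - left)
            filled[i] = series[left] + share * (series[right] - series[left])
    return filled


def normalize_jointly(*series_list):
    """Divide all series by the mean of their pooled values (1 unit = pooled mean)."""
    pooled = [v for series in series_list for v in series]
    mean = sum(pooled) / len(pooled)
    return [[v / mean for v in series] for series in series_list]


# Hypothetical raw readings with gaps (not measurements from the paper).
december = fill_missing([None, 2.0, None, None, 5.0, None])
june = fill_missing([1.0, None, 3.0])
december_pu, june_pu = normalize_jointly(december, june)
print(PERIODS_PER_DAY, december_pu, june_pu)
```

Normalizing both days with one common divisor keeps their levels comparable, which would not be the case if each day were divided by its own mean.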
For both time series (December day and June day), the occurrence of periods of different length with a very similar power demand is characteristic. These are periods of very low to medium power demand (up to 1 p.u. for the June day) and of large power demand (up to 2 p.u. for the December day). They are usually separated by shorter periods of much bigger power demand (up to 2 orders of magnitude larger).

The seasonality of the power demand time series was not analyzed because too little data was available. For the December day, it was examined whether there was daily variation in the power demand in 1-hour periods (see Fig. 4). For each hour the average 10-second power demand was calculated, and the polynomial that best describes the daily variation trend was selected. The biggest value of the determination coefficient, R² = 0.2671, was obtained for the polynomial of degree 5 (polynomials of degree 2 to 6 were analyzed). This value is very low: the polynomial of degree 5 explains less than 27% of the variability of the explained variable. The shape of the polynomial suggests (the analysis concerned only one day) that the energy consumer probably has 2 periods of increased power demand during the day. The first peak is from 6:00 to 14:00. The second peak is from 22:00 to 24:00.

For the December day, the partial autocorrelation coefficient is statistically significant (5% significance level) only for the first three lags and for several single lags distant from each other (see Fig. 5). Noteworthy are the very big values of the partial autocorrelation coefficient for the first three periods back (in turn: 0.9464, 0.9114 and 0.8999) and the quite big negative value of -0.3590 for 33 periods back, i.e. 5 minutes and 30 seconds.
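For readers who want to reproduce this kind of analysis, the sketch below computes sample autocorrelations and derives partial autocorrelations from them with the Durbin-Levinson recursion. The paper does not say which estimator or significance test was used, so both the recursion and the large-sample bound 1.96/sqrt(N) for the 5% level are assumptions of this example, and the function names are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [v - mean for v in series]
    variance = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t - lag] for t in range(lag, n)) / variance
        for lag in range(max_lag + 1)
    ]


def partial_autocorrelation(acf):
    """Partial autocorrelations for lags 1 .. len(acf)-1 (Durbin-Levinson recursion)."""
    pacf = []
    previous = []  # previous[j-1] holds the coefficient phi_{k-1, j}
    for k in range(1, len(acf)):
        numerator = acf[k] - sum(previous[j - 1] * acf[k - j] for j in range(1, k))
        denominator = 1.0 - sum(previous[j - 1] * acf[j] for j in range(1, k))
        phi_kk = numerator / denominator
        current = [previous[j - 1] - phi_kk * previous[k - j - 1] for j in range(1, k)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        previous = current
    return pacf


def significance_bound(n_observations, z=1.96):
    """Approximate two-sided 5% bound for a partial autocorrelation coefficient."""
    return z / math.sqrt(n_observations)


# Autocorrelations of an ideal AR(1) process with coefficient 0.5 (illustrative).
print(partial_autocorrelation([1.0, 0.5, 0.25, 0.125]))
print(significance_bound(8640))
```

For a daily series of 8640 values the bound is roughly 0.02, so coefficients of the size reported above (about 0.9, or -0.36 at lag 33) lie far outside it.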
Table 2 shows, for the December day, the values of Pearson linear correlation coefficients between the 10-second power demand and the considered explanatory variables. All correlation coefficients are statistically significant (5% level of significance). It is noteworthy that the sum of the power demand values from period t-1 with the weight 0.75 and from period t-2 with the weight 0.25 (AWE) has the biggest correlation coefficient of all potential explanatory variables. Moreover, the 24-hour profile of the variation of power demand (FUNC) seems to be a much better explanatory variable than the hour of power demand (HOU).
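The construction of the AWE variable and its correlation with the power demand can be sketched in Python as follows. The weights 0.75 and 0.25 come from the text; the function names and the sample demand values are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def awe_variable(demand, w1=0.75, w2=0.25):
    """AWE[t] = w1 * demand[t-1] + w2 * demand[t-2], defined for t >= 2."""
    return [w1 * demand[t - 1] + w2 * demand[t - 2] for t in range(2, len(demand))]


# Hypothetical demand values in relative units (not measurements from the paper).
demand = [0.2, 0.3, 0.2, 1.9, 2.0, 1.8, 0.3, 0.2]
awe = awe_variable(demand)
target = demand[2:]  # demand[t] aligned with AWE[t]
print(pearson(awe, target))
```

The first two periods are dropped from the target because AWE needs two past values before it is defined.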

Comparative analysis of very short-term forecasting methods for power demand of big dynamics objects
The comparative analysis was performed using data from the December day. All data were divided into a set for estimating the parameters (weights) of a given model (80% of the data, selected randomly to ensure maximum representativeness of the process) and a set for testing the quality of "ex post" forecasts (the remaining 20% of the data, also selected randomly).
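A random 80/20 division of this kind can be sketched in Python as follows. The function name, the fixed seed and the use of sample indices as stand-ins for data records are assumptions made for this example; the paper does not describe how the random selection was implemented.

```python
import random


def random_split(samples, train_share=0.8, seed=0):
    """Randomly split samples into an estimation subset and a test subset."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(train_share * len(samples))
    # Sorting the chosen indices keeps each subset in chronological order.
    train = [samples[i] for i in sorted(indices[:n_train])]
    test = [samples[i] for i in sorted(indices[n_train:])]
    return train, test


# One day of 10-second periods, represented here only by their indices.
periods = list(range(8640))
train, test = random_split(periods)
print(len(train), len(test))
```

For a full day of 8640 periods this yields 6912 records for estimation and 1728 for testing.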
In order to have a broader, multi-aspect view of the quality of individual forecasting models, four measures of "ex-post" forecasts quality were used: MAPE error, SOS error, maximum percentage error and Pearson's linear correlation coefficient.
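The four measures can be written down in a few lines of Python. The paper does not give their formulas, so the definitions below are assumptions of this sketch: SOS is taken as the sum of squared forecast errors, MAPE and the maximum percentage error are expressed in percent of the actual value, and all function names are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def sos(actual, forecast):
    """Sum of squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast))


def max_percentage_error(actual, forecast):
    """Largest absolute percentage error in percent."""
    return 100.0 * max(abs((a - f) / a) for a, f in zip(actual, forecast))


def pearson(x, y):
    """Pearson linear correlation coefficient between actual values and forecasts."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical actual values and forecasts in relative units (not from the paper).
actual = [1.0, 2.0, 4.0]
forecast = [1.5, 2.0, 3.0]
print(mape(actual, forecast), sos(actual, forecast))
print(max_percentage_error(actual, forecast), pearson(actual, forecast))
```

MAPE weights every period by the inverse of its actual value, whereas SOS weights every period by the absolute size of its error, so the two measures can rank the same models differently.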
In the case of MLP networks, the SOS error was minimized, while in other models the minimization of the SOS error, MAPE and the maximization of the linear correlation coefficient were tested.
The most favorable explanatory variables for the MLP network were selected using the "top-down" method. Undesirable variables (not providing additional information to the model) were eliminated stepwise. The choice of explanatory variables to eliminate was supported by the sensitivity analysis of the neural networks. For multiple regression models, the most advantageous sets of explanatory variables previously found for the MLP network were used.

Table 3 shows the characteristics of all tested forecasting models. The description of neural network models (MLP/RBF) is as follows: number of neural network inputs, number of neurons in the hidden layer and number of outputs. For the neural network models, only the models with the most favorable structure are given in Table 3 (the number of hidden neurons was tested over a fairly wide range). In addition, for neural network models, the validity of each explanatory variable calculated on the basis of the sensitivity analysis is given in the descriptive field (column) of the given explanatory variable. Validity (a number) is the quotient of two neural network errors: the error obtained when the given variable is replaced in all sets by its average value, divided by the error calculated for the complete set of explanatory variables. In Table 3 the explanatory variables that are probably undesirable in a given model are marked in italics. The most important explanatory variable in a given model is marked in bold.
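The validity quotient used in the sensitivity analysis can be sketched in Python as follows. The sketch assumes the sum of squared errors as the network error and uses a toy linear function in place of a trained network; the function names, the toy model and the sample data are assumptions made for this example.

```python
def sum_of_squares(model, inputs, targets):
    """Sum of squared errors of the model over a data set."""
    return sum((model(row) - y) ** 2 for row, y in zip(inputs, targets))


def variable_validity(model, inputs, targets, index):
    """Error with variable `index` replaced by its mean, divided by the baseline error."""
    mean_value = sum(row[index] for row in inputs) / len(inputs)
    replaced = [row[:index] + [mean_value] + row[index + 1:] for row in inputs]
    return sum_of_squares(model, replaced, targets) / sum_of_squares(model, inputs, targets)


def toy_model(row):
    """Stand-in for a trained network: uses the first input, ignores the second."""
    return 0.9 * row[0] + 0.0 * row[1]


# Hypothetical data: two explanatory variables per record, one target value.
inputs = [[1.0, 5.0], [2.0, 1.0], [3.0, 3.0]]
targets = [1.0, 2.0, 3.0]

print(variable_validity(toy_model, inputs, targets, 0))
print(variable_validity(toy_model, inputs, targets, 1))
```

A quotient well above 1 means the model error grows strongly when the variable is removed, so the variable is important; a quotient close to 1 suggests that the variable adds little information and is a candidate for elimination.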
In the case of the naive model (NAIVE), multiple regression models (M.REGR), autoregression models (AR) and weighted moving average models (MOV.A), the values of the parameters of these models assigned to each explanatory variable are given in the descriptive fields (columns) of the explanatory variables. Variables with the codes FUNC, POW1, POW2, POW3, POW4 and POW5 form the proper set of explanatory variables for the MLP model. The set additionally using variables with the codes AWE, AVE3 and ACE2 results in a similar quality of forecasts. The use of both described sets of explanatory variables has also proved to be the most advantageous for multiple regression models. The most important explanatory variable in almost all models is definitely the power value from the t-1 period (POW1).

Table 3. Characteristics of all tested forecasting models. Source: own elaboration.

Table 4 presents the results of "ex post" forecasts of the tested forecasting models. Significantly worse results in a given error category are marked in italics, while significantly better results in a given error category are marked in bold. Figure 6 shows the chosen forecast results for 71 10-second periods starting at 3:33.
The obtained forecast errors of the tested models are large (the smallest MAPE error was slightly above 107%), but this is normal for a process with such large and frequent changes and a large random component. However, compared to the simplest and worst model, the naive model, the improvement in the quality of forecasts of the best models is very significant (the MAPE error and the SOS error are reduced by more than half).
It is not possible to unambiguously choose the best model due to the use of four measures of forecast quality with slightly different properties.
MLP models have the most favorable values of most forecast quality measures (the smallest maximum percentage errors, the smallest SOS errors and the largest correlation coefficients). However, the MAPE errors of MLP models are significantly bigger than in some other models, e.g. weighted moving average models or autoregressive models, because MLP models overestimate power demand, especially for very small values. MLP models give the worst forecasts for very low power demand (MAPE errors typically above 200%) (see Fig. 6). This is because the neural network minimizes the SOS error during learning. The difference between two very small values of power demand (actual value and forecast) is a very small number, so it contributes little to the SOS error; the MLP model instead tries to reduce the large (from its point of view) SOS errors occurring for much bigger power demand values (see Fig. 6).

Multiple regression models and autoregressive models in which the MAPE error is minimized have much smaller MAPE errors of forecasts than MLP models. However, the SOS errors of forecasts of these models are significantly worse than those of the MLP models. A very characteristic feature of multiple regression models and autoregressive models is very accurate forecasts of very small power demand values, achieved at the expense of forecast accuracy for bigger power demand values (see Fig. 6). Surprisingly, the weighted moving average models, despite using the correlation coefficient as the maximized objective function, have MAPE errors of forecasts considerably smaller than the MLP models.

It is also worth adding that in the autoregression models the addition of a constant significantly worsened the forecast results. In weighted moving average models, increasing k (the number of past periods in the model) reduced the SOS error of forecasts but simultaneously increased the MAPE error of forecasts.
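The trade-off between MAPE and SOS described above can be made concrete with a small numerical sketch in Python. The demand values and forecasts below are hypothetical and chosen only to show the effect; they are not taken from the paper.

```python
def percentage_error(actual, forecast):
    """Absolute percentage error of a single forecast, in percent."""
    return 100.0 * abs(actual - forecast) / actual


def squared_error(actual, forecast):
    """Contribution of a single forecast to the SOS error."""
    return (actual - forecast) ** 2


# Hypothetical values in relative units (not from the paper).
low_actual, low_forecast = 0.02, 0.08    # very small demand, overestimated
high_actual, high_forecast = 2.0, 1.5    # large demand, underestimated

print(percentage_error(low_actual, low_forecast), squared_error(low_actual, low_forecast))
print(percentage_error(high_actual, high_forecast), squared_error(high_actual, high_forecast))
```

In this example the forecast for the very small demand has a percentage error of about 300% but adds only about 0.0036 to the SOS error, while the forecast for the large demand has a percentage error of 25% and adds 0.25 to the SOS error. A model that minimizes SOS therefore gains far more from improving the second forecast, and a model that minimizes MAPE from improving the first one.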
Another unexpected phenomenon is that in all weighted moving average models, after the estimation of the model parameters, the parameter related to the value of the power demand from period t-2 is almost equal to zero.
It seems that, from the point of view of the purposes of the forecasts, it is more important to obtain greater forecast accuracy for power demand values other than very small ones. Therefore, in our opinion, the SOS error of forecasts should be considered the priority measure.
However, if we treat the SOS and MAPE errors of forecasts as equivalent numbers (without units), then the most advantageous models, of the same quality, are the multiple regression model with the code M.REGR6 and the autoregression model with the code AR(5)3, both of quite simple construction. In both models, however, the MAPE error is minimized. It is also characteristic that in both models the most important parameter was the one assigned to the value of power demand from period t-1. Information on the power demand from periods t-2 to t-5 has almost zero importance for these models, which makes them very similar to the naive model, although the naive model has significantly bigger forecast errors.
A very simple weighted moving average model (k = 3) is also noteworthy due to its relatively small forecast errors of both MAPE and SOS (the sum of both errors as numbers is almost as small as for the models M.REGR6 and AR(5)3).
None of the forecasting models was able to effectively forecast single 10-second very large increases in the power demand; the phenomenon is probably entirely random.
The RBF model is definitely the worst model. In practice, it is entirely unsuitable for this type of forecast.

Conclusions
The subject of this paper was very short-term power demand forecasting for a big dynamics object. First, a detailed statistical analysis of the available measurement data was carried out. Then the results of sample forecasts obtained by means of 6 selected methods (models) were presented and compared.
Four measures of "ex-post" forecasts quality were used: MAPE error, SOS error, maximum percentage error and Pearson's linear correlation coefficient. The most important explanatory variable in almost all tested models was the value of power level from the t-1 period of time.
It is not possible to unambiguously choose the best model due to the use of these four measures of forecast quality with slightly different properties.
The most favorable values of forecast quality measures can be observed for MLP models (the smallest maximum percentage errors, the smallest SOS errors and the largest correlation coefficients). MAPE errors of MLP models are significantly bigger than in some other models. Multiple regression models and autoregressive models have much smaller MAPE errors of forecasts than MLP models. However, in these cases the SOS errors of forecasts of these models are significantly worse than in the MLP models. The models of the weighted moving average have MAPE errors of forecasts much smaller than in the MLP models.
The obtained forecast errors of the tested models are large (the smallest MAPE error was slightly above 107%). However, compared to the simplest naive model, the improvement in the quality of forecasts of the best models is very significant.
If we treat SOS and MAPE errors of forecasts as equivalent numbers (without units) then the most advantageous models of the same quality are: the multiple regression model with the code M.REGR6 and the auto-regression model having the code AR(5)3. In the case of both models the most important was the parameter assigned to the value of power demand from period t-1. A very simple weighted moving average model (k = 3) results in relatively small forecast errors of both MAPE and SOS. Definitely the worst model is the RBF model.
None of the forecasting models was able to effectively forecast single, 10-second long, very large increases in the power demand.
The data set used for calculations in this paper was collected during the realization of a scientific project being a part of the ERA-Net Smart Grid Plus initiative, with support from the European Union's Horizon 2020 research and innovation programme.