Power System Load Forecasting Method Based on Recurrent Neural Network

Power system load forecasting plays an important role in power dispatch operation. The development of the electricity market and the increasing integration of distributed generators have increased the complexity of power consumption patterns. They have also placed higher requirements on the accuracy and stability of load forecasting. This paper proposes a load forecasting method based on long short-term memory (LSTM). The method uses a deep recurrent neural network from the field of artificial intelligence to establish the load forecasting model. The LSTM network can memorize long-term dependencies in sequence data. This lets it identify the intrinsic variation of the load itself from both the horizontal and vertical dimensions over a long historical period, while also considering various influencing factors. Actual load data are used to verify the forecasting performance under different historical date windows and different network architectures.


Introduction
Power system load forecasting estimates the power demand over a future period. Unit commitment, generation plans, tie-line exchange plans and electricity market transactions can then be arranged according to the forecast. Load forecasting is therefore the basis for the safe, reliable and economical operation of the power system. It is also an important guarantee for improving the utilization of generation equipment and the efficiency of economic dispatch. Accurate load forecasting is of great significance to optimal unit commitment, economic dispatch, optimal power flow, electricity market transactions, etc. [1].
Power system load is affected by economic factors, weather, season, date and other factors, so load variation has random components. As the power system develops, the types of user load and the number of influencing factors keep increasing. In the electricity market environment, several things add to the complexity of electricity consumption patterns: user behaviour, user responses to incentive policies, and urban development. The introduction of large-scale distributed generators and the widespread use of electric vehicles have increased load fluctuation [2]. Peak-valley time-of-use pricing, through its economic leverage, prompts users to keep changing the way they use electricity [3]. It also encourages electric vehicles to change their charging and discharging behaviour [4].
At present, load forecasting methods fall mainly into two categories: statistical methods and artificial intelligence methods. Statistical methods forecast by time series prediction [5]. They mainly include multiple linear regression (MLR), autoregression (AR) and the autoregressive moving average (ARMA) model. Their models are simple, but they can process only a small number of influencing factors and samples. They also place high demands on the stationarity of the original time series, because they assume that the historical load, the influencing factors and the future load are linearly related. In fact, the relationship between the influencing factors and load changes is strongly nonlinear. Artificial intelligence methods include artificial neural networks (ANN), support vector machines, fuzzy logic and other modern forecasting methods [6,7,8]. ANNs have been successfully applied in speech recognition, natural language processing, computer vision and other fields [9,10,11]. Many researchers have applied them to load forecasting and renewable energy power forecasting with good results [12,13,14]. An ANN imitates mechanisms of the brain, learns implicit rules from data and uses them to forecast future values. It has strong nonlinear modelling ability and can learn the nonlinear characteristics of the load from historical load data. ANNs come in a variety of architectures; at present, the feedforward neural network (FNN) is widely used in power system load forecasting [14]. However, an FNN cannot take into account the dependence between training samples and cannot model sequence data. It therefore cannot build a load forecasting model over a long historical time scale.
In this paper, a recurrent neural network (RNN) improved with long short-term memory (LSTM) units is used to establish a load forecasting model. A deep recurrent neural network can remember historical information. Exploiting this, the model learns from a large amount of historical load data and identifies the inherent rules of load variation over a long historical time range. It also learns the complex nonlinear relationship between the influencing factors and the load. Actual load data are used to verify the model. The experimental results show that the method can identify the load variation rules to a certain extent and forecast the future load more accurately.

Ordinary recurrent neural network
A recurrent neural network (RNN) is a neural network that can process sequence data [15]. Its architecture is shown in Figure 1. Unlike an FNN, each neuron in the hidden layer of an RNN has a recurrent self-connection, so its output is fed back at the next time step. Given the input sequence $x_1, x_2, \ldots, x_t$, the RNN uses formula (1) and formula (2) to calculate the hidden state $h_t$ and the output $y_t$:

$h_t = f(U x_t + W h_{t-1})$   (1)

$y_t = g(V h_t)$   (2)

Here $U$, $W$ and $V$ represent the input weight, hidden weight and output weight, respectively. $f$ and $g$ represent the hidden layer activation function and the output layer activation function, respectively. The RNN uses the hidden state $h_t$ to remember all previous input information. As a result, the output at the current time is affected not only by the current input $x_t$ but also by the hidden state $h_{t-1}$ of the previous time step. The RNN is trained with back-propagation through time [12]. During training, however, an RNN has difficulty learning long-distance dependencies, because the gradient vanishes as it is propagated back through many time steps.
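As a rough illustration of formulas (1) and (2), the sketch below runs an RNN forward over a short sequence in plain Python. It rests on several assumptions: $f = \tanh$, a linear (identity) output activation $g$, no bias terms, and a zero initial hidden state. The names `rnn_forward`, `U`, `W` and `V` are illustrative choices, not the paper's implementation.

```python
import math


def matvec(M, v):
    """Multiply a matrix M (given as a list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(xs, U, W, V, f=math.tanh, g=lambda z: z):
    """Apply formulas (1)-(2) to each input x_t in xs.

    Assumptions: no bias terms, h_0 = 0, tanh hidden activation and
    identity output activation (suitable for regressing load values).
    """
    h = [0.0] * len(W)                     # h_0: zero initial hidden state
    ys, hs = [], []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, h))]
        h = [f(z) for z in pre]            # formula (1): h_t = f(U x_t + W h_{t-1})
        y = [g(z) for z in matvec(V, h)]   # formula (2): y_t = g(V h_t)
        hs.append(h)
        ys.append(y)
    return ys, hs


# One input feature, two hidden units, one output.
U = [[0.5], [-0.3]]
W = [[0.1, 0.0], [0.0, 0.1]]
V = [[1.0, 1.0]]
ys, hs = rnn_forward([[1.0], [0.0]], U, W, V)
# The second input is zero, yet ys[1] is non-zero: h_1 carries the first input forward.
```

The usage line shows the point made in the text. The output at the second step depends on the hidden state left behind by the first input.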

Long short-term memory
Long short-term memory (LSTM) is an improvement on the ordinary RNN. Vanishing gradients make the ordinary RNN poorly suited to learning long-term dependencies in sequence data. LSTM solves this problem by introducing a memory cell into each neuron of the hidden layer. A forget gate, an input gate and an output gate then control the state of the memory cell [15]. The architecture of an LSTM neuron is shown in Figure 2. The memory cell and the hidden state together memorize the historical information of the sequence data, and the information in the memory cell is controlled by the three gates.
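The forget gate $f_t$ deletes information from the memory cell according to the hidden state of the previous moment $h_{t-1}$ and the input of the current moment $x_t$. It is calculated as follows:

$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$   (3)

The input gate adds information to the memory cell according to $h_{t-1}$ and $x_t$, as shown in formula (4) and formula (5):

$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$   (4)

$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$   (5)

$i_t$ represents the information that needs to be memorized, and $\tilde{c}_t$ represents the candidate memory cell used to update the memory cell. Here $\sigma$ is the sigmoid function and $[h_{t-1}, x_t]$ denotes the concatenation of the two vectors. $W_f, W_i, W_c, W_o$ and $b_f, b_i, b_c, b_o$ are the weights and biases of the corresponding gates. After the forget gate and the input gate have been calculated, formula (6) is used to update the memory cell, where $\odot$ denotes element-wise multiplication:

$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$   (6)

The output gate $o_t$ determines the current hidden state $h_t$ from three quantities: the hidden state of the previous moment $h_{t-1}$, the input of the current moment $x_t$, and the updated memory cell $c_t$. This is shown in formula (7) and formula (8):

$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$   (7)

$h_t = o_t \odot \tanh(c_t)$   (8)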
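The sketch below implements one LSTM time step following formulas (3)-(8) in plain Python, to make the order of the gate computations concrete. Two things here are assumptions for illustration: the parameter layout (a dict mapping each gate to a weight matrix and bias vector acting on $[h_{t-1}, x_t]$) and the function names. A practical model would use a deep learning framework rather than hand-written loops.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. params[gate] = (W, b) for gate in 'f', 'i', 'c', 'o' (assumed layout)."""
    z = h_prev + x_t                                            # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]           # (3) forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]           # (4) input gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]     # (5) candidate memory cell
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]  # (6) cell update
    o = [sigmoid(a) for a in affine(*params["o"], z)]           # (7) output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]            # (8) new hidden state
    return h, c


def gate(weight_row, bias):
    """Parameters for a single hidden unit: one weight row over [h, x] and one bias."""
    return ([weight_row], [bias])


# One hidden unit, one input. All weights are zero, so each gate is driven by its bias.
neutral = {name: gate([0.0, 0.0], 0.0) for name in "fico"}
h1, c1 = lstm_step([1.0], [0.0], [1.0], neutral)      # all gates 0.5, candidate 0

remember = {"f": gate([0.0, 0.0], 20.0), "i": gate([0.0, 0.0], -20.0),
            "c": gate([0.0, 0.0], 0.0), "o": gate([0.0, 0.0], 0.0)}
h2, c2 = lstm_step([1.0], [0.0], [1.0], remember)     # forget gate ~1 keeps c_prev
```

The second configuration shows how the forget gate lets the cell carry information across steps. With the forget gate saturated near 1 and the input gate near 0, $c_t$ stays almost equal to $c_{t-1}$.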

Basic theory
Because of people's production and living patterns, power system load is periodic. It also has great uncertainty because of random factors such as weather changes and major events. The load at a given time of day is affected by the type of day (day of the week, holiday or not) and by the current weather (temperature, humidity, etc.). It is also related to the load at earlier times of the same day and to the load at the same time on the past few days:

$L_{d,t} = \Phi(L_{d,1}, \ldots, L_{d,t-1};\ L_{d-1,t}, \ldots, L_{d-w,t};\ \mathbf{F}_d)$   (9)

In formula (9), $L_{d,t}$ is the load at moment $t$ of date $d$, and $w$ is the historical date window. $\mathbf{F}_d$ denotes the influencing factors of date $d$, and $\Phi$ is the unknown mapping to be learned. LSTM has the advantage of learning the rules of sequence data over a long time range. It can recognize the load variation pattern within the forecast day horizontally. It can also identify the variation of the load at the same time across the historical date window vertically. Finally, it can take into account the influence of related factors on the load of the forecast day.
CPEEE 2020

Forecasting model
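Based on the LSTM network, the load curve of the forecast day is forecast from two inputs: the load curves of the $w$ days before the forecast day, and the influencing factors of the forecast day. The LSTM network structure is shown in Figure 3.
$\hat{\mathbf{Y}}_d = [\hat{L}_{d,1}, \hat{L}_{d,2}, \ldots, \hat{L}_{d,T}]$ is the forecast load curve of forecast day $d$. $T$ is the number of time steps per day, $T \in \{24, 96, 288, 1440\}$, corresponding to sampling intervals of 60, 15, 5 and 1 minutes, respectively.

$\mathbf{L}_t = [L_{d-w,t}, L_{d-w+1,t}, \ldots, L_{d-1,t}]$   (10)

$\mathbf{L}_t$ represents the load at time step $t$ on each of the $w$ days before the forecast day, where $w$ is the historical date window.

$\mathbf{F}_d = [F_{d,1}, F_{d,2}, \ldots, F_{d,m}]$   (11)

$\mathbf{F}_d$ contains the $m$ load influencing factors of the forecast day, such as date type, holidays and weather.
The input of the load forecasting model at time step $t$ is $[\mathbf{L}_t, \mathbf{F}_d]$ and the output is $\hat{L}_{d,t}$.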

Model training
Model training includes three stages: data pre-processing, model training and model evaluation.

Data preprocessing
Data preprocessing consists of two steps: vectorization and normalization. Neural networks are built on linear algebra and cannot be trained directly on the raw data, so the raw data must be converted into vectors before training. Data vectorization concatenates the load data in the historical date window with the influencing factor data of the forecast day, and transforms the concatenated data into a vector.
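A minimal sketch of the vectorization step, under two assumptions. The daily load curves are stored as a list of lists, where `loads[k][t]` is the load of day `k` at step `t`. The influencing factors are stored as a numeric list per day. At each time step, the input vector is the same-time load on each of the `w` preceding days, followed by the factors of the forecast day. The target is the forecast day's load at that step. The function name and data layout are hypothetical. The extra window statistics used in the experiment (daily maximum and minimum load) are omitted for brevity.

```python
def build_sample(loads, factors, d, w):
    """Build the LSTM input sequence and targets for forecast day d.

    loads[k][t]: load of day k at time step t (T steps per day).
    factors[k]:  numeric influencing factors of day k (date type, etc.).
    Returns (inputs, targets): inputs[t] = [L_{d-w,t}, ..., L_{d-1,t}] + factors[d].
    """
    if d < w:
        raise ValueError("day d needs w full days of history before it")
    T = len(loads[d])
    inputs = [
        [loads[k][t] for k in range(d - w, d)] + list(factors[d])
        for t in range(T)
    ]
    targets = [loads[d][t] for t in range(T)]
    return inputs, targets


# Three days with T = 3 steps each; forecast day 2 from a window of w = 2 days.
loads = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
factors = [[1, 0], [1, 0], [0, 1]]       # e.g. [working day, weekend] flags (illustrative)
inputs, targets = build_sample(loads, factors, d=2, w=2)
```

Each row of `inputs` is one LSTM time step, so the sequence length equals $T$ and the feature width is $w$ plus the number of factors.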
The neural network is trained by a back-propagation algorithm based on gradient descent. If the input values are very large or very small, it is difficult to find the optimal solution during training. Normalizing the data into a standard interval therefore helps the model converge. Min-max scaling is used to normalize each element of the vector into the interval [0, 1], as shown in formula (12) [16]:

$x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$   (12)

where $x_{\max}$ is the maximum value, $x_{\min}$ is the minimum value, $x$ is the original value and $x'$ is the normalized value.
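The sketch below applies formula (12) feature by feature. Three choices here are assumptions about practical use, not details stated in the text:
- $x_{\min}$ and $x_{\max}$ are computed on the training set only.
- Constant features are mapped to 0.
- The scaling is inverted to recover forecasts in load units.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over the training vectors (assumed: training set only)."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(x, mins, maxs):
    """Formula (12): x' = (x - x_min) / (x_max - x_min), applied element-wise.

    A constant feature (x_max == x_min) is mapped to 0.0 to avoid division by zero.
    """
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(x, mins, maxs)]


def min_max_inverse(x_scaled, lo, hi):
    """Undo the scaling for one value, e.g. to turn a normalized forecast back into load units."""
    return x_scaled * (hi - lo) + lo


train_rows = [[30000.0, 0.0, 1.0], [50000.0, 1.0, 1.0], [40000.0, 1.0, 1.0]]  # toy values
mins, maxs = fit_min_max(train_rows)
scaled = min_max_scale([45000.0, 1.0, 1.0], mins, maxs)
```

Reusing the training-set extremes for the validation and test sets keeps the scaling identical across all three sets. Values outside the training range can then fall slightly outside [0, 1].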

Model training
The back-propagation through time (BPTT) algorithm [17,18] is used to train the LSTM network for load forecasting. The training objective is to adjust the network parameters so that the network output is as close as possible to the actual values. The training process is as follows:
1) Initialize the network parameters.
2) Input the training data to the network and calculate the network output.
3) Calculate the difference between the forecast value and the actual value, using a loss function to measure the error.
4) From the error, calculate the gradients of the loss function with respect to the network parameters. The error is propagated backward in two directions: across the network layers and through time.
5) Adjust the network parameters according to the gradients.
6) Repeat steps 2-5.
Steps 2-3 constitute the forward pass, and steps 4-5 constitute the backward pass.
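To make the forward/backward structure of steps 1-6 concrete, the sketch below trains a deliberately tiny RNN on a single sequence with the MSE loss of formula (13). The network has one input, one scalar tanh hidden unit and a linear output (formulas (1)-(2) without biases). It is a simplified stand-in for the LSTM used in the paper. The backward loop shows how the gradient flows both to the output weight and back through time via the recurrent weight. All names, the learning rate and the toy data are illustrative assumptions. A finite-difference check is included to confirm the hand-derived gradients.

```python
import math
import random


def forward(params, xs):
    """Step 2: scalar RNN forward pass, h_t = tanh(u x_t + w h_{t-1}), y_t = v h_t."""
    u, w, v = params
    hs, ys = [0.0], []                  # hs[0] is h_0; hs[t + 1] is the state after x_t
    for x in xs:
        h = math.tanh(u * x + w * hs[-1])
        hs.append(h)
        ys.append(v * h)
    return hs, ys


def mse(ys, targets):
    return sum((y - t) ** 2 for y, t in zip(ys, targets)) / len(ys)


def bptt_grads(params, xs, targets):
    """Steps 3-4: MSE loss and its gradients with respect to (u, w, v), computed by BPTT."""
    u, w, v = params
    hs, ys = forward(params, xs)
    n = len(xs)
    gu = gw = gv = 0.0
    dh_next = 0.0                       # gradient reaching h_t from time step t + 1
    for t in reversed(range(n)):
        dy = 2.0 * (ys[t] - targets[t]) / n     # dLoss/dy_t
        gv += dy * hs[t + 1]                    # output weight (layer direction)
        dh = dy * v + dh_next                   # total gradient at h_t
        da = dh * (1.0 - hs[t + 1] ** 2)        # back through tanh
        gu += da * xs[t]
        gw += da * hs[t]                        # hs[t] is h_{t-1}
        dh_next = da * w                        # pass back to h_{t-1} (time direction)
    return mse(ys, targets), (gu, gw, gv)


def train(xs, targets, epochs=200, lr=0.1, seed=0):
    """Steps 1-6: initialize, then repeat forward pass, backward pass and update."""
    rng = random.Random(seed)
    params = [rng.uniform(-0.5, 0.5) for _ in range(3)]       # step 1
    history = []
    for _ in range(epochs):                                   # step 6
        loss, grads = bptt_grads(params, xs, targets)         # steps 2-4
        params = [p - lr * g for p, g in zip(params, grads)]  # step 5: gradient descent
        history.append(loss)
    return params, history


def grad_check(params, xs, targets, eps=1e-6, tol=1e-6):
    """Compare the BPTT gradients with central finite differences."""
    _, grads = bptt_grads(params, xs, targets)
    for k, g in enumerate(grads):
        plus, minus = list(params), list(params)
        plus[k] += eps
        minus[k] -= eps
        numeric = (mse(forward(plus, xs)[1], targets)
                   - mse(forward(minus, xs)[1], targets)) / (2 * eps)
        if abs(numeric - g) > tol:
            return False
    return True


xs = [0.5, -0.2, 0.8, 0.1]
targets = [0.3, 0.0, 0.4, 0.2]
params, history = train(xs, targets)
```

In an LSTM the same two-direction pattern applies. The bookkeeping is heavier because the gradient also flows through the memory cell $c_t$ and the four gates.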
Common loss functions include the mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). In this paper, MSE is used as the loss function, defined as follows [18]:

$\mathrm{MSE} = \dfrac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$   (13)

where $N$ is the total number of training samples, $\hat{y}_i$ is the forecast load and $y_i$ is the actual load.

Model evaluation
During model training, one pass of all the training data through the network to adjust its parameters is called an epoch. If the number of epochs is too small, the network will underfit; if it is too large, the network will overfit. To determine the appropriate number of epochs and select the optimal model, the model is evaluated after each epoch, using the following steps:
1) Divide the historical load data into a training set, a validation set and a test set.
2) Input the training set into the network for training. After each epoch, input the validation set into the network and evaluate the model by formula (13).
3) If the performance of the current model exceeds that of all previous epochs, keep it as the optimal model.
4) Repeat steps 2-3 until the exit condition is satisfied, that is, the performance has not improved for a preset number of consecutive epochs.
5) Output the optimal model and test it on the test set.
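The epoch-selection procedure in steps 1-5 is essentially early stopping with a patience threshold. A minimal sketch follows, with the following assumptions:
- `train_one_epoch(model)` and `validate(model)` are hypothetical callables; `validate` returns the validation MSE of formula (13).
- A deep copy snapshots the best model.
- In the toy usage, scripted validation losses replace real training.

```python
import copy


def select_best_epoch(model, train_one_epoch, validate, patience=5, max_epochs=1000):
    """Keep the model with the lowest validation loss; stop after `patience`
    consecutive epochs without improvement (the exit condition of step 4)."""
    best_model, best_epoch, best_loss = None, 0, float("inf")
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)            # step 2: one pass over the training set
        loss = validate(model)            # step 2: validation loss, e.g. MSE
        if loss < best_loss:              # step 3: better than all previous epochs
            best_model, best_epoch, best_loss = copy.deepcopy(model), epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                     # step 4: exit condition reached
    return best_model, best_epoch, best_loss   # step 5: evaluate best_model on the test set


# Toy usage: the "model" only counts epochs, and validation losses are scripted.
val_losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3, 3.4, 2.0]


def toy_train(model):
    model["epoch"] += 1


def toy_validate(model):
    return val_losses[model["epoch"] - 1]


toy_model = {"epoch": 0}
best, epoch, loss = select_best_epoch(toy_model, toy_train, toy_validate, patience=3)
```

With a patience of 3, the search stops after epoch 6 and returns the epoch-3 snapshot. The scripted loss of 2.0 at epoch 9 is never reached. The patience value trades training time against the chance of a late improvement.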

Experimental data
The hourly historical system load data of the Electric Reliability Council of Texas (ERCOT) control area from 2003 to 2016 are used to validate the proposed model [18]. Data from 2003 to 2014 are used as the training set, data from 2015 as the validation set and data from 2016 as the test set. The validation set is used to determine the optimal number of training epochs and the optimal network structure and parameters. The test set is used to test the forecast accuracy of the model. Because historical weather information is not available, weather factors are not considered in model training. Only three kinds of input are used: the date type (working day or not, day of the week), the load at the same time on each day of the historical window, and the maximum and minimum daily loads in the historical window.
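A sketch of the chronological split and of a date-type encoding for the experiment described above. It assumes the ERCOT data have already been parsed into `(timestamp, load)` pairs. The one-hot day-of-week plus working-day flag is one plausible encoding of the date type; the text does not give the exact encoding. Holidays are ignored, as in the experiment.

```python
from datetime import date, datetime


def split_by_year(records, train_years=range(2003, 2015), val_year=2015, test_year=2016):
    """Chronological split: 2003-2014 train, 2015 validation, 2016 test."""
    train, val, test = [], [], []
    for ts, load in records:
        if ts.year in train_years:
            train.append((ts, load))
        elif ts.year == val_year:
            val.append((ts, load))
        elif ts.year == test_year:
            test.append((ts, load))
    return train, val, test


def date_factors(day):
    """Hypothetical date-type encoding: weekday one-hot (Mon..Sun) + working-day flag."""
    one_hot = [1.0 if day.weekday() == k else 0.0 for k in range(7)]
    working_day = 1.0 if day.weekday() < 5 else 0.0   # holidays not modelled
    return one_hot + [working_day]


records = [                                  # toy values, not ERCOT data
    (datetime(2010, 6, 1, 14), 1.0),
    (datetime(2015, 6, 1, 14), 2.0),
    (datetime(2016, 6, 1, 14), 3.0),
    (datetime(2017, 6, 1, 14), 4.0),         # outside the study period: dropped
]
train, val, test = split_by_year(records)
monday = date_factors(date(2016, 1, 4))      # 4 January 2016 was a Monday
```

Splitting by year rather than at random keeps the test year strictly in the future relative to the training data, which matches how the forecaster would be used.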

Evaluation indices
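Common model evaluation indices include the mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). MSE is defined in formula (13). MAE [18] and MAPE [1] are defined in formula (14) and formula (15):

$\mathrm{MAE} = \dfrac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|$   (14)

$\mathrm{MAPE} = \dfrac{100\%}{N} \sum_{i=1}^{N} \left| \dfrac{\hat{y}_i - y_i}{y_i} \right|$   (15)

where $N$ is the total number of samples evaluated, $\hat{y}_i$ is the forecast load and $y_i$ is the actual load.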
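Formulas (13)-(15) translate directly into code. A small sketch, with MAPE returned in percent. It assumes no actual load value is zero, which holds for system-level load but should be checked for other series.

```python
def mse(forecast, actual):
    """Formula (13): mean square error."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def mae(forecast, actual):
    """Formula (14): mean absolute error."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def mape(forecast, actual):
    """Formula (15): mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((f - a) / a) for f, a in zip(forecast, actual)) / len(actual)


forecast = [110.0, 90.0, 100.0, 95.0]
actual = [100.0, 100.0, 100.0, 100.0]
```

For these toy values, MSE is 56.25 (in squared load units), while MAE and MAPE are both 6.25, because every actual value is 100.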

Network structure
In an LSTM network, the number of hidden layers and the number of LSTM neurons in each hidden layer affect the accuracy of load forecasting. The hourly load data of 2015 are used as the validation set, and the historical date window $w$ is fixed. The number of neurons in each hidden layer is then selected layer by layer by enumeration [1] to determine the optimal network structure. First, the optimal number of neurons in the first hidden layer is determined and fixed; then a second hidden layer is added and its optimal number of neurons is determined. The process continues until the forecast accuracy no longer improves.
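The layer-by-layer enumeration can be written as a greedy search. In this sketch, `evaluate(sizes)` is a hypothetical callable that trains a network with the given hidden-layer sizes and returns its validation MAPE. The candidate sizes (5 to 40 in steps of 5) and the cap of three layers follow the experiment settings reported in the text. The toy evaluator only stands in for real training.

```python
def greedy_structure_search(evaluate, candidates=range(5, 45, 5), max_layers=3):
    """Select hidden-layer sizes one layer at a time.

    For each new layer, enumerate every candidate size with the earlier layers
    fixed; stop when adding a layer no longer lowers the validation MAPE.
    """
    best_sizes, best_score = [], float("inf")
    while len(best_sizes) < max_layers:
        score, size = min((evaluate(best_sizes + [n]), n) for n in candidates)
        if score >= best_score:
            break                                  # accuracy no longer improves
        best_sizes, best_score = best_sizes + [size], score
    return best_sizes, best_score


def toy_evaluate(sizes):
    """Scripted stand-in for 'train this structure and return validation MAPE (%)'."""
    score = 5.0 + abs(sizes[0] - 20) / 10
    if len(sizes) >= 2:
        score += -0.5 if sizes[1] == 10 else 0.2
    if len(sizes) >= 3:
        score += 0.3
    return score


sizes, score = greedy_structure_search(toy_evaluate)
```

The greedy search trains at most 8 networks per layer (24 for three layers), instead of the 8 + 64 + 512 needed to enumerate every combination of per-layer sizes. The price is that it may miss a better combination whose first layer is not individually optimal.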
The forecasting performance under different network structures is shown in Table 1. The candidate numbers of neurons per hidden layer are 5 to 40 in steps of 5 (8 levels), and the number of hidden layers is set to 1, 2 and 3 in turn. Table 1 shows the following minimum MAPE values:
- With 1 hidden layer, the minimum MAPE is 4.6%, obtained with 5 neurons.
- With 2 hidden layers, the minimum MAPE is 4.72%, obtained with 20 neurons in each layer.
- With 3 hidden layers, the minimum MAPE is 4.58%, obtained with 10 neurons in each layer.

Therefore, with the historical date window fixed, the network with 3 hidden layers of 10 neurons each gives the best forecasting performance.

Historical date window
Three network structures are compared: 1, 2 and 3 hidden layers with 5, 20 and 10 neurons per layer, respectively. The historical date window $w$ ranges from 1 to 60 days, and the hourly load data of 2015 are used as the validation set. Figure 4 shows how MAPE changes with $w$ under the different network structures. As $w$ increases, the MAPE values of the different structures remain almost the same and show no obvious downward or upward trend. However, MAPE becomes unstable as $w$ grows. This is especially true for the network with 1 hidden layer of 5 neurons, whose MAPE fluctuates violently for large $w$. For all three structures, MAPE is relatively low and stable when $w$ is in [7, 30]. The best forecasting performance, a MAPE of 4.4%, is obtained with 1 hidden layer of 5 neurons and $w$ = 23.

Result analysis
To verify the effectiveness of the proposed method, the optimal forecasting model (1 hidden layer with 5 neurons, $w$ = 23) is tested on the test set (hourly load of 2016). The forecast results are shown in Figure 5 and the daily MAPE in Figure 6; the average MAPE is 4.45%. Figure 7 and Figure 8 compare the actual and forecast load in January and July 2016, respectively. The load in January varies with great randomness. The model does not consider weather, holidays (other than Saturdays and Sundays), special events or other factors, and forecasts only from the load itself. Its accuracy in January is therefore low, with a MAPE of 6.66%. The forecasts for July are more accurate, except at the daily peak and valley loads, with a MAPE of 3.51%.
According to the forecasting results, the forecasting model used in this paper can learn the variation of the historical load and accurately forecast the future load.

Conclusion
In this paper, a recurrent neural network (RNN) from the field of artificial intelligence is improved with long short-term memory (LSTM) units to build a load forecasting model. The model takes advantage of LSTM's ability to learn long-term dependencies in sequence data to identify the variation rules of the load itself. Actual load data are used to verify the model and to examine how different network structures and different historical date windows affect forecasting performance. The experimental results show that the LSTM network can forecast the future load accurately.
The LSTM network training in this paper does not consider the influence of historical weather, major events and other random factors on the load. The next step is to include more influencing factors to validate the model and further improve its forecasting accuracy.
Artificial intelligence technology, represented by deep neural networks, has made remarkable achievements in many fields. In the future, recurrent neural networks will be applied to more business areas related to power system dispatching operation, such as wind and photovoltaic power forecasting, voice dispatching logs and dispatching robots. This will raise the level of intelligence in power system dispatching operation.