Short-term Load Forecasting Model Considering Meteorological Factors

Abstract: Because of limitations in basic data and processing methods, traditional load characteristic analysis methods cannot achieve refined user-level prediction. This paper builds a user-level short-term load forecasting model based on big data algorithms such as decision trees and neural networks. First, the influence of meteorological factors on load characteristics is quantitatively analyzed using grey relational analysis, and the key factors are selected as the input vector of the decision tree algorithm. Next, the user's historical load data are clustered and each daily load curve is given a category label. The decision tree algorithm is then used to establish classification rules and classify the day to be predicted. Finally, an Elman neural network is used to predict the short-term load of a user, and the validity of the model is verified.


Introduction
Load forecasting is the basis of power system planning, operation, demand side management and other work. Due to the limitations of basic data and processing methods, traditional load forecasting methods rarely achieve refined user-level prediction. For the economic operation and planning of the power grid, it is important to use big data technology to establish the relationship between meteorological factors and load, uncover the inherent laws of load characteristics, and construct an accurate short-term load forecasting model.
Typical load forecasting methods include time series [1][2], support vector machine [3], neural network [4], and combination forecasting [5]. Literature [6] realizes short-term load forecasting by establishing an autoregressive moving average (ARMA) model and analyzes the stability of power load with a nonparametric test. Literature [7] reduces the amount of sample data by searching for historical days whose meteorological characteristics are similar to those of the day to be predicted. Fuzzy classification is applied to the meteorological factors that strongly influence the electrical load in order to determine the historical load data required for the prediction. An SVM is then used for short-term prediction of the grid load in Shanxi; compared with the BP neural network, the prediction accuracy is significantly improved. Literature [8] combines the chaotic characteristics of load with the least squares support vector machine, selecting the nearest neighbouring points of the prediction point as training samples, and achieves relatively high prediction accuracy. Some researchers [9]-[12] shorten the training time and improve the accuracy of the prediction model by improving the neural network prediction model.
In this paper, a refined short-term load forecasting model considering meteorological factors is established by means of cluster analysis, grey relational analysis, the CART decision tree and the Elman neural network. Based on grey relational analysis, key meteorological factors are selected as the input vector of the decision tree algorithm. Cluster analysis is used to establish classification labels for load curves, and classification rules are established through the CART decision tree. Finally, the Elman neural network is used for short-term load forecasting to realize refined load forecasting for a user.

Selection of influencing factors of load characteristics
Grey relational analysis is a method for evaluating selected objects after considering various factors comprehensively. The degree of relation is determined by comparing the similarity between a reference sequence and a comparison sequence: the more similar the geometric shape of the comparison sequence is to that of the reference sequence, the greater the grey relational degree. Based on the results of grey relational analysis, the main factors affecting load characteristics can be selected.

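Determining comparison sequence and reference sequence
The first step is to determine the reference sequence (system feature sequence) and the comparison sequences (influencing factor sequences) used in the quantitative analysis. Let the reference sequence be $X_0 = (x_0(1), x_0(2), \ldots, x_0(t))$ and the comparison sequences be $X_j = (x_j(1), x_j(2), \ldots, x_j(t))$, $j = 1, 2, \ldots, n$. In the above formula, $n$ denotes the number of influencing factors to be evaluated in the whole evaluation process, and $t$ denotes the number of time points in each sequence.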

Normalizing Sequence Matrix
The numerical values of different sequences differ greatly, and the dimensions of different types of data are not comparable, which affects the accuracy of the final analysis results, so the sequences must be standardized. In this section, each sequence is normalized by its maximum value. Writing $x_j(i)$ for the value of sequence $j$ at time point $i$ (with $j = 0$ for the reference sequence), the calculation formula is as follows: $x'_j(i) = x_j(i) / \max_{k} x_j(k)$. The normalized reference sequence (system feature sequence) and comparison sequences (influencing factor sequences) are respectively $X'_0 = (x'_0(1), \ldots, x'_0(t))$ and $X'_j = (x'_j(1), \ldots, x'_j(t))$, $j = 1, 2, \ldots, n$.

Generating difference matrix
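Considering the reference sequence and the comparison sequences as points in a multidimensional space, we can study the relationship between the various factors and the system characteristics in a specific t-dimensional space. The calculation formula of the difference sequence is as follows: $\Delta_j(i) = |x'_0(i) - x'_j(i)|$, where $x'_0(i)$ and $x'_j(i)$ are the normalized values of the reference sequence and of the $j$-th comparison sequence at time point $i$. The difference matrix $\Delta$ is then the matrix with entries $\Delta_j(i)$, $j = 1, \ldots, n$, $i = 1, \ldots, t$.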

Computing the correlation matrix between the comparison sequence and the reference sequence
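The correlation coefficient reflects the correlation between the characteristic sequence of the system and the sequence of each influencing factor at different time points. The calculation formula is as follows: $\xi_j(i) = \dfrac{\min_j \min_i \Delta_j(i) + \rho \max_j \max_i \Delta_j(i)}{\Delta_j(i) + \rho \max_j \max_i \Delta_j(i)}$, where $\Delta_j(i)$ is the absolute difference between the normalized reference sequence and the $j$-th normalized comparison sequence at time point $i$. In the above formula, $\rho \in (0, 1)$ is called the resolution coefficient. The smaller $\rho$ is, the stronger the resolution, that is, the larger the differences between the resulting correlation coefficients. The correlation coefficient matrix is then the matrix with entries $\xi_j(i)$, $j = 1, \ldots, n$, $i = 1, \ldots, t$.
In order to calculate the correlation degree between the system characteristic sequence and the sequence of each influencing factor, the correlation coefficients at each time point are weighted and summed, as shown in the following formula: $L_j = \sum_{i=1}^{t} \omega(i)\, \xi_j(i)$. In the formula above, $L_j$ is the correlation degree between the feature sequence of the system and the $j$-th influencing factor, $\xi_j(i)$ is the correlation coefficient of the $j$-th factor at the $i$-th time point, and $\omega(i)$ is the weight of the correlation coefficient at the $i$-th time point.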
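The four steps of this section can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function name `grey_relational_degrees`, the default resolution coefficient of 0.5, the equal weights, and the use of the maximum absolute value for normalization are all assumptions.

```python
def grey_relational_degrees(reference, factors, rho=0.5, weights=None):
    """Return one grey relational degree per comparison sequence.

    reference: list of t values (system feature sequence)
    factors:   list of n lists, each with t values (influencing factors)
    rho:       resolution coefficient (0.5 is an assumed default)
    weights:   weights of the t time points (assumed equal if omitted)
    """
    t = len(reference)
    if weights is None:
        weights = [1.0 / t] * t

    def normalize(seq):
        # Maximum-value normalization; abs() guards against negative
        # readings such as sub-zero temperatures (an assumption).
        peak = max(abs(v) for v in seq)
        return [v / peak for v in seq]

    ref = normalize(reference)
    cmp_seqs = [normalize(f) for f in factors]

    # Difference matrix: one row per factor, one column per time point.
    delta = [[abs(r - c) for r, c in zip(ref, seq)] for seq in cmp_seqs]
    d_min = min(min(row) for row in delta)
    d_max = max(max(row) for row in delta)
    if d_max == 0:
        # Every comparison sequence coincides with the reference.
        return [1.0] * len(factors)

    # Correlation coefficient matrix.
    xi = [[(d_min + rho * d_max) / (d + rho * d_max) for d in row]
          for row in delta]

    # Weighted sum over time points gives the correlation degree.
    return [sum(w * x for w, x in zip(weights, row)) for row in xi]


# Illustrative values only: the first factor has the same shape as the
# load, the second moves in the opposite direction.
degrees = grey_relational_degrees([10, 20, 30, 40],
                                  [[1, 2, 3, 4], [4, 3, 2, 1]])
```

A factor whose normalized shape matches the reference sequence receives a degree of 1, and less similar factors receive smaller values, which is the basis for ranking the meteorological factors.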

Analysis of the effect of meteorological factors on load characteristics
Taking a certain user as the research object, the correlation between the daily load characteristics of the user's historical power consumption (daily peak load, daily minimum load, daily peak-valley difference, daily peak-valley difference rate, daily average load, daily load rate and minimum daily load rate) and meteorological factors such as temperature (maximum, minimum, average), daily average humidity, air pressure (maximum, minimum, average) and daily maximum wind speed is analyzed. The key factors affecting the user's electricity consumption characteristics are obtained to prepare the input vectors of the decision tree in the next section. The results of grey relational analysis are shown in the following table:
Table 1 Grey Relational Analysis of User Load Characteristics (columns T1 to T8, rows P1 to P7)
In this table, T1, T2, T3, T4, T5, T6, T7 and T8 represent daily maximum temperature, daily maximum barometric pressure, daily maximum wind speed, daily minimum temperature, daily minimum barometric pressure, daily average temperature, daily average humidity and daily mean barometric pressure respectively. P1, P2, P3, P4, P5, P6 and P7 represent daily peak load, daily minimum load, daily peak-valley difference, daily peak-valley difference rate, daily average load, daily load rate and minimum daily load rate respectively.
As can be seen from the above table, temperature has the greatest and most direct impact on user load characteristics. Daily maximum temperature, daily minimum temperature, daily average temperature and daily average humidity are the four meteorological factors whose grey relational degree with most of the daily load characteristics exceeds 0.6, so they have the greatest impact on the user's electricity load. Therefore, this paper chooses the daily maximum temperature, daily minimum temperature, daily average temperature and daily average humidity as the input vector for the decision-tree-based classification of historical electricity load.

Classification of predicted daily load based on CART decision tree
A decision tree (DT) is a supervised learning method that recursively partitions the sample space according to feature values, producing a tree of decision rules learned from known sample data. According to the type of input data and the prediction objective, decision trees can be divided into classification trees and regression trees: a classification tree predicts the class of objects of unknown type, while a regression tree predicts continuous values.
In this paper, the CART decision tree is used to classify the user's historical electricity data and to establish classification rules. The idea is as follows:
Fig. 1 The process of establishing user load classification rules
In the figure above, the middle part is the CART decision tree algorithm model. The left part is the set of influencing factors of the user's daily load curve, which is input to the decision tree algorithm as the sample eigenvalues of the daily load. The right part is the user's power consumption category obtained by cluster analysis using the k-means algorithm, which serves as the class label of the sample. The sample eigenvalues and sample labels are input into the decision tree, and training the model establishes the classification rules of electricity load. The sample eigenvalues (meteorological factors) of the day to be predicted are then input into the established classification rules to obtain the classification result of that day. Finally, the set of historical days of the same type as the day to be predicted can be found.
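The rule-building step can be illustrated with a small CART-style classification tree that chooses binary splits by weighted Gini impurity. This is a sketch under stated assumptions, not the paper's implementation: the function names, the depth limit, and the sample values (rows of daily maximum temperature, minimum temperature, average temperature and average humidity) are hypothetical, and pruning is omitted.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature index, threshold, weighted Gini) or None."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, thr, score)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    split = None
    if depth < max_depth and len(set(y)) > 1:
        split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]  # majority class leaf
    j, thr, _ = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= thr]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > thr]
    return (j, thr,
            build_tree([r for r, _ in left], [lab for _, lab in left],
                       depth + 1, max_depth),
            build_tree([r for r, _ in right], [lab for _, lab in right],
                       depth + 1, max_depth))


def classify(tree, x):
    """Follow the rules from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


# Hypothetical samples: [max temp, min temp, avg temp, avg humidity].
X = [[33, 25, 29, 70], [34, 26, 30, 65], [31, 24, 27, 80],
     [8, 0, 4, 40], [10, 2, 6, 45], [6, -1, 3, 50]]
y = [1, 1, 1, 2, 2, 2]  # category labels as produced by clustering
tree = build_tree(X, y)
category = classify(tree, [9, 1, 5, 42])  # weather of a day to be predicted
```

On real data the tree would be grown on the two years of historical samples and would typically be deeper; the resulting nested thresholds correspond to the classification rules described above.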

Load forecasting model based on Elman neural network and CART decision tree
The Elman neural network, proposed by J. L. Elman in 1990, is a dynamic recurrent neural network. It adds an acceptance layer (also called a context layer) to the hidden layer as a one-step delay operator, which gives the network short-term memory. Therefore, it has the ability to process dynamic information and can recognize and detect time-varying patterns.

Elman neural network structure
The Elman neural network includes an input layer, a hidden layer, an acceptance layer and an output layer. The structure of the whole network is as follows:
The Elman neural network uses the sum of squared errors as the learning index function to determine the model parameters.
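The role of the acceptance layer can be made concrete with a forward pass written in plain Python. The sketch below is illustrative only: the tanh hidden activation, the linear output layer, the absence of bias terms and the weight names `W_in`, `W_ctx` and `W_out` are assumptions, and training (minimizing the sum of squared errors by gradient descent) is not shown.

```python
import math


def elman_forward(inputs, W_in, W_ctx, W_out):
    """Run an Elman network over a sequence and return all outputs.

    inputs: list of input vectors, one per time step
    W_in:   hidden x input weights
    W_ctx:  hidden x hidden weights from the acceptance layer
    W_out:  output x hidden weights
    """
    n_hidden = len(W_in)
    context = [0.0] * n_hidden  # acceptance layer starts empty
    outputs = []
    for x in inputs:
        hidden = []
        for h in range(n_hidden):
            s = sum(w * xi for w, xi in zip(W_in[h], x))
            s += sum(w * c for w, c in zip(W_ctx[h], context))
            hidden.append(math.tanh(s))
        outputs.append([sum(w * hv for w, hv in zip(row, hidden))
                        for row in W_out])
        # One-step delay: the hidden state is copied to the acceptance
        # layer and fed back at the next time step.
        context = hidden
    return outputs


def sum_squared_error(outputs, targets):
    """Learning index: sum of squared errors over all steps and outputs."""
    return sum((o - t) ** 2
               for out, tgt in zip(outputs, targets)
               for o, t in zip(out, tgt))


# With feedback, the second output is nonzero although the second input
# is zero; without feedback the network has no memory of the first step.
with_memory = elman_forward([[1.0], [0.0]], [[1.0]], [[0.5]], [[2.0]])
no_memory = elman_forward([[1.0], [0.0]], [[1.0]], [[0.0]], [[2.0]])
```

Comparing `with_memory` and `no_memory` shows the short-term memory effect: only the network with nonzero acceptance-layer weights carries information from the first time step into the second.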

Load forecasting based on Elman neural network and CART decision tree
The Elman neural network has a stronger dynamic memory function than the traditional static feedforward neural network, so it is suitable for building time-series load forecasting models. The flow chart of Elman neural network prediction is as follows:

Case study
In this example, first, a clustering algorithm is used to classify the user's historical electricity data and obtain the category labels of the historical data. Second, grey relational analysis is used to find the meteorological factors that have a great influence on the user, which become the input vector of the decision tree algorithm. Third, the selected meteorological factors and the category labels are input into the decision tree to obtain the classification rules; the meteorological factors of the day to be predicted are then input into these rules to obtain the classification result of that day. Finally, the historical electricity data of the days in the same category as the day to be predicted are used as the training set, and the daily load curve of December 31, 2016 is predicted by the Elman neural network.
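The last step of this procedure, choosing the training set, amounts to filtering the historical days by category label. A minimal sketch, assuming the curves and labels are held in parallel lists (the function and variable names are hypothetical):

```python
def select_training_days(daily_curves, labels, target_category):
    """Keep the historical daily load curves whose cluster label matches
    the category assigned to the day to be predicted."""
    return [curve for curve, label in zip(daily_curves, labels)
            if label == target_category]


# Toy example with three short "curves" and their category labels.
training_set = select_training_days([[1.0, 2.0], [5.0, 6.0], [1.1, 2.1]],
                                    [1, 2, 1], target_category=1)
```

Only the days returned by this filter are passed to the Elman neural network, which is what distinguishes the method from training directly on the full history.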

Cluster analysis of historical electricity data
In this section, the K-means clustering algorithm is used to classify the user's daily load curves from January 2015 to December 2016. Each daily load curve is composed of the 24 hourly active power values of that day. The clustering results for the user are as follows:
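The clustering step can be sketched as follows. This is a plain-Python illustration, not the paper's code: the farthest-first initialization is an assumption made to keep the example deterministic, the number of clusters is not taken from the paper, and the four sample curves are invented. Labels here start at 0, whereas the paper numbers its categories from 1.

```python
def kmeans(curves, k, n_iter=100):
    """Cluster equal-length load curves; return (labels, centers)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-first initialization (an assumption): start
    # from the first curve, then repeatedly add the curve that is
    # farthest from all centers chosen so far.
    centers = [list(curves[0])]
    while len(centers) < k:
        far = max(curves, key=lambda c: min(dist2(c, m) for m in centers))
        centers.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: dist2(c, centers[j]))
                      for c in curves]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels, centers


# Invented 24-point curves: two flat days and two days with a daytime peak.
flat_a = [1.0] * 24
flat_b = [1.2] * 24
peak_a = [1.0] * 8 + [5.0] * 10 + [1.0] * 6
peak_b = [1.1] * 8 + [5.2] * 10 + [1.1] * 6
labels, centers = kmeans([flat_a, peak_a, flat_b, peak_b], k=2)
```

The label assigned to each day becomes the class label used when training the decision tree.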

Selection of key influencing factors for users
In this paper, the correlation between load characteristics and meteorological factors such as temperature, humidity, air pressure and maximum wind speed has been quantitatively analyzed using grey relational analysis. The meteorological factors that have a great influence on daily load characteristics have been selected, and the results are shown in the following table:
The four meteorological indices of daily maximum temperature, daily minimum temperature, daily average temperature and daily average humidity are selected as the main meteorological influencing factors and become the input vector of the subsequent decision tree algorithm.

Classification of predicted days based on decision tree
Based on the preparation of the previous two steps, the user's historical meteorological data, i.e. daily maximum temperature, daily minimum temperature, daily average temperature and daily average humidity, are input into the CART decision tree algorithm as sample eigenvalues, together with the class labels of the sample data obtained by cluster analysis. The classification rules shown in the following figure are established:
In the figure above, x1 represents the daily maximum temperature, x2 the daily minimum temperature, x3 the daily average temperature and x4 the daily average humidity. The tree has 15 nodes in total. The values of the meteorological factors on the day to be predicted are shown in the following table:
Based on the classification rules established by the decision tree algorithm, the meteorological factors of the day to be predicted are input, and the day to be predicted is assigned to category 1.

Load forecasting based on Elman neural network
Based on the preparation of the first three steps, the Elman neural network can be used for load forecasting. The result of the third step shows that the day to be predicted belongs to the first category of the clustering results. Therefore, the historical load of the days labelled as the first category is used as the training set of the Elman neural network. The final load prediction results are shown in the following figure:

Fig. 5 Load prediction results
In the figure above, the three curves represent the actual value, the direct forecasting value, and the forecasting value of the Elman neural network method described in this paper. The average relative error of the daily load curve obtained by the method described in this paper is 3.87%. For comparison, when the load data of the two years before the forecast date are used directly to train the Elman neural network, the average relative error of the forecasting result is 12.53%. The method in this paper therefore clearly reduces the error of the load curve forecast and improves its accuracy. Although the accuracy of user-level load forecasting is improved, the user-level daily load curve fluctuates strongly and is not smooth, so the forecasts at some of the 24 hourly points of the day deviate from the actual values, giving the final error of 3.87%. Overall, this method has reference value for user-level load forecasting.
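The error figures quoted above can be reproduced from a pair of curves with a short function. The sketch assumes that "average relative error" means the mean of the absolute relative errors over the hourly points, expressed in percent; the paper does not state the formula, and the sample values are invented.

```python
def mean_relative_error(actual, forecast):
    """Mean absolute relative error in percent over all points.

    Assumes every actual value is nonzero.
    """
    errors = [abs(f - a) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Invented two-point example: errors of 10% and 5% average to 7.5%.
example_error = mean_relative_error([100.0, 200.0], [110.0, 190.0])
```

Applied to the 24 hourly values of the predicted day, the same calculation gives a single percentage per forecasting method, which is how the two methods are compared.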

Summary
This paper obtains the category labels of the user's historical daily load curves based on cluster analysis; obtains four key factors influencing the user's daily load characteristics based on grey relational analysis, namely daily maximum temperature, daily minimum temperature, daily average temperature and daily average humidity; selects the set of historical days of the same type as the day to be predicted based on the CART decision tree; and finally carries out load forecasting based on the Elman neural network. By combining these methods, a refined short-term load forecasting model considering meteorological factors is established.
Compared with training historical load data sets directly using Elman neural network to obtain prediction results, the method used in this paper chooses the same type of daily data sets as the day to be predicted, which improves the prediction accuracy. The refined load forecasting model established in this paper has reference and research value for user-level load forecasting, and can also be extended to load forecasting of regional power grids (provincial and municipal power grids).