Quantitative Prediction Method for Distribution Power Grid Risk

The electric power distribution grid directly serves the majority of ordinary users. Traditional operation and maintenance rely mainly on experience, which makes it impossible to rationally evaluate the status of a line or predict its faults. In this paper, the risk of each line is evaluated from big data through principal component analysis, and a machine learning algorithm is then used to calculate the risk value of each distribution grid line unit. Finally, a GA-BP neural network is used to build an improved line risk value prediction model.


Introduction
Operation and maintenance of the traditional electric power distribution grid is difficult because of the large number of distribution grid lines, and it relies mostly on manual inspection. In the big data era, it is extremely important to build a reasonable line failure risk prediction model from the massive data of the existing distribution grid. In [1], distribution grid risk early warning management and control based on big data is proposed in order to extract valuable information from distribution grid data and provide effective data support for distribution grid operation. In [2], aiming at the problem of inadequate construction of power system facilities and measures, a BP neural network optimized by cloud theory and a genetic algorithm is proposed to improve the accuracy of fault location. To address the shortcomings of the feedforward back-propagation (BP) neural network in distribution grid cost prediction, a GA-BP neural network model for distribution network project cost prediction is proposed in [3]. In [4], a model for locating and isolating the fault section of a distribution grid with dynamic topology changes is proposed, based on an improved genetic algorithm. In [5], the influence of the multiple fault types commonly found in direct current power grids is analyzed, and a real-time fault diagnosis method based on a neural network model of the current is proposed. In this paper, big data is used to calculate the risk value of distribution grid lines with machine learning, and a GA-BP neural network is then used to build a risk value prediction model for evaluating the risk of distribution grid lines.

Calculating the risk value of the line
Two years of failure data for the 10 kV distribution lines of a power supply company were collected, analyzed, and classified by cause, yielding 8 key factors that lead to distribution grid line failures.

Failure cause factors description
The main fault cause factors are shown in Figure 1, with corresponding explanations.

Principal component analysis method
Because the variables differ in dimension and have inconsistent units, they are normalized to eliminate the influence of dimension, using formula (1). After the data are normalized, principal component analysis is performed to obtain the contribution rate of each principal component, listed in Table 1. As shown in Table 1, the cumulative contribution rate of the first 7 principal components reaches 96.21%, so only the first 7 principal components are retained for subsequent calculations in order to reduce the calculation dimension.
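The normalization and component selection steps can be sketched as follows. This is a minimal illustration, not the authors' code: formula (1) is not reproduced in the text, so z-score standardization is assumed here, and the function names `standardize`, `pca_contribution`, and `n_components_for` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Z-score each column (an assumed form of normalization formula (1))."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_contribution(X_std):
    """Eigen-decompose the covariance matrix of the standardized data.

    Returns eigenvalues in descending order, the matching eigenvectors
    (as columns), and the contribution rate of each principal component.
    """
    cov = np.cov(X_std, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    rates = eigvals / eigvals.sum()
    return eigvals, eigvecs, rates

def n_components_for(rates, threshold):
    """Smallest number of leading components whose cumulative rate reaches threshold."""
    cumulative = np.cumsum(rates)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(rates))

# Toy data standing in for the 8 failure factors of each line (random, illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Z = standardize(X)
eigvals, eigvecs, rates = pca_contribution(Z)
m = n_components_for(rates, 0.95)   # 0.95 is an illustrative threshold
```

With the paper's data, the same procedure would be applied to the 8 failure factors, and the number of retained components would follow from the cumulative contribution rate in Table 1.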
To obtain a comprehensive principal component score for each route, the value of each principal component is used to measure the magnitude of the risk. The contribution rates of the principal components serve as weights in a comprehensive evaluation function (2), which gives the risk score of each route. Every variable in the principal component analysis is a positive indicator, that is, the larger the variable value, the greater the risk of the route. It follows that the larger the line risk value calculated through the above steps, the greater the risk of the line.
The risk score calculated by the principal component analysis method can take negative values, so it is transformed by formula (3) to obtain the risk value risk_value, which serves as the actual value of the dependent variable in the regression equation.
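The contribution-rate-weighted score and its transformation to a positive risk value can be sketched as below. Several points are assumptions rather than the paper's method: formulas (2) and (3) are not reproduced in the text, so the raw contribution rates are used as weights, eigenvector signs are aligned so that larger inputs raise the score, and the transformation is a simple shift with a hypothetical `offset` parameter.

```python
import numpy as np

def comprehensive_score(X_std, eigvecs, rates, m):
    """Weighted sum of the first m principal component scores.

    Weights are the contribution rates (assumed form of evaluation function (2)).
    """
    vecs = eigvecs[:, :m].copy()
    # Eigenvector signs are arbitrary. Assumption: flip each one so that its
    # loadings sum to a non-negative value, so larger inputs raise the score.
    vecs *= np.where(vecs.sum(axis=0) < 0, -1.0, 1.0)
    pcs = X_std @ vecs              # shape (n_lines, m)
    return pcs @ rates[:m]          # shape (n_lines,)

def shift_to_positive(score, offset=1.0):
    """Hypothetical stand-in for formula (3): shift so the minimum equals offset."""
    return score - score.min() + offset

# Toy data standing in for the 8 standardized failure factors (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
rates = eigvals / eigvals.sum()

score = comprehensive_score(X_std, eigvecs, rates, m=7)
risk_value = shift_to_positive(score)
```

A shift preserves the ranking of the lines, so the line with the largest score still has the largest risk value.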

A risk model with BP neural network
The aforementioned EMLR model performs well on a specific data set but does not work well on other data sets. A BP neural network is therefore used to improve the model so that it adapts better to different data sets. The design of a BP neural network mainly covers the number of network layers, the number of input layer nodes, the number of hidden layer nodes, the number of output layer nodes, the transfer function, the training method, and the setting of training parameters.
(1) Number of network layers: A BP neural network can contain one or more hidden layers. However, it has been proved theoretically that a single-hidden-layer network can realize a nonlinear mapping if the number of neuron nodes is increased appropriately. Therefore, for most applications, a single hidden layer is sufficient.
(2) Number of nodes in the input layer: The number of nodes in the input layer depends on the dimension of the input vector.
(3) Number of hidden layer nodes: The number of hidden layer nodes has a great influence on the performance of the BP neural network. Generally speaking, more hidden layer nodes can bring better performance but may lead to excessively long training time, which is also a shortcoming of current BP neural networks. At present, the main empirical formulas for determining the number of hidden layer nodes are given in (4).

80% of the samples in the data set are randomly selected as the training set to train the BP neural network, and the remaining 20% are used as the test set to check the training effect. Part of the test set prediction results is shown in Table 3.

Table 3. Partial test set prediction results (one row shown): 10kV Liangban Line Longtaishe Branch Line, 10.2233, 10.2177

To quantitatively evaluate the prediction effect of the BP neural network, the coefficient of determination is introduced, which lies within [0, 1]. The closer it is to 1, the better the prediction effect of the model; the closer it is to 0, the worse the prediction effect. The expression of the coefficient of determination is given in (4), where $l$ is the number of samples, $y_i$ ($i = 1, 2, \ldots, l$) is the true value of the $i$th sample, and $\hat{y}_i$ ($i = 1, 2, \ldots, l$) is the predicted value of the $i$th sample.
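The design choices above (single hidden layer, 80/20 random split, evaluation by the coefficient of determination) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the hidden node count uses the common rule of thumb sqrt(n_in + n_out) + a, which may differ from the paper's empirical formulas; the coefficient of determination is taken as the squared Pearson correlation, a form consistent with the stated range [0, 1]; the tanh transfer function, learning rate, and synthetic data are all hypothetical.

```python
import numpy as np

def split_train_test(X, y, test_ratio=0.2, seed=0):
    """Randomly split the samples into training and test sets."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_ratio))
    return X[idx[n_test:]], X[idx[:n_test]], y[idx[n_test:]], y[idx[:n_test]]

def hidden_nodes(n_in, n_out, a=3):
    """Rule of thumb sqrt(n_in + n_out) + a (assumed; a is a small constant)."""
    return int(round(np.sqrt(n_in + n_out))) + a

def init_params(n_in, n_hidden, rng):
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}

def forward(p, X):
    H = np.tanh(X @ p["W1"] + p["b1"])           # hidden layer, tanh transfer
    return H, (H @ p["W2"] + p["b2"]).ravel()    # linear output layer

def mse(p, X, y):
    return float(np.mean((forward(p, X)[1] - y) ** 2))

def train_bp(p, X, y, lr=0.05, epochs=2000):
    """Full-batch gradient descent with back-propagated gradients."""
    n = len(X)
    for _ in range(epochs):
        H, y_hat = forward(p, X)
        err = (y_hat - y)[:, None]                # output error, shape (n, 1)
        dH = (err @ p["W2"].T) * (1.0 - H ** 2)   # error propagated back through tanh
        p["W2"] -= lr * (H.T @ err) / n
        p["b2"] -= lr * err.mean(axis=0)
        p["W1"] -= lr * (X.T @ dH) / n
        p["b1"] -= lr * dH.mean(axis=0)
    return p

def r_squared(y, y_hat):
    """Squared Pearson correlation between true and predicted values."""
    l = len(y)
    num = (l * np.sum(y_hat * y) - y_hat.sum() * y.sum()) ** 2
    den = (l * np.sum(y_hat ** 2) - y_hat.sum() ** 2) * (l * np.sum(y ** 2) - y.sum() ** 2)
    return float(num / den)

# Synthetic stand-in for 8 failure factors and a risk value (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 8))
y = X @ rng.uniform(0.1, 1.0, 8) + 0.3 * X[:, 0] * X[:, 1]

X_tr, X_te, y_tr, y_te = split_train_test(X, y)
params = init_params(8, hidden_nodes(8, 1), rng)
mse_before = mse(params, X_tr, y_tr)
train_bp(params, X_tr, y_tr)
mse_after = mse(params, X_tr, y_tr)
r2_test = r_squared(y_te, forward(params, X_te)[1])
```

The number of input nodes equals the dimension of the input vector (8 here), and the single output node carries the predicted risk value.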

Optimizing the BP neural network through the genetic algorithm
If only the traditional BP neural network is trained on the risk value of each line, the resulting model is not stable. To increase the reliability of the model, a genetic algorithm is introduced to optimize the initial weights and thresholds of the neurons in the hidden layer and output layer of the BP neural network. When the genetic algorithm updates the weights and thresholds of each neuron, the fitness function is defined as in (5).
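A minimal sketch of a genetic search over the initial weights and thresholds is given below. It is not the authors' implementation: fitness function (5) is not reproduced in the text, so the fitness 1 / (1 + SSE) is an assumption, as are the roulette-wheel selection, arithmetic crossover, Gaussian mutation, elitism, network size, and all parameter values.

```python
import numpy as np

N_IN, N_HIDDEN = 8, 6                                   # assumed network size
N_GENES = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1     # W1, b1, W2, b2

def decode(chrom):
    """Unpack a flat chromosome into weights and thresholds."""
    i = N_IN * N_HIDDEN
    W1 = chrom[:i].reshape(N_IN, N_HIDDEN)
    b1 = chrom[i:i + N_HIDDEN]
    W2 = chrom[i + N_HIDDEN:i + 2 * N_HIDDEN]
    b2 = chrom[-1]
    return W1, b1, W2, b2

def predict(chrom, X):
    W1, b1, W2, b2 = decode(chrom)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def fitness(chrom, X, y):
    """Assumed fitness: larger when the sum of squared errors is smaller."""
    sse = np.sum((predict(chrom, X) - y) ** 2)
    return 1.0 / (1.0 + sse)

def genetic_search(X, y, pop_size=40, generations=50, p_cross=0.8, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (pop_size, N_GENES))
    history = []
    for _ in range(generations):
        fit = np.array([fitness(c, X, y) for c in pop])
        best = pop[fit.argmax()].copy()
        history.append(float(fit.max()))
        # Roulette-wheel selection: probability proportional to fitness.
        parents = pop[rng.choice(pop_size, size=pop_size, p=fit / fit.sum())]
        # Arithmetic crossover between consecutive pairs.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                a = rng.random()
                children[i] = a * parents[i] + (1 - a) * parents[i + 1]
                children[i + 1] = a * parents[i + 1] + (1 - a) * parents[i]
        # Gaussian mutation on a random subset of genes.
        mask = rng.random(children.shape) < p_mut
        children[mask] += rng.normal(0.0, 0.3, int(mask.sum()))
        children[0] = best                        # elitism keeps the best individual
        pop = children
    fit = np.array([fitness(c, X, y) for c in pop])
    history.append(float(fit.max()))
    return pop[fit.argmax()].copy(), history

# Synthetic stand-in data (illustrative only).
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, (60, N_IN))
y = X @ rng.uniform(0.1, 1.0, N_IN)
best_chrom, history = genetic_search(X, y)
W1, b1, W2, b2 = decode(best_chrom)   # would seed the subsequent BP training
```

In a GA-BP scheme of this kind, the decoded best individual replaces the random initial weights and thresholds, after which ordinary BP training proceeds from that starting point.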

Conclusions
This article first conducts principal component analysis on the 8 factors that affect line faults and obtains the principal component score of each line, which is regarded as the risk value of the line. Then, the 8 failure factors are taken as independent variables, and the risk value obtained by normalizing the risk score is used as the dependent variable. The stochastic gradient descent method is used to train the parameters of multiple linear and exponential regression equations, giving a quantitative expression of the risk value. Finally, to increase the generalizability of the model, a GA-BP network is trained on the 8 fault factors and the risk values to obtain a universal line risk value prediction model.