Non-intrusive load decomposition model based on Group Bayesian optimization and post-processing

Non-intrusive load decomposition extracts the power consumption of individual appliances from household bus data, which is of great significance for users adjusting their power consumption strategies. To reduce the large computational cost of hyperparameter optimization for a load decomposition model based on a deep residual network, a Group Bayesian optimization method is proposed; it obtains a better hyperparameter combination at a lower computational cost. In addition, to address irrelevant activations in the model's decomposition results, an improved post-processing method is proposed to improve the comprehensive performance of the model. Finally, the proposed method is verified on the public REFIT dataset, and the results show that it achieves a low decomposition error.


Introduction
One of the urgent needs of the smart grid and the energy Internet is to obtain the electricity consumption data of individual electrical appliances, based on which users can understand the energy consumption pattern of each appliance and reduce energy consumption in a targeted way. This is an important step towards the transparency and intelligence of the power grid. Current measurement technology can only automatically read the total power consumption data, and it is difficult to further obtain a user's internal load information; the lack of practical load decomposition technology has therefore become a major bottleneck in the development of the smart grid.
Load decomposition techniques can be divided into Intrusive Load Decomposition (ILD) and Non-Intrusive Load Decomposition (NILD) methods. NILD is a technology for estimating the power consumption of each of a user's appliances when only the total power demand at the user's bus is known. ILD requires installing measuring instruments on all of the user's appliances, which leads to high investment cost and difficult maintenance. In contrast, NILD has the advantages of easy installation, convenient maintenance and simple hardware. Correspondingly, NILD places higher demands on software algorithms, which has become the biggest obstacle to putting NILD into practical application.
Hart [1] first put forward the concept of NILD and adopted Combinatorial Optimization (CO) to solve the problem, opening up the field of NILD. Subsequently, a variety of machine learning methods have played an important role in NILD research, including the k-Nearest Neighbor (KNN) algorithm, the Support Vector Machine (SVM), Matrix Factorization and so on [2,3]. Among them, the Hidden Markov Model (HMM) has become one of the more popular methods, because it can independently model the running states of a single appliance and achieve better decomposition results. In reference [4], a single appliance is described as an HMM, and the NILD problem is formulated as a Factorial HMM (FHMM) problem; the model is solved by maximizing the log-likelihood of the total power sequence. However, traditional machine learning methods also have limitations, and there is room for improvement in decomposition accuracy and generalization performance.
In recent years, as deep learning and artificial intelligence have become the focus of research, NILD research has gradually shifted from traditional machine learning to deep neural networks. The advantage of a deep neural network is that a more complex and accurate model can be established by stacking network layers. In reference [5], a sequence-to-sequence (seq2seq) structure based on neural networks is proposed and solved with RNN or CNN models, while in reference [6] the seq2seq method is improved into a sequence-to-point (seq2point) structure, which alleviates problems such as vanishing gradients and inaccurate prediction at edge points in the seq2seq structure. Reference [7] uses the same seq2point model as [6] and shows that this method has strong cross-appliance and cross-domain transfer learning ability.
Although the above methods have achieved certain results, there are still two common problems to be solved in current NILD algorithms: 1) The structure of a deep neural network is quite complex: its trainable parameters can easily number in the millions, and the space of hyperparameter combinations is large. An efficient method is needed to determine a superior set of hyperparameters among numerous combinations at low time and hardware cost in order to improve the accuracy of the model.
2) The decomposition results of NILD may contain many unreasonable predictions. These irrelevant activations often damage the performance of the model, yet there is currently no simple and efficient method for dealing with this phenomenon.
In view of this, we introduce the concept of group optimization on the basis of the traditional Bayesian optimization method and propose a Group Bayesian optimization method to realize hyperparameter optimization of a deep residual network (ResNet), obtaining a better load decomposition model at a small computational cost. At the same time, a post-processing method for the decomposition results of the model is proposed to eliminate unreasonable activations and further improve the comprehensive performance of the model. Finally, a public dataset is used to verify the method proposed in this paper, which demonstrates its effectiveness.

NILD problem
NILD aims to predict the electricity consumption of a single appliance from the total electricity consumption sequence. Assume that the bus power consumption of a power user over a certain period of time is $P_{sum} = [P_1, P_2, \ldots, P_T]$, where $T$ is the length of the sequence. The total power consumed by the user at time $t$ can then be expressed as:

$$P_t = \sum_{i=1}^{N} P_t^i + e_t$$

where $N$ is the number of electrical appliances; $P_t$ represents the total power at time $t$; $P_t^i$ represents the power of appliance $i$ at time $t$; and $e_t$ represents the noise of the model.
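The additive model above can be illustrated with a small synthetic sketch (the appliance power values below are invented for illustration, not taken from REFIT):

```python
import random

def aggregate(appliance_powers, noise_std=1.0, seed=0):
    """Bus (mains) reading: P_t = sum_i P_t^i + e_t, with Gaussian noise e_t."""
    rng = random.Random(seed)
    T = len(appliance_powers[0])
    return [sum(p[t] for p in appliance_powers) + rng.gauss(0.0, noise_std)
            for t in range(T)]

# Two toy appliances: a fridge cycling at 100 W and a kettle burst at 2000 W.
fridge = [100, 100, 0, 0, 100, 100]
kettle = [0, 0, 2000, 2000, 0, 0]
mains = aggregate([fridge, kettle], noise_std=0.0)  # noiseless for clarity
```

The NILD task is the inverse problem: given only `mains`, recover `fridge` and `kettle`.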

Deep ResNet
In NILD task, extracting the operating characteristics of the target electrical appliance from the bus power is the key to achieving load decomposition. Convolutional Neural Network (CNN) can realize feature extraction by constructing different convolution kernels [6,8]. We can understand the power sequence extracted from the user bus as a two-dimensional image that integrates the electrical behavior characteristics of multiple electrical appliances, and introduce the method of CNN to extract the unique operating characteristics of the target appliance to achieve load decomposition. The performance of CNN is closely related to the depth of the network. It is found that with the increase of the number of network layers, the phenomenon of gradient disappearance and gradient explosion will hinder the convergence of the model. In order to solve this problem, we introduce the deep ResNet model proposed by He et al. [9].
The difference between the deep ResNet and a general CNN is that the former introduces the idea of shortcut connections. The original input skips multiple convolution layers through the shortcut connection and is added directly to the output of the residual branch. This operation enables the deep ResNet to learn the original features directly from the input, avoiding vanishing or exploding gradients.
The schematic diagram of the residual block, the basic unit of the deep ResNet, is shown in Figure 1. The residual block is divided into two parts: the connection on the left is an identity mapping, and the connection on the right is the residual branch, composed of two convolution operations with their corresponding Batch Normalization and ReLU activations. The final output of the residual block is obtained by adding the two parts and applying a ReLU activation. When the numbers of input and output channels differ, the identity mapping cannot be used directly, and a $1 \times 1$ convolution is needed to increase or reduce the dimension. The mathematical expression of the residual block is:

$$x_{l+1} = \mathrm{ReLU}\left(x_l + F(x_l, W_l)\right)$$

where $x_l$ represents the input of the residual block, $x_{l+1}$ its output, and $F(x_l, W_l)$ the residual mapping with weights $W_l$. The residual block makes it possible to further deepen a CNN, which helps to extract higher-level features from the power sequence and improve the accuracy of the load decomposition model.
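The shortcut-plus-residual computation can be sketched as follows; the residual branch here is a placeholder function standing in for the conv–BN–ReLU–conv–BN stack, since the point is only the identity shortcut and the final ReLU:

```python
def relu(v):
    """Element-wise ReLU over a vector."""
    return [max(0.0, x) for x in v]

def residual_block(x, residual_fn):
    """y = ReLU(x + F(x)): the shortcut adds the input to the residual branch."""
    fx = residual_fn(x)
    assert len(fx) == len(x), "identity shortcut requires matching dimensions"
    return relu([a + b for a, b in zip(x, fx)])

# Toy residual branch standing in for conv -> BN -> ReLU -> conv -> BN.
f = lambda v: [-0.5 * x for x in v]
y = residual_block([2.0, -4.0], f)
```

Because the input is added back before the final activation, the gradient has a direct path through the shortcut even when `residual_fn` is deep.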

Load decomposition model based on the deep ResNet
The structure of the deep ResNet finally built in this paper is shown in Figure 2. It uses the total power sequence intercepted by a sliding window as the input. The first convolution layer extracts the preliminary characteristics of the load, the output of the convolution layer is then fed sequentially into the residual blocks to extract higher-level and more abstract power consumption behavior characteristics, and the final fully connected layer realizes the nonlinear mapping from the feature vector to the target appliance's power value at the midpoint of the power sequence.
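The sliding-window input with a midpoint target (the seq2point setup) can be sketched as follows; window sizes and values are illustrative:

```python
def seq2point_windows(mains, window):
    """Slide a window over the mains sequence; each window is paired with the
    index of its midpoint, where the target appliance power is predicted."""
    assert window % 2 == 1, "odd window so the midpoint is well defined"
    half = window // 2
    return [(mains[i - half:i + half + 1], i)   # (input window, midpoint index)
            for i in range(half, len(mains) - half)]

windows = seq2point_windows([1, 2, 3, 4, 5], window=3)
```

Each `(window, midpoint)` pair becomes one training sample: the network maps the window to the appliance power at that midpoint.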

Bayesian optimization
The NILD model based on the deep ResNet introduced above has many adjustable hyperparameters, such as the size of the convolution kernel, the sliding step size, the number of neurons in the fully connected layer, and the size of the sliding window, etc. The selection of hyperparameters will greatly affect the performance of the model. In order to ensure a better model decomposition effect, we use Bayesian optimization to optimize the hyperparameters instead of manually setting them. Bayesian optimization utilizes the Gaussian process. For load decomposition model ( ) Y f X , we hope to determine the next search point through a combination of known hyperparameters. If t sets of hyperparameters 1 2 , , Covariance is calculated by Gaussian kernel function: where 0 D and [ are parameters of the kernel function. After obtaining t sets of candidate solutions, the corresponding Gaussian regression model is established to obtain the posterior probability of the model index value at any point, and the posterior probability is used to construct a collection function to determine the next set of hyperparameter combinations that need to be searched. The collection function expression is: ( ) t f x obey the t-dimensional normal distribution. The mean vector and covariance matrix are divided into blocks, which can be written as: 1: 1: 1: ( ) t f x is known. According to the nature of the multi-dimensional normal distribution, the conditional distribution obeys one-dimensional normal distribution. The calculation formula is: where P is related to 1: V only related to the covariance value calculated by the Gaussian kernel function, and has nothing to do with 1: ( ) t f x .
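The posterior formulas above can be sketched in pure Python; the kernel parameter values and the UCB constant $\kappa$ below are illustrative assumptions, not the settings used in the paper:

```python
import math

def gauss_kernel(x1, x2, sigma0=1.0, lam=1.0):
    """k(x1, x2) = sigma0^2 * exp(-(x1 - x2)^2 / (2 * lam^2))."""
    return sigma0 ** 2 * math.exp(-(x1 - x2) ** 2 / (2 * lam ** 2))

def solve(A, b):
    """Solve A v = b by Gauss-Jordan elimination (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * bb for a, bb in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_posterior(xs, fs, x):
    """mu(x) = k^T K^-1 f, sigma^2(x) = k(x,x) - k^T K^-1 k."""
    K = [[gauss_kernel(a, b) for b in xs] for a in xs]
    k = [gauss_kernel(a, x) for a in xs]
    alpha = solve(K, fs)                       # K^-1 f
    w = solve(K, k)                            # K^-1 k
    mu = sum(ki * ai for ki, ai in zip(k, alpha))
    var = gauss_kernel(x, x) - sum(ki * wi for ki, wi in zip(k, w))
    return mu, max(var, 0.0)

def ucb(xs, fs, x, kappa=2.0):
    """Acquisition value a(x) = mu(x) + kappa * sigma(x)."""
    mu, var = gp_posterior(xs, fs, x)
    return mu + kappa * math.sqrt(var)
```

The next search point is the candidate that maximizes `ucb`; at an already-observed point the posterior variance collapses to zero, so search is pushed toward unexplored regions.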

Group Bayesian optimization
On the basis of Bayesian optimization, this model introduces the idea of group optimization, realizing intelligent problem solving through the search behavior of individuals and the information interaction within the group. The specific steps of the Group Bayesian optimization method proposed in this paper are as follows: Step 1: Initialize the maximum number of searches and the total number of individuals, and give each individual an initial hyperparameter combination within a reasonable range.
Step2: Construct models separately according to the hyperparameter combination of each individual, use the same data set for training, and obtain the index value of the model under the hyperparameter combination corresponding to each individual.
Step 3: For each individual, compare its hyperparameter combination with its own best combination so far and with the combination of the individual with the best model index value in the group, and update it according to the following formula:

$$x_{id} = x_{id} + \mathrm{rand}(0,1) \cdot (p_{id} - x_{id}) + \mathrm{rand}(0,1) \cdot (p_{gd} - x_{id}) \tag{11}$$

where $x_{id}$ is the value of the $d$-th dimension of the $i$-th individual; $p_{id}$ is the personal best value of the $d$-th dimension of the $i$-th individual; $p_{gd}$ is the best value of the $d$-th dimension in the group; and $\mathrm{rand}(0,1)$ is a random number between 0 and 1.
Step 4: If the termination condition is not met, return to Step 2; otherwise the algorithm ends, and the hyperparameter combination corresponding to the model with the best index value in the group is taken as the optimal solution.
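The per-individual update of Step 3 can be sketched as follows; it follows formula (11) as described (personal best, group best, independent random factors), with the exact form an assumption where the original expression is unreadable:

```python
import random

def update_individual(x, p_best, g_best, rng=None):
    """Move each dimension toward the individual's personal best and the
    group best, each step scaled by an independent rand(0, 1) factor."""
    rng = rng or random.Random()
    return [xd + rng.random() * (pd - xd) + rng.random() * (gd - xd)
            for xd, pd, gd in zip(x, p_best, g_best)]
```

An individual already sitting at both bests stays put, while others are pulled stochastically toward the promising regions found so far.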

Training method
The NILD problem solved in this paper is a typical regression problem. We use the mean squared error (MSE) as the loss function for network training; that is, the squared differences between the real and predicted values are summed and averaged. MSE describes the overall gap between the real and predicted values: the smaller the MSE, the higher the prediction accuracy of the model. Its mathematical expression is:

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$$

where $y_n$ represents the real value of the model and $\hat{y}_n$ represents the predicted value of the model.
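As a sketch, the loss above is simply:

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    assert len(y_true) == len(y_pred)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
```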

Results & Discussion
In this paper, the algorithm is developed in the Python programming environment, and the deep ResNet model is built on the TensorFlow + Keras framework. Model training and testing are completed on a server with four GeForce RTX 2080 Ti graphics cards, using the GPUs for hardware acceleration.

Dataset
We select the public REFIT dataset to evaluate the built model. REFIT is a public dataset from three universities (Loughborough, Strathclyde and East Anglia) which provides the electricity usage data of 20 households in England, covering the period from 2013 to 2015 [10]. The data include the total household electricity consumption and the electricity consumption of individual appliances, sampled every 8 s. We selected the fridge, kettle, microwave, washing machine and dishwasher as the target appliances, as they are the most used and account for the largest proportion of household energy consumption. Table 1 shows the splits of the REFIT data used in the simulation.

Pre-processing
In order to avoid deviations in the load decomposition results caused by different total power levels, we use the min-max normalization method to pre-process the original data and map it to the [0, 1] interval. The normalization function is:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x^{*}$ represents the normalized data; $x_{\max}$ represents the maximum value of the power sequence data and $x_{\min}$ represents the minimum value of the power sequence data.
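A minimal sketch of this normalization (the constant-sequence guard is our addition, not from the paper):

```python
def minmax_normalize(seq):
    """Map a power sequence to [0, 1]: x* = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(seq), max(seq)
    if hi == lo:                      # constant sequence: avoid division by zero
        return [0.0] * len(seq)
    return [(x - lo) / (hi - lo) for x in seq]
```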

Post-processing
Post-processing refers to revising the decomposition results of the model based on prior knowledge. Reference [11] found that the decomposition results of a load decomposition model not only include the true on-states of the appliance but often also include some sporadic "false" activations, in which the model mistakenly judges the appliance to be on. This biases the final decomposition values upward and reduces the decomposition performance of the model.
These "false" activations can often be eliminated by simple logical judgments, further improving the performance of the model. To this end, we propose a post-processing method for the model decomposition results, consisting of four steps: Step 1: Record the shortest activation duration of the target appliance in the training data using the threshold method.
Step 2: Use the threshold method to record the duration of each activation in the decomposed power of the target appliance.
Step 3: Eliminate those activations in the decomposed power of the target appliance whose duration is shorter than the shortest activation duration recorded in Step 1.
Step 4: Examine the section of total load power corresponding to each remaining activation. If the total load power in this section shows a corresponding rise and drop, the activation is considered reasonable; otherwise it is considered unreasonable and is removed.
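Steps 2 and 3 can be sketched as follows (the threshold and power values are illustrative; the Step 4 rise/drop check against the mains is omitted from this sketch):

```python
def activations(power, threshold):
    """Threshold method: (start, end) index pairs where power stays above threshold."""
    runs, start = [], None
    for i, p in enumerate(power):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(power)))
    return runs

def drop_short_activations(power, threshold, min_len):
    """Zero out activations shorter than the minimum duration seen in training."""
    out = list(power)
    for s, e in activations(power, threshold):
        if e - s < min_len:
            out[s:e] = [0.0] * (e - s)
    return out

# A one-sample spike (likely "false") next to a genuine three-sample activation.
cleaned = drop_short_activations([0, 900, 0, 0, 800, 850, 820, 0],
                                 threshold=10, min_len=2)
```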

Metrics
We use three typical metrics to evaluate the performance of the model. The first is the Mean Absolute Error (MAE), defined as the average absolute error between the predicted and true values of the target appliance at each moment:

$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \lvert y_t - \hat{y}_t \rvert$$

where $y_t$ represents the real power of the target appliance at time $t$ and $\hat{y}_t$ represents the predicted power of the target appliance at time $t$.
The second metric is the normalized Signal Aggregate Error (SAE), defined as the relative error of the total energy between the predicted and true values of the target appliance:

$$\mathrm{SAE} = \frac{\lvert \hat{E} - E \rvert}{E}$$

where $E = \sum_{t=1}^{T} y_t$ represents the true total energy consumption of the appliance and $\hat{E} = \sum_{t=1}^{T} \hat{y}_t$ represents the predicted total energy consumption. The third metric is the Normalized Disaggregation Error (NDE), defined as the normalized squared error between the predicted and true power of the target appliance:

$$\mathrm{NDE} = \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{T} y_t^2}$$
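The three metrics can be sketched directly from their definitions:

```python
def mae(y, y_hat):
    """Mean absolute error over the sequence."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def sae(y, y_hat):
    """Relative error of total energy: |E_hat - E| / E."""
    E, E_hat = sum(y), sum(y_hat)
    return abs(E_hat - E) / E

def nde(y, y_hat):
    """Squared error normalized by the true signal's energy."""
    return (sum((a - b) ** 2 for a, b in zip(y, y_hat))
            / sum(a ** 2 for a in y))
```

All three are zero for a perfect prediction; MAE penalizes per-sample error, SAE only the aggregate energy bias, and NDE the squared error relative to signal magnitude.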

Experimental results
To build a better load decomposition model, we first use Group Bayesian optimization to find a suitable set of hyperparameters for the load decomposition models of the kettle, microwave, fridge, washing machine and dishwasher. The hyperparameters optimized here include the size of the convolution kernels, the sliding step size, the number of neurons in the fully connected layer, and the size of the sliding window. Figure 3 shows the results of Group Bayesian optimization. After constructing the optimal load decomposition model for each appliance, the test set, whose data are not involved in model training, is used to evaluate the decomposition performance. Figure 4 shows the true and predicted power values of the above appliances. The results show that the proposed deep ResNet can extract the operating characteristics of the appliances and capture almost every activation accurately. However, the model also predicts some irrelevant activations, which damage its decomposition accuracy. After applying the proposed post-processing algorithm, it can be seen from Figure 4 that these irrelevant activations are eliminated, which demonstrates the effectiveness of our post-processing method. Table 2 compares the metrics of the method in this paper with the method in [7]. The proposed deep ResNet achieves the best values of the MAE, SAE and NDE metrics compared with an ordinary CNN, because the deep ResNet can be stacked very deep and can extract deeper features for load decomposition. After post-processing, the performance of the model is further improved on this basis.

Conclusions
In this paper, based on CNN theory and the residual mechanism, a NILD method based on a deep ResNet and an improved post-processing mechanism is proposed. In this method, the power sequence is converted into feature images, and the behavior characteristics of many kinds of electrical appliances are expressed as images to realize load decomposition. The results of numerical examples show that the proposed method achieves a lower error level and higher decomposition accuracy. The contributions of this paper are as follows: 1) Based on CNN theory, the power sequence to be decomposed is transformed into a two-dimensional feature image, and the load decomposition model based on the deep ResNet can fully extract the multi-appliance behavior characteristics contained in the power sequence, so as to obtain the decomposed power of the target appliance.
2) Based on the theory of group optimization and Bayesian optimization, a Group Bayesian optimization method is proposed to realize hyperparameter optimization more efficiently.
3) In view of the unreasonable activations in the load decomposition results, an improved post-processing method is proposed to improve the comprehensive performance of the model.
The transfer learning ability of the deep ResNet model can be further studied in the future.