Weighted Islanding Detection for DC Microgrid Based on Random Forest Classification

At present, the main form of microgrid is the AC microgrid. With the rapid development of various DC power sources, DC microgrids have received extensive attention and research. A DC microgrid operates in either grid-connected or islanded mode, and an island is formed when the circuit breaker connecting the microgrid to the large grid trips. Islanding can be planned or unplanned. Unplanned islanding can harm users and the system, so islanding must be detected accurately in DC microgrids. This paper proposes an islanding detection method for DC microgrids based on random forest classification. First, the raw data are cleaned, features are extracted, and a feature vector set is generated. The extracted features are six islanding characteristic indexes on the DC bus side: voltage, current, and output active power, together with their first-order backward differences. Then, an islanding detection model is built on random forest classification. The model distinguishes islanding events successfully and accurately, and, with weighted random forest classification, it detects islanding events more accurately than decision tree classification when processing large amounts of data.


Introduction
Currently, DC microgrids are developing rapidly as distributed power supplies and a variety of DC power-using devices are deployed in large numbers [1]. A DC microgrid operates in one of two modes, grid-connected or islanded [2][3]; when the circuit breaker connecting the microgrid to the larger grid trips, an island is formed. Islanding can be planned or unplanned [4]. Unplanned islanding can harm users or the system [5], so accurate islanding detection is essential for the safe and stable operation of DC microgrids. Traditional islanding detection methods mainly include local active, local passive, and remote methods [6]. These methods suffer from large non-detection zones, degraded inverter output power quality, high cost, and complex design. Data mining technology serves two main functions: querying historical operating information of the power system, and establishing latent links among the queried data to support prediction, decision making, and similar tasks. Researchers in China and abroad have therefore studied data mining techniques to varying depths. Reference [7] proposes an islanding detection data mining system with three steps, namely key feature identification, base learners, and a meta-learner, to improve the accuracy and generalization of AC microgrid islanding detection. Reference [8] employs a C4.5 decision tree to solve the islanding detection problem in AC microgrids. Reference [9] proposes a random forest classifier-based method that detects anomalous data and repairs it. Reference [10] trains five different classifiers for islanding detection in AC microgrids, and its random forest classifier detects islanding with high accuracy within a reasonable time.
Random forest is an ensemble learning method built on multiple decision trees. It overcomes some shortcomings of a single decision tree and offers good scalability and parallelism [11], so it is widely used in fields such as remote sensing data analysis [12], biochips [13], speech recognition [14], and web spam detection and complete network detection [15]. It has also been applied successfully to islanding detection in distributed power generation systems [16]. In this paper, an islanding detection method for DC microgrids based on random forest classification is proposed. First, a DC microgrid simulation model is built to obtain state information, such as the voltage, current, and output active power on the DC bus side, in grid-connected and islanded operation, and the data are cleaned. Then, key features reflecting islanded operation of the DC microgrid are extracted to generate a feature vector set. Finally, a DC microgrid islanding detection method based on random forest classification is proposed and shown to improve detection accuracy. Compared with decision tree classification at different data scales and with the unweighted random forest, the weighted random forest classification model detects islanding more accurately, which gives it a degree of scalability and practical significance.

DC Microgrid Islanding Operation Feature Extraction
DC microgrid islanding detection mainly comprises data acquisition, data cleaning, islanding feature extraction, and random forest classification. Islanding features are intrinsic to islanded operation and directly determine detection accuracy, so extracting valid islanding features is the key to islanding detection. At the moment islanding occurs, the active power $P$, reactive power $Q$, frequency $f$, voltage $U$, and power factor $\cos\varphi$ of an AC grid change. The key features for AC islanding detection are therefore these quantities and their rates of change [17][18]. The common features used in existing studies of AC islanding detection are summarized in Table 1. A DC microgrid only needs to maintain active power balance and a stable DC bus voltage; phase, frequency, and reactive power need not be considered [19][20]. This paper therefore selects six islanding characteristic indicators as detection features: the DC bus side voltage, current, and output active power, and their respective first-order backward differences, as shown in Table 2. The extraction of key features of DC microgrid islanding operation includes the following steps, as shown in Figure 1.

Data acquisition
A DC microgrid simulation model is built. Let the continuous voltage on the DC bus side be $u(t)$ and the sampled voltage be $u(k)$, where $T$ is the sampling period, so that $u(k) = u(kT)$. This yields voltage data for grid-connected and islanded operation of the DC microgrid. Similarly, state information such as the DC bus side current $i(k)$ and output active power $p(k)$, and the AC side voltage, current, output active power, and reactive power, can be obtained. These data are stored.

Data cleansing
The similarity of each pair of samples is estimated with the Euclidean distance [21][22]; the data are then cleaned and merged, and duplicate values and outliers are handled to improve the quality of the dataset. The Euclidean distance is calculated as follows:

$$d_{ab} = \sqrt{\sum_{m=1}^{D}\left(x_{am} - x_{bm}\right)^{2}} \qquad (1)$$
where $d_{ab}$ is the Euclidean distance between sample $a$ and sample $b$; $D$ is the number of sample dimensions; and $x_{am}$, $x_{bm}$ are the $m$-th dimensional values of samples $a$ and $b$.
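As an illustration of how Eq. (1) can flag duplicate records, the sketch below keeps a sample only if its Euclidean distance to every sample already kept exceeds a threshold. The function names and the threshold `eps` are hypothetical, and outlier handling, which the text also mentions, is not shown; this is a sketch under those assumptions, not the paper's MATLAB cleaning procedure.

```python
import math


def euclidean(a, b):
    """Eq. (1): Euclidean distance between two samples of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def drop_near_duplicates(samples, eps):
    """Keep a sample only if it lies farther than eps from every sample already kept.

    eps is a hypothetical similarity threshold. The pairwise loop is O(N^2)
    and is meant only to illustrate how Eq. (1) flags duplicate records.
    """
    kept = []
    for s in samples:
        if all(euclidean(s, k) > eps for k in kept):
            kept.append(s)
    return kept


# Toy (u, i, p) records: the second row duplicates the first.
records = [(400.0, 10.0, 4000.0), (400.0, 10.0, 4000.0), (380.0, 9.5, 3610.0)]
print(drop_near_duplicates(records, eps=1e-6))
```

For hundreds of thousands of records, a spatial index or exact hashing of rounded values would replace the quadratic loop.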

Feature extraction
Six state quantities, namely the voltage, current, and output active power and their respective first-order backward differences, are extracted from the cleaned data as DC microgrid islanding detection features. The first-order backward differences of voltage, current, and output active power, $\Delta u(k)$, $\Delta i(k)$, and $\Delta p(k)$, are calculated as
$$\Delta u(k) = u(k) - u(k-1), \quad \Delta i(k) = i(k) - i(k-1), \quad \Delta p(k) = p(k) - p(k-1) \qquad (2)$$
where $u(k)$ and $u(k-1)$ are the DC bus voltage at moments $k$ and $k-1$; $i(k)$ and $i(k-1)$ are the DC bus current at moments $k$ and $k-1$; and $p(k)$ and $p(k-1)$ are the output active power on the DC bus at moments $k$ and $k-1$.

Generation of vector sets
The feature vector set is then formed; each feature vector contains the six feature indicators $u(k)$, $i(k)$, $p(k)$, $\Delta u(k)$, $\Delta i(k)$, and $\Delta p(k)$.
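The backward differences of Eq. (2) and the six-element feature vectors can be sketched as follows, assuming the cleaned voltage, current, and power samples are equal-length lists indexed by sampling instant $k$. The function names are hypothetical; the first instant is dropped because Eq. (2) needs a predecessor.

```python
def backward_diff(x):
    """Eq. (2): first-order backward difference, dx(k) = x(k) - x(k-1), for k >= 1."""
    return [x[k] - x[k - 1] for k in range(1, len(x))]


def build_feature_vectors(u, i, p):
    """Return one vector [u, i, p, du, di, dp] per sampling instant k >= 1.

    The first instant is dropped because it has no predecessor for Eq. (2).
    """
    if not (len(u) == len(i) == len(p)):
        raise ValueError("u, i and p must have the same length")
    du, di, dp = backward_diff(u), backward_diff(i), backward_diff(p)
    return [
        [u[k], i[k], p[k], du[k - 1], di[k - 1], dp[k - 1]]
        for k in range(1, len(u))
    ]


# Toy DC bus samples around an islanding instant (illustrative values only).
vectors = build_feature_vectors([400.0, 398.0, 395.0], [10.0, 10.5, 11.0], [4000.0, 4179.0, 4345.0])
```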

DC microgrid islanding detection based on random forest classification
A random forest is an ensemble learning method based on decision trees: it consists of multiple mutually independent decision trees, and its classification result is determined by the vote of all trees. It achieves high accuracy and processes big data effectively, and therefore performs well in islanding detection. Building on the DC microgrid islanding characteristic indexes, this paper develops the model with the random forest algorithm; the modeling process is shown in Figure 2. First, the sample set containing the six islanding characteristic indicators $u$, $i$, $p$, $\Delta u$, $\Delta i$, and $\Delta p$ is split into a training set and a test set. Then, the bootstrap method is used to randomly draw multiple training subsets from the training set, and a decision tree is built for each subset separately. Finally, the decision results of all trees are combined by voting to obtain the islanding detection model [23].

Random selection of a subset of training samples
The raw sample set contains two types of data: DC microgrid grid-connected operation and islanded operation. The sample set containing the six islanding characteristic indicators is divided into a training set and a test set in the ratio 7:3 [24][25]. First, n sub-sample sets are randomly drawn from the training set by bootstrap sampling, and a classification and regression tree (CART) [26] is constructed for each, giving n trees. To build each decision tree, F of the M islanding detection features are randomly selected as candidate variables for node splitting. This mitigates the overfitting that arises when a single decision tree is constructed and ensures randomness in tree construction. Here M = 6 and F is randomly chosen as 1 or 2: F = 1 when a tree randomly selects one feature, and F = 2 when it selects two.
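The sampling steps above can be sketched as follows: a 7:3 shuffle split, one bootstrap sub-sample per tree (drawing with replacement is assumed here, as in the standard random forest), and a per-tree feature subset with F drawn from {1, 2}. All names are hypothetical. Note that the text selects the F features once per tree, which is what this sketch does; Breiman's original random forest instead redraws the candidate features at every node.

```python
import random


def train_test_split(samples, train_ratio=0.7, seed=0):
    """Shuffle and split the sample set 7:3 into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def bootstrap(train, rng):
    """Draw len(train) samples with replacement: one sub-sample set per tree (assumed bootstrap)."""
    return [rng.choice(train) for _ in range(len(train))]


def random_feature_subset(m, rng):
    """Pick F of the M features for one tree, with F drawn from {1, 2} as in the text."""
    f = rng.choice([1, 2])
    return sorted(rng.sample(range(m), f))


rng = random.Random(42)
train, test = train_test_split(list(range(10)))
# One (sub-sample, feature subset) pair per tree; n = 99 trees as in the paper.
plans = [(bootstrap(train, rng), random_feature_subset(6, rng)) for _ in range(99)]
```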

Building the CART decision tree
Following the principle of Gini index minimization, the CART algorithm is used to construct one decision tree for each random training subset, giving n decision trees that form a "forest" [27]. According to reference [28], random forest classification performance is close to optimal when the number of decision trees is about 100, so this paper uses 99 decision trees to form the random forest.
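The Gini-minimization principle can be illustrated on a single feature: the sketch computes the Gini index $1 - \sum_c p_c^2$ (the "Gini coefficient" of the text, i.e. Gini impurity) and scans midpoint thresholds for the split with the lowest weighted child impurity. It is a minimal illustration of the CART split criterion with hypothetical function names, not a full CART implementation (no recursion, stopping rules, or pruning).

```python
def gini(labels):
    """Gini index 1 - sum_c p_c^2 of a list of class labels (0 = grid-connected, 1 = islanded)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted child Gini) minimizing impurity for one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    n = len(xs)
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (thr, score)
    return best


# Toy DC bus voltages: the two lowest readings are islanded samples.
print(best_split([400.0, 399.0, 380.0, 375.0], [0, 0, 1, 1]))
```

Here the split at 389.5 V separates the two classes perfectly, so its weighted child Gini is 0.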

Voting on the islanding detection results
The random forest model, consisting of n CART decision trees, is validated for accuracy with the test set data. Each test sample a is fed into the random forest as input, and the output of the k-th tree is the
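The final vote can be sketched as a simple majority over the labels returned by the individual trees. In this sketch the trees are stood in for by hypothetical threshold rules on the six-element feature vector; in practice each would be a trained CART tree. With an odd number of trees (such as the paper's 99) and two classes, a majority vote cannot tie.

```python
from collections import Counter


def predict(trees, sample):
    """Majority vote of the trees' labels (0 = grid-connected, 1 = islanded)."""
    votes = [tree(sample) for tree in trees]
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Three hypothetical stumps standing in for trained CART trees;
# a sample is [u, i, p, du, di, dp].
trees = [
    lambda s: 1 if s[0] < 390.0 else 0,  # low DC bus voltage
    lambda s: 1 if s[3] < -1.0 else 0,   # falling voltage
    lambda s: 0,                         # always votes grid-connected
]
print(predict(trees, [380.0, 10.0, 3800.0, -2.0, 0.1, -40.0]))
```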

Experimental results and analysis
A simulation model of the DC microgrid is established. As shown in Figure 3, the model consists of five parts: the AC main grid, inverter controller, line impedance model, DC circuit breaker, distributed generation, and the DC microgrid system and load model. The wind turbine simulator is taken as the object of study.
While ensuring maximum power output from wind generation and system power balance, grid-connected operation, large islanding events, and small islanding events of the DC microgrid are simulated by switching actions, as listed in Table 3.

Table 3
Event | Open breakers | Result
Small islanding | cb1, cb2 | The DC microgrid is disconnected from the AC main grid and the DC load
Small islanding | cb1, cb3 | The DC microgrid is disconnected from the AC main grid and the AC load
Small islanding | cb1, cb5 | The DC microgrid is disconnected from the AC main grid and the lithium battery energy storage system
Small islanding | cb1, cb6 | The DC microgrid is disconnected from the AC main grid and the photovoltaic power generation system

For each of the three operating states (grid-connected operation, large islanding events, and small islanding events), the following quantities of the wind turbine simulator are collected simultaneously: the DC bus side voltage, current, and output active power, and the AC side voltage, current, output active power, and output reactive power. Taking grid-connected operation and large islanding operation as an example, the blue part of the system simulation diagram shows the grid-connected state and the red part shows the large islanding state. When the simulation reaches 0.7 s, switch cb1 opens and the DC microgrid enters large islanding operation. After a further 0.2 s, i.e. at 0.9 s, large islanding operation reaches steady state. The resulting changes in the DC bus side voltage, current, and output active power of the wind turbine simulator are shown in Figure 4. From the simulation events in Table 4, 240,000 sets of simulation data are obtained. Each set contains status information of the wind turbine simulator, such as the DC bus side voltage, current, and output active power, and the AC side voltage, current, output active power, and output reactive power. The DC bus side voltage, current, and output active power are selected as raw data and cleaned in MATLAB.
The KNN classification results from the classification learner are used as indicators of dataset quality. In the confusion matrix, 0 denotes non-islanded and 1 denotes islanded operation. Cleaning the 240,000 sets of data yields 223,780 sets. Misclassified data are significantly reduced and dataset quality improves, as shown in Figure 5. The AUC values before and after data cleaning are 0.93 and 0.97, respectively, so the cleaned data can be classified better, i.e. the data quality is higher. The ROC curves before and after data cleaning are shown in Figure 6.
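The AUC values quoted above can be computed from any classifier's scores using the pairwise (Mann-Whitney) interpretation: AUC is the probability that a randomly chosen islanded sample scores higher than a randomly chosen non-islanded one, with ties counted as one half. The sketch below is this generic computation with a hypothetical function name, not the classification learner's internal routine; its O(P·N) loop suits small examples, and a rank-based version would be used for datasets of the size reported here.

```python
def auc(scores, labels):
    """ROC AUC as P(score of islanded sample > score of non-islanded sample), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes (0 and 1) must be present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: one islanded sample (0.2) is ranked below a non-islanded one (0.3).
print(auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))
```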
The six islanding features are extracted from the cleaned data and combined to generate the feature vector set. The sample set contains 223,780 feature vectors, of which 70% are randomly selected as the training set and the remaining 30% as the test set. The training set is used to train the random forest classifier, and the test set is used to measure the recognition rate of the trained model. The numbers of samples are shown in Table 4 and the test results in Table 5. Table 5 shows that the detection accuracies for islanded and grid-connected operation are 98.22% and 93.81%, respectively, and that the overall detection accuracy of random forest classification is 97.48%. The random forest classification method can therefore detect islanding effectively.
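The per-class and overall accuracies can be computed from test-set labels and predictions as sketched below. This assumes that "detection accuracy" for a class means the fraction of that class's test samples classified correctly (its recall); the function name is hypothetical. Under that reading the overall accuracy is the class-size-weighted average of the two per-class accuracies, which is why 97.48% lies between 93.81% and 98.22%.

```python
def detection_accuracy(y_true, y_pred):
    """Per-class accuracy (recall) and overall accuracy; 1 = islanded, 0 = grid-connected."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    result = {}
    for cls in (0, 1):
        idx = [k for k, y in enumerate(y_true) if y == cls]
        # Share of this class's samples that were predicted correctly.
        result[cls] = sum(y_pred[k] == cls for k in idx) / len(idx) if idx else float("nan")
    result["overall"] = sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
    return result


# Toy example: one islanded sample is missed.
print(detection_accuracy([1, 1, 1, 0], [1, 1, 0, 0]))
```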

Comparison of random forest classification method and decision tree method
The random forest classification method is compared with the decision tree method. Sample datasets of 180,000, 120,000, and 60,000 groups are selected for testing, the above steps are repeated, and the results are shown in Table 6. Table 6 shows that the larger the dataset, the higher the prediction accuracy of the random forest detection model relative to the decision tree approach.

Weighted random forest classification method
In the random forest classification method, the decrease in the Gini index produced by a feature indicator at a node is denoted $D_{\mathrm{Gini}}$. The weight of each feature indicator is obtained by summing its Gini decreases over all nodes of all trees in the forest and averaging over the trees:
$$I_k = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{t} D^{\mathrm{Gini}}_{kij}$$
where $n$ is the total number of trees in the random forest ($n = 99$ in this paper); $t$ is the number of nodes in a single classification tree; and $D^{\mathrm{Gini}}_{kij}$ is the decrease in the Gini index of the $k$-th indicator at the $j$-th node of the $i$-th tree. This way of computing feature-indicator weights can improve generalization on the dataset, and because it relies only on tree splits, it can handle multiple types of data without standardization. The complete training set is fed into the random forest model for training, and the self-detection function of the random forest model yields the importance of each indicator according to formula (8). According to Table 7, voltage is the most important indicator, accounting for 32.6% of the total importance, which is consistent with the comparison of status information for grid-connected and islanded operation. The first-order backward difference of voltage, the first-order backward difference of output active power, and the output active power account for 28.1%, 15.8%, and 12.3%, respectively, indicating that these three indicators have a considerable effect on islanding detection and influence the detection result to a certain extent. The influence of current on the detection result is much lower than that of the voltage indicators. For the generated feature vector set, each electrical feature indicator is assigned a weight, and the weight distribution is
$$W = \left[w_1, w_2, \dots, w_6\right], \qquad w_k = \frac{I_k}{\sum_{m=1}^{6} I_m} \qquad (7)$$
where $\sum_{k=1}^{6} w_k = 1$.
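The importance-to-weight computation can be sketched as follows, assuming the Gini decrease $D_{\mathrm{Gini}}$ of every split node in the forest has already been recorded together with the index of the feature used at that node. The function and argument names are hypothetical. Averaging over the $n$ trees does not change the normalized weights (the factor $1/n$ cancels), but it is kept to mirror the formula. Library implementations of impurity-based importance, such as scikit-learn's, additionally weight each node's decrease by the fraction of samples reaching it.

```python
def feature_weights(node_records, n_trees, n_features=6):
    """Normalized feature weights from per-node Gini decreases.

    node_records: iterable of (feature index k, D_Gini) for every split node
    of every tree. Returns [w_1, ..., w_M] with w_k = I_k / sum(I).
    """
    importance = [0.0] * n_features
    for k, d_gini in node_records:
        importance[k] += d_gini
    # I_k: total Gini decrease of feature k, averaged over the trees.
    importance = [s / n_trees for s in importance]
    total = sum(importance)
    if total == 0.0:
        raise ValueError("no positive Gini decrease recorded")
    return [s / total for s in importance]


# Toy records from two trees; feature order is [u, i, p, du, di, dp].
print(feature_weights([(0, 0.3), (0, 0.1), (3, 0.4), (1, 0.2)], n_trees=2))
```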
After weights are assigned to the previously obtained feature vector set, the weighted feature vector set is obtained. As before, the sample set contains 223,780 feature vectors, of which 70% are randomly selected as the training set and the remaining 30% as the test set. The training set is used to train the weighted random forest classifier, and the test set is used to measure the recognition rate of the trained model. The test results are shown in Table 8. As shown in Figure 7, the detection accuracies of the weighted random forest model for islanded and grid-connected operation are 98.26% and 93.73%, respectively. Compared with the unweighted random forest, the islanding detection accuracy of the weighted random forest model increases by 0.04 percentage points (from 98.22% to 98.26%), while the grid-connected detection accuracy changes from 93.81% to 93.73%.
Fig.7. Comparison between weighted random forest and unweighted random forest

Conclusion
This paper proposes an islanding detection method for DC microgrids based on random forest classification and verifies its effectiveness through simulation, leading to the following main conclusions.
1) The proposed method detects islanding accurately, with an accuracy of 98.26%. This helps ensure the safe operation of distribution networks containing large numbers of distributed power supplies.
2) Compared with the decision tree and the unweighted random forest, the weighted random forest classification model has better generalization ability and detects islanding better when dealing with large amounts of data. Follow-up research will consider the influence of further factors on islanding detection accuracy, including the number of trees in the random forest and the ratio of training set to test set, and will continue to improve the proposed islanding detection method for DC microgrids based on random forest classification.