Power Reconstructing Method of Distributed Photovoltaic Based on the Temporal and Spatial Correlation

In China, 55 GW of distributed photovoltaic (DP) capacity has been installed, but nearly half of it is connected at the 380 V low-voltage level, without real-time power data acquisition. The sequential power data therefore has to be reconstructed from related monitoring data. Current research focuses on recovering outliers, not on reconstruction from scratch. This paper explores the temporal and spatial correlation of power between adjacent centralized photovoltaic (CP) stations and proposes a reconstruction method for large-scale missing power data based on the time-delay power correlation, the spatial geometric characteristics of the stations and the idea of ensemble learning. Finally, we verify the effectiveness of the proposed method by simulation on real photovoltaic power data. The proposed method reconstructs the data better than the traditional method, which uses only the power curves of the nearest CP station, converted by capacity, to reconstruct the power curves of the DP station.


Introduction
By the end of 2018, 55 GW of distributed photovoltaic (DP) capacity had been connected to the distribution network in China, which poses considerable challenges to distribution network operation [1][2][3][4]. Identifying the characteristics of DP from historical data is the first step toward solving these problems.
One of the biggest challenges in China is the poor data conditions [5][6]. About half of the DP capacity is connected at the 380 V voltage level, which has no real-time power data acquisition. For example, 30% of the DP installations in northern Hebei Province, China, can only collect electrical energy data, not power data. In addition, the meteorological data available for DP is scarce and of low precision because of its high cost. Therefore, we need to reconstruct the large-scale missing data of DP.
Some studies have addressed outlier recovery. References [7][8][9][10] reconstructed missing power data using principal component analysis and neural networks, with data from the centralized photovoltaic (CP) stations around the target DP station and related weather factors. These approaches depend on meteorological data and are better suited to repairing small-scale missing data. In addition, some intelligent algorithms such as neural networks require labeled data, which means that they cannot reconstruct data from scratch. However, we can learn from their methods for analyzing the temporal and spatial characteristics between different photovoltaic stations.
For large-scale missing power data, the traditional engineering method uses only the power curves of the nearest CP station, converted by capacity, to reconstruct the power curves of the DP station. This method can introduce large errors.
Therefore, based on the real power data of centralized and distributed photovoltaic power stations, this paper proposes the daily electrical energy ratio, the power cross-correlation coefficient, the time-delay power cross-correlation coefficient and other indicators to mine the temporal and spatial characteristics between CP and DP from historical power data. This paper also proposes a data reconstructing method based on the time-delay power correlation, the spatial geometric characteristics of the stations and the idea of ensemble learning, which achieves smaller errors than the traditional data reconstructing method.

Characteristic mining and data reconstruction
Without historical power curves for DP, the missing data cannot be reconstructed with supervised learning algorithms. For the data reconstructing of DP, only its daily electrical energy and the historical power curves of the surrounding CP stations are available. Therefore, we adopt the idea of unsupervised learning and reconstruct the missing data from the time-delay power correlation and the spatial geometric characteristics of DP and CP.

CPEEE 2020
To explore the temporal and spatial characteristics of CP and DP historical power curves, this section defines the station relative distance, the daily electrical energy, the ratio of daily electrical energy, the power cross-correlation coefficient and the time-delay power cross-correlation coefficient.
Station relative distance (km) is the distance between two photovoltaic power stations, calculated from their latitude and longitude coordinates. It reflects the spatial characteristics of the stations, as shown in (1). In the formula, lg1 and lg2 represent the longitudes of the two stations, and la1 and la2 represent their latitudes.
Daily electrical energy is the sum of the daily power curve of a photovoltaic station, as shown in (2). In the formula, n represents the name of a photovoltaic station, d represents one day, M represents the total number of data points in a power curve, and Pn,d(i) represents the active power of station n at time i on day d.
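The two definitions above can be illustrated with a minimal sketch. Since formula (1) is not reproduced here, the distance below assumes a flat-earth (equirectangular) approximation with roughly 111 km per degree of latitude; the function names and the constant are illustrative assumptions, not the paper's own implementation.

```python
import math

# Assumption: about 111 km per degree of latitude (and per degree of
# longitude at the equator); the exact form of formula (1) may differ.
KM_PER_DEGREE = 111.0


def station_distance_km(lg1, la1, lg2, la2):
    """Approximate distance in km between two stations given in degrees."""
    mean_lat = math.radians((la1 + la2) / 2.0)
    dy = KM_PER_DEGREE * (la1 - la2)
    # Longitude degrees shrink with the cosine of the latitude.
    dx = KM_PER_DEGREE * math.cos(mean_lat) * (lg1 - lg2)
    return math.hypot(dx, dy)


def daily_energy(power_curve):
    """Sum of the M samples of one daily power curve, following (2).

    The result is in power-sample units; multiply by the sampling
    interval in hours to obtain an energy in kWh or MWh.
    """
    return sum(power_curve)
```

For two stations on the same meridian one degree of latitude apart, `station_distance_km` returns about 111 km.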
Ratio of daily electrical energy is the proportion of days on which the daily electrical energy of a CP station is larger than that of the DP station, as shown in (3). In the formula, D represents the total number of statistical days and QDP represents the daily electrical energy of DP.
Power cross-correlation coefficient is the correlation coefficient between a daily power curve of CP and that of DP, as shown in (4). In the formula, Pn(d) and PDG(d) represent the power sequences of CP station n and the DP station on day d, respectively. Note that we use only the power sequence in the 4 hours around 12 noon to calculate the correlation coefficient, because this window better reflects the characteristics of a photovoltaic power curve.
Time-delay power cross-correlation coefficient is the correlation coefficient between a power curve of CP shifted in time by a given amount and the power curve of DP, as shown in (5). In the formula, dt represents the time shift and xmax represents the maximum time shift, which must be chosen so that the shifted power sequence used in the calculation still belongs to the same day.
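As a rough illustration of the ratio of daily electrical energy and the midday power cross-correlation coefficient, the sketch below uses a plain Pearson correlation. It assumes that "4 hours around 12 noon" means the window from 10:00 to 14:00 and that the curve starts at midnight with a fixed number of samples per hour; the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumption: a constant sequence (undefined correlation) yields 0.0.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def energy_ratio(q_cp, q_dp):
    """Fraction of days on which CP daily energy exceeds DP daily energy."""
    days = len(q_cp)
    return sum(1 for a, b in zip(q_cp, q_dp) if a > b) / days


def midday_window(curve, samples_per_hour, half_width_hours=2):
    """Samples from (12 - half_width) to (12 + half_width) o'clock."""
    noon = 12 * samples_per_hour
    half = half_width_hours * samples_per_hour
    return curve[noon - half:noon + half]


def power_correlation(cp_curve, dp_curve, samples_per_hour):
    """Correlation of the CP and DP curves restricted to the midday window."""
    return pearson(midday_window(cp_curve, samples_per_hour),
                   midday_window(dp_curve, samples_per_hour))
```

With 15-minute sampling (an assumed resolution), `samples_per_hour=4` gives a 16-point window for each correlation.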
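A minimal sketch of the time-delay coefficient and of the search for the best delay time follows. It assumes that the CP curve is read at t + dt while the DP curve is read at t, that dt is searched symmetrically in [-xmax, xmax], and that delays are counted in samples; formula (5) may use a different convention, and all names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation; a constant sequence yields 0.0 (assumption)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def delayed_correlation(cp, dp, dt, start, stop):
    """Correlation between cp(t + dt) and dp(t) for t in [start, stop)."""
    if start + dt < 0 or stop + dt > len(cp):
        # The shifted window must stay inside the same day.
        raise ValueError("shifted window leaves the daily curve")
    return pearson(cp[start + dt:stop + dt], dp[start:stop])


def best_delay(cp, dp, start, stop, x_max):
    """Delay in [-x_max, x_max] that maximizes the delayed correlation."""
    scores = {dt: delayed_correlation(cp, dp, dt, start, stop)
              for dt in range(-x_max, x_max + 1)}
    return max(scores, key=scores.get)
```

For a CP curve that is an exact copy of the DP curve delayed by two samples, `best_delay` returns 2 under this sign convention.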

Characteristics mining
To explore the temporal and spatial characteristics of CP and DP historical power curves, we select a DP station (DG0) and 8 adjacent CP stations and analyze their historical power. Fig. 1 shows their relative positions and capacities.
(1) Characteristics about Electrical Energy
To compare the daily electrical energy of DP and CP, we first convert the daily power curves of the CP stations to the capacity of DG0 by (6). In the formula, Sn represents the capacity of CP station n.
Furthermore, we calculate the ratio of daily electrical energy by (3) to obtain Fig. 2. The ratio is close to 1 for every CP station except CG6, which reflects the better maintenance of CP stations compared with DP stations. The ratio of CG6 is close to 0, possibly because some of its inverters are not in operation. In general, for the same installed capacity, the daily electrical energy of CP and DP differs considerably. Thus, when we use the power data of CP to reconstruct the missing power data of DP, we should account for the differences in electrical energy rather than rely on direct capacity conversion.
(2) Characteristics about Correlation
The time-delay power cross-correlation coefficient reflects the temporal characteristics of power between different stations. Fig. 3 shows the time-delay power cross-correlation coefficients of CG7 for four typical days, together with the corresponding power curves of CG7 and DG0. The power curves of CG7 are converted to the capacity of DG0. The correlation coefficient first increases and then decreases as the time delay increases, and reaches its maximum at a point called the best delay time. However, this trend weakens as the fluctuation of the power curve increases. When the power curve is smooth, for example on a sunny day, the best delay time is determined by the station relative distance. When the power curve fluctuates greatly, for example on an overcast day, the best delay time may also be influenced by wind speed or other weather conditions. Furthermore, we calculate the average time-delay power cross-correlation coefficients of each CP station, as shown in Fig. 4. In general, the time-delay correlation between CP power curves and DP power curves is clear and can be used for data reconstructing.
(3) Other Characteristics
During data reconstructing, the power data of the DP station is unknown, so its power curves cannot be used directly to calculate the time-delay power cross-correlation coefficient. To use the time-delay characteristic, we need other power curves that are similar to the missing ones. Thus, we calculate the correlation coefficients between the daily power curves of the other CP stations and those of CG7 and compare them with the power cross-correlation coefficients calculated before. Table 1 shows, for each CP station, the ratio of days on which the two coefficients differ by less than 0.1. In the majority of cases the difference is less than 0.1. Therefore, we can use the power curves of the CP stations near DG0 to calculate the time-delay power cross-correlation coefficient.
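The capacity conversion described for (6) can be read as a simple rescaling of the CP curve by the ratio of the two capacities. The sketch below follows that reading; the function and parameter names are illustrative assumptions.

```python
def convert_to_capacity(cp_curve, cp_capacity, dp_capacity):
    """Rescale a CP power curve to the installed capacity of the DP station.

    Assumed form of (6): P'(i) = P(i) * S_DP / S_n.
    """
    if cp_capacity <= 0:
        raise ValueError("CP capacity must be positive")
    scale = dp_capacity / cp_capacity
    return [p * scale for p in cp_curve]
```

Applied to the nearest CP station alone, this rescaling is also what the traditional reconstruction method amounts to.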
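The per-station statistic reported in Table 1 can be computed as sketched below, assuming two lists of daily correlation coefficients for the same days; the function name and the default tolerance argument are illustrative.

```python
def close_day_ratio(corr_a, corr_b, tol=0.1):
    """Fraction of days on which two daily coefficients differ by less than tol."""
    days = len(corr_a)
    if days == 0 or days != len(corr_b):
        raise ValueError("need two non-empty lists of equal length")
    close = sum(1 for a, b in zip(corr_a, corr_b) if abs(a - b) < tol)
    return close / days
```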

Data reconstructing method
Based on the analysis of the power curves of DP and CP, we propose a DP power data reconstructing method that uses the electrical energy data of DP and CP, the power curves of CP, the spatial location of the stations and the correlation between the power curves of DP and those of CP.
The traditional DP power data reconstructing method uses the power curves of the nearest CP station only. This can be accurate in some situations, such as sunny days, but overall the model has a large variance. Therefore, we introduce the idea of bagging from ensemble learning into the data reconstructing method and use the power curves of several nearby centralized photovoltaic stations to reconstruct the missing distributed photovoltaic power data.
The data reconstructing process is shown in Fig. 5. The method has 9 steps. The first two steps screen out the target DP and CP stations: DGt, CGtt1, CGtt2 and CGtt3. The other steps involve calculation formulas, which are described in detail below.
Steps 4, 5 and 6 form a cycle. We describe the case i = 1; the other cases are similar.
In step 4, we calculate by (7) the best delay times, dt1 and dt2, at which the time-delay power cross-correlation coefficients between the power curves of CGtt1 and those of CGtt11 and CGtt12 are largest. In the formula, CGtt11 is CGtt2 and CGtt12 is CGtt3.
In step 5, we use dt1, dt2 and the spatial geometric characteristics to calculate the best delay times, dt11 and dt12, at which the time-delay cross-correlation coefficients between CGtt11, CGtt12 and DGt are largest. The spatial geometric characteristics of stations CGtt1, CGtt11 and CGtt12 are shown in Fig. 6. We can calculate dt11 and dt12 by (8). The meanings of the variables in the formula are shown in Fig. 6.
In step 6, we estimate the power curve Pt1 of station DGt by (9). In the formula, Pt11(t+dt11) and Pt12(t+dt12) represent the power curves of CGtt11 and CGtt12 after time translation, respectively.
At the end of the cycle we obtain Pt1, Pt2 and Pt3. In step 8, we calculate the power curve of station DGt by (10).

Fig. 5. Steps of the data reconstructing process:
Begin: import the target DP station DGt and the target date dayt.
Step 1: use the station relative distance to find N CP stations near DGt as candidates (CGt1, CGt2, ..., CGtN).
Step 2: select the 3 best CP stations (CGtt1, CGtt2, CGtt3) from the candidate set according to the following principles: (1) good data quality, (2) nearest to DGt.
Step 3: i = 1.
Step 4: replace the power curves of DGt with those of CGtti and calculate the best delay times (dt1, dt2) at the largest time-delay power cross-correlation coefficients of the other two CP stations (CGtti1, CGtti2).
Step 5: estimate the best delay times (dti1, dti2) at the largest time-delay cross-correlation coefficients between CGtti1, CGtti2 and DGt from dt1, dt2 and the spatial geometric characteristics.
Step 6: estimate the power curve (Pti) of DGt using the time-delayed power curves of CGtti1 and CGtti2 according to dti1 and dti2.
Step 8: estimate the power curve, Pt0 = (Pt1 + Pt2 + Pt3)/3.
Step 9: revise Pt0 with the known daily electrical energy of DGt.
End
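Steps 8 and 9 can be sketched as follows. The averaging follows Pt0 = (Pt1 + Pt2 + Pt3)/3 as stated in the flow; the revision step is an assumption, since the text does not give its formula: here the averaged curve is rescaled so that its sum matches the known daily electrical energy of DGt. Function names are illustrative.

```python
def average_curves(curves):
    """Step 8: point-wise mean of the candidate curves (bagging-style)."""
    length = len(curves[0])
    if any(len(c) != length for c in curves):
        raise ValueError("all curves must have the same number of points")
    return [sum(values) / len(curves) for values in zip(*curves)]


def revise_with_daily_energy(curve, known_energy):
    """Step 9 (assumed form): rescale the curve to the known daily energy.

    known_energy must be in the same units as sum(curve).
    """
    total = sum(curve)
    if total == 0:
        return list(curve)  # nothing to rescale on an all-zero curve
    factor = known_energy / total
    return [p * factor for p in curve]
```

Averaging three estimates built from different station pairs is what reduces the variance relative to relying on a single nearest station.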

Evaluation indicators of data reconstructing effect
We cannot obtain the real power curves of DP stations whose power data is missing on a large scale, so we cannot evaluate the data reconstructing effect in actual situations. To evaluate the proposed method, we therefore treat the power curves of a DP station with known data as unknown and reconstruct them with the method described above. We then use the actual power curves and the reconstructed power curves to calculate indicators that evaluate the data reconstructing method, including the daily mean absolute error, the daily mean relative error and the daily mean capacity relative error (DMCRE). The daily mean absolute error (DMAE) is calculated by (12). In the formula, M represents the total number of data points in a power curve, and Pact and Prep represent the actual power curve and the reconstructed power curve, respectively.

Case study

Simulation data
The simulation data comes from 8 CP stations and one DP station in Hebei Province, China, and includes their capacity, daily power curves, daily electrical energy and geographical position. Note that the power curves of the DP station are used only to evaluate the reconstruction, not to perform it. The basic information of the stations is shown in Fig. 1.

Simulation analysis
To evaluate the effect of data reconstructing, we use the evaluation indicators defined above and design two cases, as follows.
Case1: using only the power curves of the nearest CP station, converted by capacity, to reconstruct the power curves of the DP station.
Case2: using the data reconstructing method proposed in this paper, based on the temporal and spatial characteristics between the power curves of the CP stations and those of the DP station.
Fig. 7 shows the simulation results of case1 and case2. Fig. 7(a1) shows the best data reconstructing result of case1 and Fig. 7(a2) shows the best result of case2.
Both case1 and case2 reconstruct the data well, whether on the best day of case1 or on the best day of case2. However, case2 performs better according to the evaluation indicators, which are shown in Table 2 and Table 3. Fig. 7(b) shows the worst data reconstructing result of case1 and case2, which occurs on the same day for both cases. Here too, case2 performs better according to the evaluation indicators, which are shown in Table 4. Fig. 8 compares the DMCRE of case2 with that of case1 on different days. Case2 achieves a smaller DMCRE and a better data reconstructing effect on most days.
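The error indicators used in this comparison can be sketched as follows. DMAE follows the description of (12) as the mean absolute difference over the M points of a day. The DMCRE form, DMAE divided by the installed capacity, is an assumption, because its formula is not reproduced in the text; the function names are illustrative.

```python
def dmae(actual, reconstructed):
    """Daily mean absolute error over the M points of one day."""
    m = len(actual)
    if m == 0 or m != len(reconstructed):
        raise ValueError("curves must be non-empty and of equal length")
    return sum(abs(a - r) for a, r in zip(actual, reconstructed)) / m


def dmcre(actual, reconstructed, capacity):
    """Daily mean capacity relative error (assumed: DMAE / capacity)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return dmae(actual, reconstructed) / capacity
```

Computing `dmcre` for each day and for each case gives the kind of per-day comparison that Fig. 8 presents.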