Development of Digital Twin for Load Center on the Example of Distribution Network of an Urban District

The paper proposes a concept of building a digital twin based on the reinforcement learning method. This concept allows implementing an accurate digital model of an electrical network with bidirectional automatic data exchange, used for modeling, optimization, and control. The core of such a model is an agent (potential digital twin). The agent, while constantly interacting with a physical object (electrical grid), searches for an optimal strategy for active network management, which involves short-term strategies capable of controlling the power supplied by generators and/or consumed by the load to avoid overload or voltage problems. Such an agent can verify its training with the initial default policy, which can be considered as a teacher’s advice. The effectiveness of this approach is demonstrated on a test 77-node scheme and a real 17-node network diagram of the Akademgorodok microdistrict (Irkutsk) according to the data from smart electricity meters.


Introduction
Innovative and structural changes in urban electric grids and their increasingly close interaction with the transport system and the service sector determine the trends and related research on the development of concepts for the "smart neighborhood" with a subsequent transition to the "smart city" [1]. The advent of digital electricity meters and the development of telecommunications and elements of smart electrical grids have made it possible to increase flexibility, optimize consumption, and reduce energy losses in urban electrical networks by using various adaptive solutions. It is becoming increasingly clear that smart neighborhoods must be able to leverage the enhanced monitoring and flexibility of the electrical grid through the intelligent operation of distributed multi-energy resources (heat, electricity, gas) in combination with automation infrastructure and information and communication technologies.
The digital twin technology can become an effective solution to this issue. This technology is understood as a virtual prototype of a real object, which allows conducting experiments, testing hypotheses, predicting the behavior of the object, and solving the problem of managing its life cycle. The digital twin of an electrical network is a mathematical model of the network implemented in special software. It is capable of assessing the reliability of power supply to a smart neighborhood, identifying vulnerabilities in its electrical network, and developing and visualizing various scenarios for the network development [2].
In 2019, the Irkutsk Scientific Center of the Siberian Branch of the Russian Academy of Sciences launched a project for the installation of smart power meters to enable more detailed and accurate monitoring of various parameters of power consumption in the electrical networks of the housing stock of a district in the city of Irkutsk (Akademgorodok) [3]. The obtained data are planned to be used to create a digital twin of the district electrical network. This technology will make it possible to fulfill more effectively some tasks related to the power system operation (power consumption monitoring, network optimization, minimization of power losses, modeling and forecasting of various scenarios of network operation, and others) and development (assessment of various forms of consumer activity, reconstruction of the current network infrastructure, the appearance of new elements of system flexibility in the near future).
* e-mail: tomin.nv@gmail.com
The paper proposes a concept of building a digital twin based on reinforcement learning methods that allow implementing an accurate digital model of an electrical network with bidirectional automatic data exchange, used for modeling, optimization, and control. In this case, the data transmitted from the digital twin are control actions. The data sent in the opposite direction are either state updates or feedback signals. Since the digital twin tracks all information about the analyzed electrical network, changes in the state of the system are to be transmitted to it for synchronization. Feedback signals that reflect the correctness of control actions are considered as a variant of state updates.

Digital twin concept for power grid through reinforcement learning
A digital twin is an ultra-realistic digital model of a product or system with bidirectional automatic data exchange used for simulation, optimization, and control. Such a dynamic virtual model mirrors in real time a complex physical system in production from a certain perspective, for example, electric power network online analysis, and has built-in intelligence to address the associated concerns, for example, power grid security assessment. However, various problems associated with the development, updating, and application of digital twins in the energy sector have not been solved yet and remain the subject of intensive research [5]. Technologies such as Generative Design [6], which allow one to automatically find the optimal design solutions for power supply [7], are developing very slowly. These problems are particularly acute for the life cycle of small consumers' power systems at low voltage levels (0.4 kV), which usually have neither powerful software nor highly qualified staff.
As repeatedly pointed out in the literature on digital twins, machine learning and artificial intelligence could realize those improvements through learning expressive nonlinear models. Jaensch et al. [8] present a generic concept for incorporating learning methods into Digital Twins. Wang et al. [9] and Sapronov et al. [10] tune parameters of the digital twin through machine learning.
We propose using a reinforcement learning approach, developed in [11] and modified in this paper, to adapt a digital twin's control policy that was derived from erroneous models. We show its application to a real distribution electric system, using Active Network Management (ANM) as the core of the digital twin.

Reinforcement learning
Reinforcement learning (RL) is inspired by the way humans learn. The learning agent observes the state $x_i \in X$ of the environment, decides on a control action $u_i \in U$, which alters the state, possibly receives a reward $r_i$ according to some reward function $R(x, u)$, and observes a new state $x_{i+1}$ of the environment. Over time, it learns to distinguish good actions from bad ones. More formally, the underlying model is a Markov Decision Process (MDP).
In this study, we consider an RL problem formulation that may be free of dynamics, i.e., the system state depends on the particular part under processing, but the next state is already determined regardless of the action selected for that part. In this formulation, at each round $i$, the environment prepares state $x_i$, and the learner selects action $u_i$ and receives reward $r_i$. The next state $x_{i+1}$ prepared by the environment is unrelated to $x_i$ and $u_i$. The agent aims to learn a policy $\pi : X \to U$ that attains the optimal value function, i.e., that selects in every state the action with the highest expected reward.
Some studies [5] show that reinforcement learning addresses the focal issue of improving digital twins through learning. The advantage of this method is that the created virtual environment can go through an unlimited number of repetitions and scenarios to train agents, which remember the situations that have arisen and the ways out of them that gave the maximum reward. This approach accommodates the specifics of distribution networks with a large number of components, a number that can only increase when the network is transformed into an active network (for example, through the emergence of renewable energy sources, storage devices, and active loads).
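The dynamics-free formulation can be illustrated with a minimal Python sketch. It is not the algorithm of [11]: the class name `DynamicsFreeLearner`, the ε-greedy exploration rule, the tabular running-mean estimate, and the toy reward are all assumptions introduced here for illustration. The only property taken from the text is that the next state is drawn independently of the previous state and action.

```python
import random
from collections import defaultdict


class DynamicsFreeLearner:
    """Tabular learner for a dynamics-free RL problem (illustrative only)."""

    def __init__(self, actions, epsilon=0.2, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon            # assumed exploration rate
        self.rng = random.Random(seed)
        self.q = defaultdict(float)       # mean observed reward per (state, action)
        self.n = defaultdict(int)         # number of visits per (state, action)

    def greedy(self, x):
        # Action with the highest estimated reward in state x
        return max(self.actions, key=lambda u: self.q[(x, u)])

    def act(self, x):
        # Explore with probability epsilon, otherwise act greedily
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(x)

    def update(self, x, u, r):
        # Incremental running mean of the reward for the pair (x, u)
        self.n[(x, u)] += 1
        self.q[(x, u)] += (r - self.q[(x, u)]) / self.n[(x, u)]


def toy_reward(x, u):
    # Hypothetical reward: the best action equals the state label
    return 1.0 if u == x else 0.0


def train(rounds=2000, seed=0):
    env_rng = random.Random(seed + 1)
    learner = DynamicsFreeLearner(actions=[0, 1, 2], seed=seed)
    for _ in range(rounds):
        x = env_rng.choice([0, 1, 2])     # next state is unrelated to x_i and u_i
        u = learner.act(x)
        learner.update(x, u, toy_reward(x, u))
    return learner
```

Because the action never influences the next state, no bootstrapping over future states is needed in this sketch, and the estimate for each state-action pair is simply an average of observed rewards.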

Enhancing digital twin algorithm
Based on the reduced RL problem formulation, we modify the algorithm for enhancing digital twins for power grids proposed in [11]. For ease of understanding, the scheme of the algorithm is given in the figure. The digital twin observes state $x_i$ and decides on control action $d_i$ based on its default policy $\pi_d$. The RL algorithm observes both $x_i$ and $d_i$. It then decides whether to apply $d_i$ or $u_i = \pi_a(x_i)$ to the physical system $G$ (power grid). The system then generates a feedback signal (reward) $r_i$ and a next state $x_{i+1}$, which is observed by the digital twin. The reward is used to improve the RL agent policy $\pi_a$.
With the digital twin, we have access to the default policy $\pi_d$, which can be regarded as a teacher's advice. The default policy $\pi_d$ is the original control policy of the digital twin before we apply machine learning to compensate for model inaccuracies. This default policy may be suboptimal, but it is arguably superior to the agent's policy $\pi_a$ in the initial learning period. Each round in which the agent follows the default policy adds to a budget for exploration. Once a sufficient amount is accumulated, the agent may explore actions differing from the default at the risk of performing worse.
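The budget mechanism described above can be sketched as follows. This is a simplified illustration and not the exact accounting of [11]: the class name `BudgetedAgent`, the fixed gain per followed round, and the fixed cost per exploration are assumptions chosen to make the idea concrete.

```python
class BudgetedAgent:
    """Chooses between the default action d_i and the agent action u_i."""

    def __init__(self, default_policy, agent_policy,
                 gain_per_follow=1.0, explore_cost=3.0):
        self.default_policy = default_policy    # pi_d, the teacher's advice
        self.agent_policy = agent_policy        # pi_a, the learned policy
        self.gain_per_follow = gain_per_follow  # assumed budget gain per followed round
        self.explore_cost = explore_cost        # assumed budget needed to deviate once
        self.budget = 0.0

    def select(self, x):
        """Return (action applied to the grid, whether the agent explored)."""
        d = self.default_policy(x)
        u = self.agent_policy(x)
        if u != d and self.budget >= self.explore_cost:
            # Enough budget accumulated: deviate from the default policy
            self.budget -= self.explore_cost
            return u, True
        # Otherwise follow the default policy and accumulate budget
        self.budget += self.gain_per_follow
        return d, False


# Hypothetical policies: the default always proposes 0, the agent always proposes 1
agent = BudgetedAgent(default_policy=lambda x: 0, agent_policy=lambda x: 1)
explored = [agent.select(x=None)[1] for _ in range(8)]
```

With these assumed values, the agent follows the default policy for three rounds, deviates once, and then has to accumulate budget again, which bounds how often the potentially worse action is applied to the physical system.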

Active network management
With the increasing share of renewable and distributed generation in electrical distribution systems, Active Network Management (ANM) becomes a valuable option for a distribution system operator (DSO) to operate the system securely and cost-effectively without relying solely on network reinforcement. ANM strategies are short-term policies that control the power injected by generators and/or withdrawn by loads to avoid congestion or voltage issues. While simple ANM strategies involve curtailing temporary excess generation, more advanced strategies tend to shift the consumption of loads to anticipated periods of high renewable generation. However, such advanced strategies imply that the system operator has to solve large-scale optimal sequential decision-making problems under uncertainty [12].
We state these problems as an MDP, where the system dynamics describe the evolution of the electrical network and devices, while the action space encompasses the control actions that are available to the DSO. Therefore, in our study, we consider the ANM model as an RL agent that aims to learn policy $\pi_a$ in the digital twin (Fig. 1).

Operational planning problem statement
Operational planning is a recurring task performed by the DSO to anticipate the evolution of the system, i.e., the impact of the injection and consumption patterns on the operational limits of the system, and to make preventive decisions to stay within these limits. We describe this evolution by a discrete-time process having a time horizon $T$, the number of periods used for the operational planning phase. The period duration is 15 minutes, by analogy with a typical market period. The power injection and withdrawal levels are constant within a single period, and we neglect the fast dynamics of the system, which may be handled by real-time controllers [13]. The control actions are aimed at directly affecting these power levels and can introduce time coupling effects, depending on the type of device.
We now describe two control tools of the system proposed in [5]: the modulation of distributed generation (wind, solar, and others) and the modulation of flexible demand (heat pumps, electric vehicles, and others).
1. Curtailment of a distributed generator. For each device belonging to the set $G \subset D$ of distributed generators (DGs), the DSO can impose a curtailment instruction, i.e., an upper limit on the generation level of the DG. This request can be issued up to the period immediately preceding the one related to the curtailment, and it is acquired in exchange for a fee. This fee compensates for the producer's financial loss associated with the energy that could not be produced during modulation periods. We assume that this fee is defined as a per-unit compensation for the energy not produced, with respect to the actual potential known after the market period.
2. Modulation of flexible loads. We also consider that the DSO can modify the consumption of some flexible loads, which form a subset $F$ of the full set of loads of the network. An activation fee is associated with this control tool, and flexible loads can be notified of activation up to the time immediately preceding the start of the service. After the activation is performed at time $t_0$, the consumption of the flexible load $d$ is modified by a certain value during $T_d$ periods. For each of these modulation periods $t \in \{t_0 + 1, \dots, t_0 + T_d\}$, this value is defined by the modulation function.
There are other approaches to control the system, such as modulating the tariff signal(s), affecting the topology of the network, or using distributed storage sources, which are not considered in this research. Nor do we model the automatic regulation devices that often exist in distribution systems, such as On Load Tap Changers of transformers, which automatically adapt to control the voltage level.
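The two control tools can be illustrated with a short Python sketch. The function names, the representation of the modulation function as a list of per-period changes, and the numerical values are assumptions made here; the 0.25 h period length follows from the 15-minute period stated above.

```python
def curtailment_fee(potential, limit, price_per_unit, period_hours=0.25):
    """Fee for the energy a DG could not produce in one period.

    potential: generation potential known after the period (MW)
    limit: upper limit imposed by the DSO (MW)
    price_per_unit: assumed per-unit compensation (per MWh)
    """
    curtailed_power = max(0.0, potential - limit)
    return curtailed_power * period_hours * price_per_unit


def modulated_consumption(baseline, t0, modulation):
    """Consumption profile after activating a flexible load at period t0.

    modulation[k] is the change applied at period t0 + 1 + k, so the
    service lasts T_d = len(modulation) periods.
    """
    profile = list(baseline)
    for k, delta in enumerate(modulation):
        t = t0 + 1 + k
        if t < len(profile):          # ignore periods beyond the horizon
            profile[t] += delta
    return profile


# Hypothetical service: downward modulation followed by an upward rebound
signal = [-0.5, -0.5, 0.5, 0.5]
profile = modulated_consumption([1.0] * 6, t0=1, modulation=signal)
```

In this example the signal sums to zero, i.e., the consumption is shifted in time rather than reduced; this energy neutrality is an assumption of the sketch, not a requirement stated in the text.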

Optimal decision-making formulation
We formulate operational planning as an optimal sequential decision-making problem. The uncertainty of future power injections from DGs relying on natural energy sources and the variability of power consumption of the loads should also be explicitly taken into account in the ANM strategy. Therefore, we model this problem as an MDP with mixed-integer sets of states and actions.

System state
The global state space $S$ of the system is decomposed into three subsets:
$$S = S^{(1)} \times S^{(2)} \times S^{(3)},$$
where $S^{(1)}$, $S^{(2)}$, and $S^{(3)}$ are the state subsets of distributed generation, consumption, and past realizations of the uncertain phenomena (i.e., wind speed, solar irradiance, and consumption levels). The power injections of the devices are sufficient to obtain the values of the electrical quantities through equations (3) and (4):
$$\forall n \in N:\quad S_n = P_n + jQ_n = V_n I_n^* = V_n Y_{n\cdot}^* V^*, \tag{3}$$
$$\forall n \in N:\quad S_n = P_n + jQ_n = \sum_{d \in D(n)} \left(P_d + jQ_d\right), \tag{4}$$
where $S_n$ is the apparent power injection at bus $n$; $Y_{n\cdot}$ denotes the $n$th row of the nodal admittance matrix; and $P_d$, $Q_d$ are the active and reactive power injection values associated with every device (generator or load) $d \in D(n)$.
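Equations (3) and (4) can be checked numerically with a few lines of Python. The two-bus feeder, its line impedance, and the voltage values below are hypothetical per-unit numbers chosen only for illustration.

```python
def bus_injections(Y, V):
    """Eq. (3): S_n = V_n * conj(Y_n. V) for every bus n."""
    injections = []
    for n, row in enumerate(Y):
        # Current injected at bus n: n-th row of Y times the voltage vector
        current = sum(y_nm * v_m for y_nm, v_m in zip(row, V))
        injections.append(V[n] * current.conjugate())
    return injections


def device_injection(devices):
    """Eq. (4): sum of P_d + jQ_d over the devices connected to one bus."""
    return sum((complex(p, q) for p, q in devices), 0j)


# Hypothetical two-bus feeder with a single line of series admittance y
y = 1 / complex(0.01, 0.05)
Y = [[y, -y], [-y, y]]
V = [complex(1.0, 0.0), complex(0.98, -0.02)]
S = bus_injections(Y, V)

# The injections add up to the complex power dissipated in the line
line_losses = abs(V[0] - V[1]) ** 2 * y.conjugate()
```

For a network with series elements only, the sum of all bus injections equals the line losses, which gives a simple consistency check on an implementation of Eq. (3).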

Control actions
The control tools available to the DSO are modeled by the set $A_s$ of control actions. This set depends on the state $s_t$ of the system because it is impossible to activate the flexibility service of a load if it is already active. The components of vectors $a_t \in A_s$ are defined by $a_t = (p_t, q_t, \mathrm{act}_t)$ with $p_t, q_t \in \mathbb{R}^{|G|}$, which specify, for period $t + 1$ and for each of the generators $g \in G$, the curtailment instruction imposed on that generator. By using this representation of the control actions, we consider that a curtailment or flexibility activation action targeting period $t$ must always be performed at period $t - 1$.
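The state dependence of the action set can be sketched for the activation component alone. The sketch assumes that the activation part of an action is a binary vector with one entry per flexible load and that the state exposes a flag telling whether each service is currently running; both the function name and this encoding are illustrative assumptions.

```python
from itertools import product


def admissible_activation_vectors(active):
    """Enumerate the activation vectors allowed in the current state.

    active[d] is True when the flexibility service of load d is already
    running, in which case it cannot be activated again (entry fixed to 0).
    """
    choices = [(0,) if is_active else (0, 1) for is_active in active]
    return list(product(*choices))


# Load 0 is idle, load 1 is already providing its service
vectors = admissible_activation_vectors([False, True])
```

The number of admissible vectors doubles with every idle flexible load, which is one reason why the decision problem becomes large for networks with many flexible consumers.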

Reward function and goal
To evaluate the performance of a policy, we first specify the reward function $r : S \times A_s \times S \to \mathbb{R}$ that associates an instantaneous reward with each transition of the system from a period $t$ to a period $t + 1$. The reward is the negative sum of the curtailment cost of DGs, the activation cost of flexible loads, and a penalty term $\Phi$, where $C^{\mathrm{curt}}_g(q_{t+1})$ is one fourth of the day-ahead market price for the quarter of hour $q_{t+1}$ in the day, $C^{\mathrm{flex}}_d$ is the activation cost of flexible loads, specific to each of them, and $P_{g,t}$ is the power curve of the DG. The function $\Phi$ aims at penalizing a policy that leads the system to an undesirable state (e.g., one that violates the operational limits or induces many losses) and, together with $C^{\mathrm{curt}}_g$ and $C^{\mathrm{flex}}_d$, it must be determined when instantiating the decision model. Note that the higher the operational costs and the larger the violations of operational limits, the more negative the reward.
For a DSO, addressing the operational planning problem is equivalent to determining an optimal policy $\pi^*$ among all the elements of $\Pi$, i.e., the policy that maximizes the expected return. The purpose of the first term in the penalty function is to provide an incentive that prevents the policy from bringing the system to a state that violates operational limits. This definition allows evaluating any kind of policy. In a mathematical programming setting, we remove this term from the objective function and add operational constraints instead. The new objective function involves $C^{\mathrm{loss}}(q_{t+1})$ and $C^{\mathrm{fuel}}(q_{t+1})$, the per-unit prices of losses and fuel for the quarter of hour $q_{t+1}$ in the day.
Given the discretization of the stochastic processes, the objective function defined in Eq. (8), and the additional constraints, we can formulate a new approximate optimal policy $\pi^*_M$.
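The structure of such a reward can be sketched in Python under explicit assumptions. The text does not fix the penalty $\Phi$, so the sketch uses a hypothetical penalty proportional to the voltage limit violations; the function name, the voltage limits of 0.95 and 1.05 p.u., and the penalty weight are likewise assumptions.

```python
def reward(potential, limits, day_ahead_price, activations, flex_costs,
           voltages, v_min=0.95, v_max=1.05, penalty_weight=1000.0):
    """Negative operating cost of one transition from period t to t + 1."""
    # Curtailment price: one fourth of the day-ahead price for the quarter hour
    curt_price = day_ahead_price / 4.0
    curtailed = sum(max(0.0, p - l) for p, l in zip(potential, limits))
    curtailment_cost = curtailed * curt_price

    # Activation cost of the flexible loads activated in this period
    flex_cost = sum(c for on, c in zip(activations, flex_costs) if on)

    # Assumed penalty: total voltage excursion outside [v_min, v_max]
    violation = sum(max(0.0, v - v_max) + max(0.0, v_min - v) for v in voltages)
    penalty = penalty_weight * violation

    return -(curtailment_cost + flex_cost + penalty)


# Hypothetical transition: 1 MW curtailed, one flexible load activated
r_secure = reward([2.0], [1.0], 4000.0, [True, False], [50.0, 80.0], [1.0, 1.0])
r_violated = reward([2.0], [1.0], 4000.0, [True, False], [50.0, 80.0], [1.0, 1.08])
```

With identical costs, the transition that ends in an overvoltage receives the lower reward, which is the ordering the penalty term is meant to enforce.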

Experimental calculations
We describe below the real and test instances of the ANM problem used in the results section. The implementation uses the modified Python code available from [14] to simulate the system and Pyomo [15] to build the mathematical programs. Table 1 summarizes some relevant data on these instances. This section aims to illustrate the operational planning problem and show the test and real instances. In particular, the policy $\pi^*_{M_t}(s_t)$ calculated by (7)-(8) was applied to every instance and penetration level of the flexible loads. The empirical expected return of the policy, for a given test instance, level of flexibility, network model, and scenario tree complexity, is determined from 10 runs of 288 time steps (i.e. of 2 days).
We also consider that the per-unit curtailment prices are the same for all DGs. We used real values of market prices $C^{\mathrm{curt}}_g$ from [12], which fluctuate in domestic currency equivalent from 2,100 to 4,200 rubles per MW of curtailed distributed solar generation. We also use these values for the per-unit cost of losses, i.e., $C^{\mathrm{loss}}(\cdot)$. Concerning flexible loads, three different penetration levels are considered for each test case. For every configuration, about half of the flexible services offer a downward modulation followed by an upward rebound effect, and inversely for the other half. The maximal and cumulated modulation magnitude is presented in Table 1 to illustrate the potential offered by flexible loads in every configuration.
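The empirical expected return mentioned above can be estimated by averaging the accumulated reward over independent runs. The sketch below is not the code of [14]: the function name, the `step` interface returning the next state and the reward, the undiscounted sum, and the per-run seeding are assumptions.

```python
import random
from statistics import mean


def empirical_return(policy, step, initial_state, n_runs=10, n_steps=288, seed=0):
    """Average undiscounted return of a policy over independent runs.

    policy: maps a state to an action
    step: maps (state, action, rng) to (next_state, reward)
    """
    returns = []
    for run in range(n_runs):
        rng = random.Random(seed + run)   # a separate random stream per run
        state, total = initial_state, 0.0
        for _ in range(n_steps):
            action = policy(state)
            state, r = step(state, action, rng)
            total += r
        returns.append(total)
    return mean(returns)


# Hypothetical environment: a unit cost at every period, whatever the action
def constant_cost_step(state, action, rng):
    return state + 1, -1.0


estimate = empirical_return(lambda s: 0, constant_cost_step, initial_state=0)
```

Seeding each run separately keeps the estimate reproducible while still sampling different realizations of the uncertain generation and consumption in a stochastic simulator.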

Akademgorodok Case17
The main object of the ANM strategy research is a 17-node 6 kV electrical network of the residential area of the Akademgorodok microdistrict of Irkutsk (Figure 2). In 2019, smart electricity meters were installed in 60 multistory residential buildings of this area. The meters were integrated into a single data collection system. This system allows collecting hourly, daily, weekly, and monthly data on electricity consumption for each residential building. In addition, one can selectively monitor a large number of load flow parameters. One of the objectives of this project is to create a model of the digital twin of the Akademgorodok load center using this database. At the moment, the electric network of Akademgorodok does not contain distributed generation sources or load-controlled consumers. However, given the implementation of several national programs, such as the federal projects "Smart City" and the "Demand management of retail consumers" from EnergyNet and JSC "SO UES", the study focused on a prospective scenario for the development of the considered load center, associated with the emergence of distributed generation and flexible consumers capable of managing their demand.
DGs were represented by the options that involved solar power generation plants and hybrid generators using both solar energy and biomass gasification [16]. Solar generation and biomass are promising sources of renewable energy for the Irkutsk region, and the most efficient plants for the development of green technologies in urban areas [17]. In the studied scheme, controlled loads implied consumers who potentially have flexible technical capabilities to manage their demand (for example, electric vehicles and heat pumps).
We model an aggregate set of devices that are assimilated to a single connection point at the 6 kV MV grid (residential consumers and solar panels). At such nodes, a set of residential loads and a set of distributed solar units are aggregated. To determine the curtailment for the next time period, we assume that all the curtailable generators will operate at the active power upper limit $P^{\max}$. This limit is the decision variable that we wish to compute at each time step. As there is a cost per curtailed MWh, we must determine the largest $P^{\max}$ that enables operational constraints to be met, where $P^{(k)}_{\mathrm{exo}}$ is, for a sampled future state $k \in \{1, \dots, N_{\mathrm{trajs}}\}$, the overall active power balance neglecting the injection of the curtailable generators. The solution to this linear program is straightforward: $P^{\max} = \min_k \left(P - P^{(k)}_{\mathrm{exo}}\right) / N_{\mathrm{curt}}$. As seen from Figure 3, modeling the scheme of distributed generation leads to an increase in the voltage above the set security limits. We now simulate this policy on a run of 2 days, and then compare with the same simulation run without policy (Figure 4).
As Figure 4 shows, the optimal control strategy found successfully prevents the voltage rise by curtailing part of the generation from distributed solar generators. Figure 5 shows the operating costs, where one can also see the peak associated with generation curtailment. The remaining cost values in Figure 5 are determined by the active power losses $C^{\mathrm{loss}}(\cdot)$ and the costs of biomass combustion $C^{\mathrm{fuel}}(\cdot)$ (the price of wood pellets was taken equal to 7 rubles per 1 kW). At the same time, this study did not involve the service for activating flexible loads for the network of the Akademgorodok microdistrict, since there are currently no developed procedures for managing the demand of small retail consumers in Russian electrical networks. In the next example, however, a test 77-node distribution network scheme demonstrates an option with the ANM model implementing both actions at once: distributed generation curtailment and demand management of flexible loads.
It is important to note that loads cannot be modulated in an arbitrary way. There are constraints to be imposed on the modulation signal, which are inherited from the flexibility sources of the loads, such as an inner storage capacity (e.g., electric heater, refrigerator, water pump) or a process that can be scheduled with some flexibility (e.g., industrial production line, dishwasher, washing machine).
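The closed-form curtailment limit used for the Akademgorodok case can be computed directly. The sketch assumes that the underlying linear program maximizes $P^{\max}$ subject to $N_{\mathrm{curt}} P^{\max} + P^{(k)}_{\mathrm{exo}} \le P$ for every sampled state $k$, with $N_{\mathrm{curt}}$ read as the number of curtailable generators and $P$ as the admissible overall active power balance; this reading is inferred from the stated solution, and the numerical values are hypothetical.

```python
def curtailment_limit(p_limit, p_exo_samples, n_curt):
    """Largest common upper limit P_max for the curtailable generators.

    p_limit: admissible overall active power balance (assumed meaning of P)
    p_exo_samples: balance without curtailable generators, one value per
                   sampled future state k
    n_curt: number of curtailable generators (assumed meaning of N_curt)
    """
    if n_curt <= 0:
        raise ValueError("n_curt must be positive")
    if not p_exo_samples:
        raise ValueError("at least one sampled state is required")
    # The tightest sampled state determines the limit
    return min((p_limit - p_exo) / n_curt for p_exo in p_exo_samples)


# Hypothetical values in MW: three sampled states, two curtailable generators
samples = [4.0, 6.0, 5.0]
p_max = curtailment_limit(p_limit=10.0, p_exo_samples=samples, n_curt=2)
```

The limit is set by the sampled state with the largest exogenous balance, so adding more pessimistic samples can only lower $P^{\max}$ and increase the curtailed energy.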

Test Case77
The test DN is based on a 77-bus radial test system [18], which includes 6 curtailable wind generators as well as non-curtailable residential photovoltaic panels. In this example, we can test different levels of flexible loads expressed in three different levels of penetration (Table 1). For each configuration, about half of the flexible services involve down-modulation, i.e., a decrease or shift in consumption. The duration of the modulation signals for the 77-node scheme is from 6 to 24 time periods. The modeling results are shown for one day in Figure 6.
Experiments show that the policy slightly benefits from an increase of the flexibility level of loads in the deterministic setting but not in the presence of uncertainty.

Conclusions
The advent of digital interval electricity meters and the development of telecommunications and elements of smart grids in recent years have offered an opportunity to increase elasticity, optimize consumption, and reduce energy losses in urban electrical networks through the use of various adaptive solutions. These solutions include a targeted impact on consumer equipment and/or a change in the operating conditions of the electrical grid in real time, when necessary. Currently, these tasks can be effectively fulfilled using the concept of a digital twin. This paper has shown that a digital twin used for the control of power grids can be adapted through RL. Based on that, we propose to use the learning digital twin algorithm for enhancing the control policy of digital twins in continuous domains. We also propose to consider the ANM model as an RL agent that aims to learn a policy $\pi_a$ in the digital twin. The effectiveness of this approach is demonstrated on a test 77-node scheme and a real 17-node network diagram of the Akademgorodok district, which is undergoing active smartization and digitalization.
The ANM concept is an alternative or addition to network reinforcement in the event of the massive integration of renewable energy sources and demand management in distribution systems in the near future. Mathematically, operational planning, which is the preventive version of ANM considered in this paper, is a problem of optimal sequential decision-making under uncertainty. The defining properties of the operational planning problem are the need for optimization over a sufficiently long time horizon to take into account the uncertainty of generation and consumption, and the modeling of discrete decisions related to the modulation of flexibility services. In an attempt to go beyond one problem-solving method, we formulate this problem as an MDP, which does not prescribe a specific solution method.