Application of high-performance computing for determining critical components of an energy system

This article presents a package for analyzing energy system vulnerability, developed with a new technology for continuous integration, delivery, and deployment of applied software. It implements a framework that allows various methods for modelling energy systems to be combined and used optimally, and provides a comprehensive assessment of their vulnerability with regard to various uncertainties. The article considers the essential principles for identifying and ranking critical elements of an energy system. Investigations made with the package have shown that these principles seem to be logical for the subsequent construction of an invariant set of measures for improving energy system resilience.


Introduction
Currently, there is increasing interest in studying the ability of an energy system to survive large disturbances. The probability of such disturbances is small or unknown, but their impact is such that the energy system cannot perform its functions without additional supporting measures. Research also addresses the reaction of the energy system to such disturbances, the consequences for consumers, compensation for negative consequences, and the system restoration process [1].
One of the major problems in studying energy system resilience is the number of uncertainties arising from incomplete knowledge about the conditions and time of a disturbance, the system's response to it, its magnitude and scale, etc.
Resilience is considered the ability of a system to resist disturbances, prevent their cascading development with mass violation of consumer supply, and recover after their impact [2]. These steps are schematically shown in Fig. 1.
Fig. 1. System performance under a high-impact disturbance.
The function F(t) reflects the overall system performance at a time t.
At the time t0, its value is F0. From time t0 to t1, the system is in a stable state and is prepared for the predicted perturbation.
At the moment t1, a disturbance occurs, the system performance drops to the value F(t2), and until the moment t3, the system tries to adapt to the disturbance impact and its consequences.
So, in the time period from t1 to t2, the system tries to absorb the disturbance, and from t2 to t3 it actively resists the disturbance and mitigates its consequences by means of efficient resource allocation.
Finally, starting from the moment t3, the system tries in various ways to restore its performance to a certain acceptable level F(t4) [3].
Starting from the moment t4, the system continues to increase its performance, improves its resilience according to the plans made on the basis of the received experience, and prepares for new disturbances [4].
The stages of planning, preparation, absorption, and resistance represent the system's adaptation to the requirements of the new situation, and the recovery stage returns the system to normal operation. In other words, resilience is the system's ability to adapt to various large disturbances and recover to the state it was in before their impact [5].

Studying energy system resilience
The resilience research is based on the study and analysis of system adaptation and recovery capabilities [6]. The modern scheme of studying energy system resilience is discussed in detail in [7].

Vulnerability concept
The vulnerability (Fig. 1) represents the size and scale of the negative consequences for a system that result from the impact of a particular disturbance [7]. Vulnerability analysis plays a central role in resilience research [8]. Its main purpose is to identify drawbacks in the system design and control mechanisms that could contribute to the spread of a large disturbance over the system itself and also over interconnected systems [8].
Vulnerability analysis comprises three types: global analysis, spatial analysis, and the search for critical elements [7].
The global vulnerability analysis is aimed at obtaining general information about the impact of disturbances on the system performance and is carried out by modelling a series of disturbances with a gradually increasing degree of impact. Such computational experiments make it possible to determine the threshold values of the impact for certain disturbance classes [7].
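The sweep described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `performance` is a hypothetical toy model in which impacts up to a reserve margin are absorbed and larger ones reduce performance linearly, and the names `find_threshold`, `reserve`, and `max_drop` are assumptions introduced here.

```python
from typing import Callable, Optional, Sequence


def performance(impact: float, reserve: float = 0.2) -> float:
    """Toy model (assumption): relative system performance in [0, 1].

    Impacts up to `reserve` are fully absorbed; beyond that,
    performance falls linearly with the excess impact.
    """
    return max(0.0, 1.0 - max(0.0, impact - reserve))


def find_threshold(
    model: Callable[[float], float],
    impacts: Sequence[float],
    max_drop: float,
) -> Optional[float]:
    """Return the first impact level whose performance drop exceeds max_drop.

    `impacts` must be sorted in increasing order, mirroring a series of
    disturbances with a gradually increasing degree of impact.
    """
    baseline = model(0.0)
    for impact in impacts:
        if baseline - model(impact) > max_drop:
            return impact
    return None  # no disturbance in the series crossed the threshold


# Disturbance series: impact grows from 0.0 to 1.0 in steps of 0.05.
impact_levels = [i / 20 for i in range(21)]
threshold = find_threshold(performance, impact_levels, max_drop=0.07)
```

In a real study the toy `performance` function would be replaced by a run of the energy system model for the given disturbance, which is where the computational cost arises.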
The search for critical elements is focused on determining a component or combination of components whose failure causes the biggest decrease in system performance [7]. The key point here is to detect all, even unexpected, combinations of critical elements [9].
The spatial vulnerability analysis focuses on finding critical geographical areas where the system components are located in some proximity to each other and are affected by spatially distributed large disturbances, such as natural disasters [6]. Several different areas may be affected at once [10].

Modelling energy systems
Modelling energy systems for resilience research differs for separate and interdependent systems [11]. In the first case, an energy system is usually modelled at a more detailed level. In the second case, a more aggregated representation of energy systems can be used, and the relationships between their components must be taken into account [12]. According to one of the widely used classifications, proposed in [13], there are the following categories of interdependencies:
• Physical, representing the flow of a resource from one system element to another,
• Communication, for transmitting status and control data,
• Spatial (geographical) connections [10],
• Logical relationships that are not included in any of the above categories.
In addition, there is a fifth type of relationship, called social. It describes the impact of human behaviour on the system components [14].
Taking into account the structural and dynamic complexity of the systems, the relationships between systems, and existing uncertainties, it is emphasized in [8] that integrating various methods and approaches to modelling systems makes it possible to assess their vulnerability from different points of view (topological and functional, static and dynamic).

Current challenges in studying energy system resilience
The current state of research on energy system resilience is characterized by the following difficulties:
• Focus on the consideration of separate energy systems and insufficient attention to the study of the relationships between them [15],
• Lack of frameworks that allow combining and optimally using various modelling methods for complex systems such as energy systems for a comprehensive assessment of their vulnerability with regard to existing uncertainties [8],
• Processing and analysis of large data sets that arise due to the combinatorial nature of most problems of energy system resilience studies,
• Need to manage multiple computational experiments and conduct them in an acceptable time.
The last two problems can be solved using high-performance computing.

High-performance computing in studying energy system resilience
Using high-performance computing in the study of the functioning and development of energy systems is considered in [16,17].
The search for critical elements is used to assess the ability of power systems to withstand various combinations of element failures on the basis of the system state assessment. High-performance computing makes it possible to evaluate the consequences of failures not only of individual elements but also of their various combinations in an acceptable time [18].
Major disturbances in power systems usually start with a primary disturbance (a short circuit in transmission lines due to uncut trees, incorrect operation of protection devices, bad weather), followed by a chain of cascading events. The chains of events that lead to large disturbances are usually long and complicated, so detecting them can take months. Here, high-performance computing also makes it possible to speed up the consideration of a significant number of combinations of possible events and perturbation scenarios [19].
If the number of component combinations under consideration is too large for combinatorial methods, then simulation modelling is used [8], including Monte Carlo methods oriented towards high-performance computing [20].
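The idea of replacing full enumeration with random sampling can be sketched as follows. This is an illustrative sketch only: the element names, the additive `toy_shortage` model, and the functions `sample_failure_sets` and `estimate_critical_share` are assumptions made for the example and do not come from the package or from [20].

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence


def sample_failure_sets(
    elements: Sequence[str], size: int, n_samples: int, seed: int = 0
) -> List[FrozenSet[str]]:
    """Draw n_samples random combinations of `size` distinct elements.

    The same combination may be drawn more than once across samples.
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    return [frozenset(rng.sample(elements, size)) for _ in range(n_samples)]


def estimate_critical_share(
    failure_sets: Sequence[FrozenSet[str]],
    shortage: Callable[[FrozenSet[str]], float],
    threshold: float,
) -> float:
    """Estimate the share of failure sets whose shortage reaches the threshold."""
    critical = sum(1 for failed in failure_sets if shortage(failed) >= threshold)
    return critical / len(failure_sets)


# Toy data (assumption): 200 elements with made-up shortage contributions.
elements = [f"e{i}" for i in range(200)]
contribution: Dict[str, float] = {
    name: (i % 10) / 100 for i, name in enumerate(elements)
}


def toy_shortage(failed: FrozenSet[str]) -> float:
    """Hypothetical additive model; a real study would run the system model."""
    return sum(contribution[name] for name in failed)


samples = sample_failure_sets(elements, size=3, n_samples=1000)
share = estimate_critical_share(samples, toy_shortage, threshold=0.05)
```

Because each sample is evaluated independently, the loop over `samples` is the part that would be spread across computational nodes.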
Graph theory is widely used in the vulnerability analysis [8]. There are many software libraries for working on large graphs [21]. Some of them are implemented on the basis of parallel and distributed computations [22].
The study and analysis of energy system adaptation and recovery capabilities are usually implemented on the basis of mathematical optimization packages [23], which have built-in tools for organizing high-performance computing. In studying energy system resilience, parallel computations are mainly used for solving large optimization problems, such as energy distribution over real energy system networks [16,21] or resource allocation when planning an energy system recovery. Distributed computing is used in vulnerability analysis, where problems are mostly combinatorial in nature and their solution scales quite easily.

Package for analyzing the energy system vulnerability
We developed applied software for analyzing the energy system vulnerability using special tools for creating subject-oriented heterogeneous distributed computing environments. Applying these tools, we implemented the means for analyzing the energy system vulnerability as a distributed applied software package.

Environment
Subject-oriented heterogeneous distributed computing environments can integrate cloud and grid platforms, including resources from public-access supercomputing centers. The main components of environments are PC-clusters or HPC-clusters. In addition, each environment can include various computational servers, PCs, and data storage systems.
Dedicated cluster nodes are used within cloud and grid platforms. At the same time, non-dedicated cluster nodes are used as shared computational resources. When solving problems, users of an environment can use both dedicated and non-dedicated nodes.
The aforementioned tools support a specialized technology for automating the process of solving large-scale scientific problems. This technology supports the following main operations:
• Extracting subject information from weakly structured sources and converting it into the target data structures of packages [24],
• Development, modification, and joint application of applied software for solving different classes of problems,
• Continuous integration, delivery, and deployment of applied software on both dedicated and non-dedicated nodes [25],
• Automation of the construction and execution of problem-solving plans,
• Visualization of the obtained computation results on electronic maps,
• Multi-agent dispatching of computations in a heterogeneous environment [26].
We create subject-oriented environments using the Orlando Tools framework [27]. The Orlando Tools framework implements an advanced modular approach to the development and use of a specialized class of scalable scientific applications (distributed applied software packages [28]).
During environment creation and package development, Orlando Tools provides users with ample capabilities for describing the subject domain model, including both the text and graphical languages for its specification and problem formulations on this model. Problem formulations can be implemented in procedural and non-procedural forms. In the latter case, the synthesis of a problem-solving plan (abstract program) is automatically performed.
A problem-solving plan is a kind of abstract workflow [29]. Resource allocation is carried out at the stage of dispatching computations. A plan generated from a procedural problem formulation can include control constructs for branching, looping, and recursion.
Users form computational jobs to execute problem-solving plans in the environment.
A self-organizing hierarchical multi-agent system with several levels of agent operation implements the dispatching of jobs in the environment [30]. Agents represent resources in the environment, monitor resources, recognize job properties, and distribute jobs across resources.
Within the multi-agent system, agents can play various roles and perform different functions corresponding to particular roles. Roles can be permanent or temporary. Temporary roles arise at discrete moments in time in the process of local interactions of agents.
Agents are autonomous entities. However, they can unite into virtual communities of agents.
Within virtual communities, agents cooperate in executing a common job. At the same time, they compete to distribute the computational load among their resources.
The computational load is distributed by means of a specialized tender of computational works. This tender is based on the Vickrey combinatorial auction [31]. The computational load is calculated using special models for predicting the runtime of problem-solving plans [32].
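To illustrate the second-price principle behind such a tender, the sketch below implements a much simpler single-item, sealed-bid Vickrey auction in reverse form. It is not the combinatorial mechanism of [31] and not the Orlando Tools implementation: the agent names, the use of predicted runtime as the bid, and the function `vickrey_reverse_auction` are assumptions made for the example.

```python
from typing import Dict, Tuple


def vickrey_reverse_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Single-item sealed-bid second-price auction, reverse (lowest wins) form.

    Each agent bids the runtime it predicts for the job. The agent with
    the lowest bid wins, and the clearing value is the second-lowest bid.
    Ties are broken by agent name so that the result is deterministic.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winner, _ = ranked[0]
    _, clearing_value = ranked[1]
    return winner, clearing_value


# Hypothetical predicted runtimes (seconds) submitted by three agents.
predicted_runtime = {"agent_a": 120.0, "agent_b": 95.0, "agent_c": 140.0}
winner, clearing_value = vickrey_reverse_auction(predicted_runtime)
```

The property that makes this auction family attractive for dispatching is that the winner's own bid does not set the clearing value, so an agent gains nothing by misreporting its predicted runtime.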
A subject-oriented environment for analyzing the energy system vulnerability provides a set of services for preparing subject-oriented data, implementing computations, generating electronic maps, and visualizing the computation results on the generated maps.

Package
We developed the package for analyzing the energy system vulnerability. Its subject domain model includes 22 parameters (p1-p22) and 7 modules (m1-m7). The modules represent applied software of the package.
The modules were developed using the new technology of the Orlando Tools framework for continuous integration, delivery, and deployment of applied software. Testing the modules on environment nodes within this technology ensured high reliability of computations, which significantly reduced the time of the experiments.
On this model, we constructed three problem-solving plans (workflows). Plan 1 creates critical element sets of a specific size, simulates the simultaneous element failures for all sets, and evaluates the consequences of these failures (Fig. 2).
In Plan 1, modules m1-m3 can be executed in parallel. Module m4 is designed to perform parameter sweep computations: instances of this module are executed with different sets of inputs.
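The execution pattern of Plan 1 (independent modules first, then a parameter sweep) can be sketched with Python's standard `concurrent.futures` module. The functions `m1`-`m4` below are placeholder stubs, and the assumption that m4 consumes the outputs of m1-m3 is made only for illustration; the actual modules and their data dependencies are described in [27].

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


# Placeholder stubs (assumption): the real m1-m4 are package modules.
def m1() -> str:
    return "output of m1"


def m2() -> str:
    return "output of m2"


def m3() -> str:
    return "output of m3"


def m4(shared: Tuple[str, ...], inputs: Tuple[str, ...]) -> Tuple[Tuple[str, ...], int]:
    """One instance of the parameter sweep; here it only echoes its input."""
    return inputs, len(shared)


def run_plan(input_sets: Sequence[Tuple[str, ...]]) -> List[Tuple[Tuple[str, ...], int]]:
    with ThreadPoolExecutor() as pool:
        # m1-m3 have no mutual dependencies, so they are submitted together.
        futures = [pool.submit(module) for module in (m1, m2, m3)]
        shared = tuple(future.result() for future in futures)
        # Parameter sweep: one m4 instance per input set, order preserved.
        return list(pool.map(lambda inputs: m4(shared, inputs), input_sets))


results = run_plan([("e1",), ("e1", "e2"), ("e2", "e3")])
```

A thread pool on one machine only stands in for the distributed environment here; in the package, the instances are dispatched to agents representing cluster nodes.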
When dispatching jobs, the Orlando Tools framework distributes the computational load generated by the instances of module m4 proportionally among the agents representing the resources of the environment. Thus, unlike well-known workflow management systems, Orlando Tools does not consider each instance separately. This significantly reduces the combinatorial complexity of the tender of computational works.
Plan 2 runs Plan 1 in a loop over element sets of different sizes (Fig. 3).
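One simple way to split a batch of identical instances proportionally is the largest-remainder method, sketched below. The agent names, the `capacity` weights, and the function `distribute_instances` are hypothetical; the text does not specify how Orlando Tools computes the proportions.

```python
from typing import Dict


def distribute_instances(n_instances: int, capacity: Dict[str, float]) -> Dict[str, int]:
    """Split n_instances among agents in proportion to their capacity.

    Uses the largest-remainder method: every agent first receives the
    integer part of its exact share, and the instances left over go to
    the agents with the largest fractional parts.
    """
    total = sum(capacity.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")
    exact = {agent: n_instances * c / total for agent, c in capacity.items()}
    shares = {agent: int(x) for agent, x in exact.items()}
    leftover = n_instances - sum(shares.values())
    # Largest fractional part first; agent name breaks ties deterministically.
    by_remainder = sorted(exact, key=lambda a: (shares[a] - exact[a], a))
    for agent in by_remainder[:leftover]:
        shares[agent] += 1
    return shares


# Hypothetical relative capacities of three agents.
shares = distribute_instances(10, {"agent_a": 1.0, "agent_b": 1.0, "agent_c": 2.0})
```

The whole batch is handled in a single allocation step, whatever the number of instances, which is the point of not treating each instance as a separate item in the tender.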
Plan 3 forms electronic maps for the selected sets of failed elements and publishes these maps using geoinformation services (Fig. 4).
A detailed description of all parameters and modules of the package is given in [27].

Search for critical elements
The procedure for searching for critical elements is implemented as follows. First, based on the idea of vulnerability analysis from various points of view [8], several indicators are selected to assess the performance of the energy system under study. Next, the disturbances affecting the system components are modelled, and the drop in system performance is measured by these indicators. The components are then sorted on the basis of the measurements made; multi-criteria analysis can be applied at this step.
In the package for analyzing energy system vulnerability, the problem of generating disturbance scenarios is solved by constructing failure sets [9]. Each failure set is a combination of energy system network elements where a failure might occur. For practical reasons, the size of a failure set should not exceed 3 or 4, since the number of possible failure sets grows rapidly with the set size.
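The growth in the number of failure sets follows from the binomial coefficient and can be checked with the standard library. In the sketch below the element labels and the network size of 1000 elements are hypothetical values chosen only to show the scale.

```python
from itertools import combinations
from math import comb
from typing import Dict

# Hypothetical element labels: three nodes and two arcs.
elements = ["n1", "n2", "n3", "a1", "a2"]

# All failure sets of size 2 for this small example.
failure_sets = list(combinations(elements, 2))


def count_failure_sets(n_elements: int, max_size: int) -> Dict[int, int]:
    """Number of distinct failure sets for each size from 1 to max_size."""
    return {k: comb(n_elements, k) for k in range(1, max_size + 1)}


# Growth for a hypothetical network of 1000 elements.
growth = count_failure_sets(1000, 4)
```

For 1000 elements this gives 499500 sets of size 2, about 1.7 × 10^8 sets of size 3, and about 4.1 × 10^10 sets of size 4, which is why sizes above 3 or 4 are impractical to enumerate.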
One of the advantages of the approach [9] is the identification of hidden elements. The failure of a hidden element alone has a negligible impact on a system, but in combination with failures of other elements it might have a synergistic effect and cause significant damage to the system.
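A toy example makes the notion of a hidden element concrete. The sketch assumes a single consumer supplied through three parallel pipelines with spare capacity; the capacities, the demand, the `shortage_share` model, and the function `find_hidden_elements` are all hypothetical, and the 5% threshold is borrowed from the case study later in this section.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

# Hypothetical parallel pipelines feeding one consumer (capacity units).
CAPACITY: Dict[str, int] = {"p1": 60, "p2": 60, "p3": 40}
DEMAND = 100


def shortage_share(failed: Iterable[str]) -> float:
    """Share of demand left unserved when the given pipelines fail."""
    failed_set = set(failed)
    supply = sum(c for name, c in CAPACITY.items() if name not in failed_set)
    return max(0, DEMAND - supply) / DEMAND


def find_hidden_elements(threshold: float) -> Set[str]:
    """Elements that are harmless alone but critical in some pair."""
    harmless = {e for e in CAPACITY if shortage_share([e]) < threshold}
    hidden: Set[str] = set()
    for pair in combinations(sorted(CAPACITY), 2):
        if shortage_share(pair) >= threshold:
            hidden.update(e for e in pair if e in harmless)
    return hidden


hidden = find_hidden_elements(threshold=0.05)
```

Here every single failure is absorbed by the spare capacity, while every pair of failures leaves a large part of the demand unserved, so a single-element analysis alone would report no critical elements at all.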
In addition to the deterministic approach [9], the package implements a stochastic approach to determining critical elements [33].
During 2018-2019 the package was used to identify critical elements of the Unified Natural Gas Supply System of Russia. Its network contains 382 nodes, including 22 underground natural gas storages, 28 sources (in the system model they are represented by head compressor stations), 64 consumers, and 268 key compressor stations, as well as 628 arcs representing main natural gas pipelines, their corridors and branches to the distribution networks.
The results of the calculations conducted with the package have shown that a potential gas shortage for consumers arises if any one of 441 components of the Unified Natural Gas Supply System of Russia (242 nodes and 199 arcs) fails. The threshold for critical elements was a potential gas shortage of 5% of the total demand. A total of 61 components exceeded this threshold. These components formed the list of the natural gas industry's critical elements at the federal level. Among them are 25 arcs between key compressor stations and 36 nodes, including 30 key compressor stations, 5 head compressor stations, and 1 underground storage.
Then, 207690 failure sets of size 2 were calculated by means of the package. These failure sets did not include the critical elements found earlier. Experts identified 2865 pairs of components whose failure can lead to a shortage of 5% of the total demand or more. After certain resilience improvement measures were modelled with the package, the number of such pairs was reduced to 2516. Ranking the pairs yielded 20 pairs whose failure can lead to a shortage of 10% of the total demand or more.
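The two-level selection described above (a 5% threshold, then a 10% threshold applied to the ranked pairs) can be sketched as follows. The pair names and shortage values are made up for the example, and `select_pairs` is a hypothetical helper. As a side note, 207690 equals the number of unordered pairs that can be drawn from 645 elements; the pool size of 645 is inferred here from that count and is not stated in the text.

```python
from math import comb
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Arithmetic check: number of unordered pairs from 645 elements.
pair_count = comb(645, 2)


def select_pairs(shortage_by_pair: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Pairs whose shortage share reaches the threshold, worst first."""
    selected = [p for p, s in shortage_by_pair.items() if s >= threshold]
    return sorted(selected, key=lambda p: (-shortage_by_pair[p], p))


# Made-up shortage shares for three hypothetical pairs of components.
toy_results: Dict[Pair, float] = {
    ("c1", "c2"): 0.12,
    ("c1", "c3"): 0.07,
    ("c2", "c3"): 0.02,
}

critical_pairs = select_pairs(toy_results, threshold=0.05)
top_pairs = select_pairs(toy_results, threshold=0.10)
```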

Conclusions
The package for analyzing the energy system vulnerability has been developed with the new technology for continuous integration, delivery, and deployment of applied software. It implements a framework that allows combining and optimally using various methods for modelling energy systems for a comprehensive assessment of their vulnerability with regard to various uncertainties.
The package aims to overcome the following challenges in the field of energy system resilience research:
• Processing and analysis of large data sets that arise due to the combinatorial nature of most problems of energy system resilience studies;
• Need to manage multiple computational experiments and conduct them in an acceptable time.
The investigations made with the package have shown that the principles used to identify and rank critical elements of the Unified Natural Gas Supply System of Russia seem to be logical for the subsequent construction of an invariant set of resilience improvement measures for the corresponding energy systems.