Information Entropy-based Edge Importance Identification of Road Network: A Case of Highway in Sichuan Province

Because natural disasters can severely damage road networks and cause heavy losses, protecting the road network is essential. Identifying edge importance helps preserve the road network by making it possible to protect its key edges. This paper proposes a new network performance measure and introduces a new edge load redistribution method in the cascading failure model. To identify the importance of different edges in the network, this paper proposes three edge importance evaluation metrics: the information entropy of degree values, the information entropy of iterative factors, and a two-dimensional evaluation metric that combines the two single metrics through the Pareto non-dominated set. A case study of the highway network in Sichuan Province, which has 204 nodes and 322 edges and was affected by the Luding Earthquake, is conducted with data from the Department of Transport of Sichuan Province to determine which of the three metrics performs best. The chi-square test and Kendall's correlation coefficient, which compare the importance ranking of each metric with the ranking derived from the network performance assessment model, indicate that the two-dimensional evaluation metric performs best. Moreover, when the road network is attacked according to the different edge rankings, it tends to collapse at the same point in time, suggesting that cascading failures should be contained early.

As a crucial part of urban infrastructure, an efficient traffic and road network plays a vital role in the development of any region [2]. Because of the characteristics of traffic infrastructure itself, traffic and road networks often form complex networks [3]. Cascading failures in a road network tend to cause multiple systemic damages, and improving the performance of the road network can reduce the risk of its complete failure in the face of disasters. Road network development is also one of the important directions of China's transportation strategy. The Outline for the Construction of a Strong Transportation Country explicitly proposes building a modern, high-quality, comprehensive three-dimensional transportation network, establishing a transportation system for natural disaster prevention and control, and improving transportation disaster prevention and resilience. The "National Comprehensive Three-dimensional Traffic Network Indicator Framework" issued by the Ministry of Transport likewise proposes building a modern, high-quality national comprehensive three-dimensional traffic network to enhance the safety and reliability of transportation. Identifying the critical sections of road networks is therefore important for enhancing the resilience of the transportation network system. This paper aims to develop an information entropy-based edge importance identification metric for road networks, which can help control cascading failures and inform policy suggestions for rescue and recovery strategies.
The rest of the paper is organized as follows: Section 2 reviews the previous literature, Section 3 presents the cascading failure model and the performance assessment model underlying the main problem of this paper, Section 4 presents the importance evaluation metrics, and Section 5 conducts a case study of the highway network in Sichuan Province. Finally, Section 6 gives the conclusions and directions for future research.

Edge Importance Identification
Edge importance identification is a focus of research in network science. Core edges in a transportation network directly affect the flow of the whole network, and the failure of a line in a major section of a power grid can bring down the whole grid; both phenomena are tied to the important edges of the network. Many studies have addressed edge importance identification. Omar et al. identified the edge importance of directed and undirected networks by calculating the exponential of the matrix associated with the line graph of a given network [4]; Ouyang et al. proposed an edge importance identification method based on nearest-neighbor connectivity by removing edges from the network [5]; and, considering the service, transport, and physical layers of the network, Fan, Zeng and Tang defined the edge cross-layer importance (ECI) and then used the network-wide information entropy of ECI, the edge cross-layer information entropy (ECE), as an index of the vulnerability of power communication networks [6]. These studies evaluate edge importance from different perspectives in order to identify the edges that most strongly affect various characteristics of the network.

Information Entropy
In 1948, Claude Shannon's paper "A Mathematical Theory of Communication" established a mathematical model of the communication process, in which he proposed the concept of information entropy to quantify information. Information theory has since had a profound impact on many fields. Wang et al. predicted consumer satisfaction by measuring the entropy of consumer-perceived information [7]; Feng et al. used K-means clustering to group the information entropy of sections of different branches along a route in order to quantitatively assess the collision risk of ships in the route's waters [8]; and Zhang and Chen, drawing on information entropy and banks' balance sheet data, proposed a centrality index based on neighboring information entropy to identify systemically important banks [9]. Concepts such as information entropy allow qualitative problems to be analyzed quantitatively and more precisely. Infrastructure networks have complex network characteristics and variable topologies, and information theory offers a way to summarize these characteristics.
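The information entropy calculation formula that the later metrics rely on is Shannon's $H = -\sum_i p_i \log p_i$. The Python sketch below computes it for a discrete distribution; the function name `shannon_entropy` is illustrative, and defaulting to the natural logarithm is an assumption, since the logarithm base only rescales the entropy and does not change any ranking built on it.

```python
import math
from typing import Iterable


def shannon_entropy(probs: Iterable[float], base: float = math.e) -> float:
    """Shannon entropy H = -sum(p * log(p)) of a discrete distribution.

    Outcomes with p == 0 contribute nothing, following the convention 0 * log 0 = 0.
    """
    total = 0.0
    for p in probs:
        if p < 0:
            raise ValueError("probabilities must be non-negative")
        if p > 0:
            total -= p * math.log(p, base)
    return total


# A fair coin carries 1 bit of information; a certain outcome carries none.
print(shannon_entropy([0.5, 0.5], base=2))  # 1.0
print(shannon_entropy([1.0, 0.0], base=2))  # 0.0
```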

Cascading Failure Model and Network Performance Measure
In this paper, the proposed metrics are verified with a cascading failure model, and the load-capacity model of cascading failure is used for simulation. During cascading failure, the load is redistributed in proportion to the degree values of neighboring edges, and a new network performance evaluation metric is proposed to verify the different importance metrics.

Road Network Model
In this paper, the road network is denoted by G and the total number of nodes by N. It can be expressed as a simple graph
$$G = (V, E), \quad V = \{v_1, v_2, \ldots, v_N\}, \quad E = \{e_{ij} \mid i, j = 1, 2, \ldots, N\},$$
where V denotes the set of nodes and E denotes the set of edges in the road network G, and $e_{ij}$ denotes the connectivity between node i and node j:
$$e_{ij} = \begin{cases} 1, & \text{if an edge connects node } i \text{ and node } j, \\ 0, & \text{if no edge exists between node } i \text{ and node } j. \end{cases} \tag{1}$$

Load and Capacity Model
In complex networks, edges whose end nodes have higher degrees are more important and more likely to trigger massive cascading failures. The initial load of an edge is considered to be closely related to the degrees of the nodes at its two ends, so the degree value $k_{ij}$ and the initial load $L_{ij}$ of road network edge $e_{ij}$ are defined following [10] in terms of $k_i$ and $k_j$, the degrees of nodes $v_i$ and $v_j$, respectively, and $a$, the initial load factor of the road network G.
Meanwhile, because load and capacity in complex network systems are not purely linearly related, and edges with relatively small capacity may still have a large proportion of unused capacity, this paper defines the initial capacity $C_{ij}$ of edge $e_{ij}$ in terms of its load and the load capacity coefficient $b$ of the road network G. By adjusting the load capacity coefficient, the model can simulate edges with different capacity-to-load ratios, as found in real complex network systems, and thus yields a more realistic road network model.

Load Redistribution
Because loads in the road network can be transferred between edges, assume that edge $e_{ij}$ fails at moment t. At moment t+1, the load that $e_{ij}$ carried at moment t is allocated to the adjacent edges that have not failed according to their degree values, and the proportion allocated to edge $e_{pq}$ is
$$\Pi_{pq} = \frac{k_{pq}}{\sum_{e_{mn} \in \gamma_{ij}} k_{mn}}, \tag{1.5}$$
where $\gamma_{ij}$ denotes the set of neighboring edges of edge $e_{ij}$ and $e_{mn}$ is an edge in this set. After load redistribution, the load of edge $e_{pq}$ changes according to Eq. (1.5), and at moment t+1 it becomes
$$L^{t+1}_{pq} = L^{t}_{pq} + L^{t}_{ij}\,\Pi_{pq}. \tag{1.6}$$
If $L^{t+1}_{pq} > C_{pq}$, then $e_{pq} = 0$, i.e., the edge fails.
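The redistribution rule can be sketched as follows. This is a minimal Python illustration, assuming the degree-proportional share $\Pi_{pq}$ and the load update written above; the helper names `redistribute` and `overloaded`, the toy edges and numbers, and the choice to let the load leave the network when no neighbor survives are hypothetical rather than taken from the paper.

```python
from typing import Dict, Hashable, Set

Edge = Hashable


def redistribute(failed: Edge, load: Dict[Edge, float], degree: Dict[Edge, float],
                 neighbors: Dict[Edge, Set[Edge]], alive: Set[Edge]) -> Dict[Edge, float]:
    """Return the loads at t+1 after `failed` sheds its load at t.

    Each surviving neighbour e_pq receives the share k_pq / sum(k_mn) of the
    failed edge's load, i.e. L_pq(t+1) = L_pq(t) + L_ij(t) * Pi_pq.
    """
    new_load = dict(load)
    targets = [e for e in neighbors[failed] if e in alive]
    denom = sum(degree[e] for e in targets)
    if denom > 0:  # with no surviving neighbour the load simply leaves the network
        for e in targets:
            new_load[e] += load[failed] * degree[e] / denom
    new_load[failed] = 0.0  # a failed edge carries no load
    return new_load


def overloaded(load: Dict[Edge, float], capacity: Dict[Edge, float], alive: Set[Edge]) -> Set[Edge]:
    """Edges whose load strictly exceeds their capacity fail (e_pq = 0)."""
    return {e for e in alive if load[e] > capacity[e]}


# Toy example with hypothetical numbers: edge "a" fails while carrying a load of 6.
load = {"a": 6.0, "b": 1.0, "c": 1.0}
after = redistribute("a", load, degree={"b": 1, "c": 2},
                     neighbors={"a": {"b", "c"}}, alive={"b", "c"})
print(after)                                                 # {'a': 0.0, 'b': 3.0, 'c': 5.0}
print(overloaded(after, {"b": 4.0, "c": 4.0}, {"b", "c"}))   # {'c'}
```

Neighbor "c" has twice the degree of "b", so it receives twice the shed load and is the one that overloads.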

Cascading Failure Process
When a natural disaster strikes, the road network G is attacked, its load is redistributed, and a cascading failure ensues. The failure rules in this paper are as follows:
(1) Overload failure: edge $e_{ij}$ fails when its load exceeds its capacity [11];
(2) Disconnection failure: edge $e_{ij}$ fails when it falls outside the largest connected component of the network.
Following these failure rules, the cascading failure model for simulating road networks proceeds in these steps:
(i) At t = 0, a natural disaster occurs and the road network G is attacked, causing some edges to fail;
(ii) At t = 1, determine whether any edges in G fail because they fall outside the largest connected subgraph;
(iii) Allocate the loads of the failed edges in G to their neighboring edges according to Eq. (1.5) and Eq. (1.6);
(iv) Compare the load and capacity of the remaining edges in G to find newly failed edges; if there are any, let t = t + 1 and return to step (iii);
(v) Repeat steps (iii) and (iv) until no edge fails and the system reaches a steady state.
Figure 1 shows the cascading failure process of a simple road network.
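A compact simulation of steps (i) to (v) is sketched below. It is an illustrative Python sketch, not the authors' implementation: the function names are hypothetical, both failure rules are assumed to be applied in every round, all loads shed in a round are computed from the loads at the start of that round, the largest connected component is measured by node count with ties broken arbitrarily, and the edge degree used in the example (the number of adjacent edges) is an assumption.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def adjacent_edges(edges: Iterable[Edge]) -> Dict[Edge, Set[Edge]]:
    """Map each edge to the edges that share an endpoint with it."""
    edges = list(edges)
    by_node: Dict[Hashable, Set[Edge]] = defaultdict(set)
    for e in edges:
        for node in e:
            by_node[node].add(e)
    return {e: (by_node[e[0]] | by_node[e[1]]) - {e} for e in edges}


def giant_component_edges(alive: Set[Edge]) -> Set[Edge]:
    """Edges of the largest (by node count) connected component; ties broken arbitrarily."""
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in alive:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[Hashable] = set()
    best: Set[Hashable] = set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search over one component
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    queue.append(y)
        if len(comp) > len(best):
            best = comp
    return {e for e in alive if e[0] in best}


def cascade(edges: List[Edge], load: Dict[Edge, float], capacity: Dict[Edge, float],
            degree: Dict[Edge, float], attacked: Iterable[Edge]) -> Tuple[Set[Edge], int]:
    """Return (surviving edges, number of rounds) after attacking `attacked`."""
    neighbors = adjacent_edges(edges)
    load = dict(load)                        # do not mutate the caller's loads
    alive = set(edges) - set(attacked)       # step (i)
    newly_failed = set(attacked)
    rounds = 0
    while newly_failed:
        rounds += 1
        # Disconnection failure: edges outside the largest component fail.
        giant = giant_component_edges(alive)
        newly_failed |= alive - giant
        alive = giant
        # Degree-proportional redistribution of every load shed this round.
        delta: Dict[Edge, float] = defaultdict(float)
        for f in newly_failed:
            targets = [e for e in neighbors[f] if e in alive]
            denom = sum(degree[e] for e in targets)
            if denom > 0:
                for e in targets:
                    delta[e] += load[f] * degree[e] / denom
            load[f] = 0.0
        for e, extra in delta.items():
            load[e] += extra
        # Overload failure: load strictly greater than capacity.
        newly_failed = {e for e in alive if load[e] > capacity[e]}
        alive -= newly_failed
    return alive, rounds


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
nbrs = adjacent_edges(edges)
degree = {e: len(nbrs[e]) for e in edges}  # assumption: edge degree = number of adjacent edges
load = {e: 1.0 for e in edges}             # hypothetical uniform loads
survivors, rounds = cascade(edges, load, {e: 1.5 for e in edges}, degree, [("B", "C")])
print(sorted(survivors), rounds)  # [('A', 'B'), ('B', 'D'), ('C', 'D')] 1
survivors, rounds = cascade(edges, load, {e: 1.0 for e in edges}, degree, [("B", "C")])
print(sorted(survivors), rounds)  # [] 2
```

With a capacity of 1.5 the three neighbors of the attacked edge absorb its load and the network stabilizes after one round; with a capacity of 1.0 every neighbor overloads and the network collapses in the second round.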

Network Performance Measure
In this paper, the performance of complex networks is evaluated from two aspects: (1) the ability to resist risk, and (2) the duration of the risk's impact, i.e., how quickly the network stabilizes. The resistance of the network is indicated by the number of edges lost after the network stabilizes; this paper defines the amount of loss of the road network at moment t as LS(t), computed from X(t), the number of valid edges in the surviving largest connected subgraph. The smaller the value of LS, the more valid edges remain in the road network and the better the performance of the whole network system.
Rapidity is measured by T, the time the system takes to reach a steady state, so t ∈ [0, T]. Since both the loss volume and the rapidity measure are negatively related to network performance, this paper defines P as the network performance metric, in which N = X(0) denotes the number of connected edges in the network in the initial state at t = 0. To eliminate the influence of chance in the experiments, n independent simulations are conducted to obtain the measurements $P_1, P_2, \ldots, P_n$ and $T_1, T_2, \ldots, T_n$, whose averages are
$$\bar{P} = \frac{1}{n}\sum_{k=1}^{n} P_k, \qquad \bar{T} = \frac{1}{n}\sum_{k=1}^{n} T_k.$$

Edge Importance Identification
In this paper, three importance evaluation metrics are proposed: the information entropy of degree values, the information entropy of iterative factors, and a two-dimensional evaluation metric based on the Pareto non-dominated set. The degree value and the iteration factor represent an edge's degree of association and its coreness, respectively.

Importance Ranking Based on Information Entropy of Degree Values
When evaluating the importance of edges based on degree values, it is important to consider not only the influence of an edge's own degree on the network but also the influence of its neighbors. The degree of influence $DI_{ij}$ of edge $e_{ij}$ on the whole network is
$$DI_{ij} = \frac{k_{ij}}{\sum_{e_{pq} \in E} k_{pq}}, \tag{1.11}$$
where E denotes the set of edges of the whole network, $e_{pq}$ is an edge in E, and $k_{ij}$ and $k_{pq}$ denote the degrees of edges $e_{ij}$ and $e_{pq}$, respectively.
Substituting Eq. (1.11) into the information entropy formula gives the information entropy of degree values (DE) of each edge, and the importance of the edges can be compared by their entropy values:
$$DE_{ij} = -\sum_{e_{pq} \in \Phi_{ij}} I_{pq} \ln I_{pq},$$
where $I_{pq}$ denotes the degree of influence of edge $e_{pq}$ on the whole network (that is, $DI_{pq}$ from Eq. (1.11)), $\Phi_{ij}$ denotes the set of neighboring edges of edge $e_{ij}$ including the edge itself, and $e_{pq}$ is an edge in $\Phi_{ij}$.
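The degree-entropy metric can be computed with a short Python sketch. The function names are hypothetical and the natural logarithm is assumed; because this section does not spell out the edge degree $k_{ij}$, the sketch assumes it equals the number of edges adjacent to $e_{ij}$.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_neighbors(edges: List[Edge]) -> Dict[Edge, Set[Edge]]:
    """Edges sharing at least one endpoint with each edge (the edge itself excluded)."""
    by_node: Dict[Hashable, Set[Edge]] = defaultdict(set)
    for e in edges:
        for node in e:
            by_node[node].add(e)
    return {e: (by_node[e[0]] | by_node[e[1]]) - {e} for e in edges}


def degree_entropy(edges: List[Edge]) -> Dict[Edge, float]:
    """DE_ij = -sum over Phi_ij of DI_pq * ln(DI_pq), with DI_pq = k_pq / sum over E of k."""
    nbrs = edge_neighbors(edges)
    # ASSUMPTION: the edge degree k_ij is taken as the number of adjacent edges.
    k = {e: len(nbrs[e]) for e in edges}
    total = sum(k.values())
    if total == 0:  # e.g. a single isolated edge: no influence to distribute
        return {e: 0.0 for e in edges}
    di = {e: k[e] / total for e in edges}  # DI_ij, Eq. (1.11)
    de = {}
    for e in edges:
        closed = nbrs[e] | {e}  # Phi_ij: the neighbours plus the edge itself
        de[e] = -sum(di[p] * math.log(di[p]) for p in closed if di[p] > 0)
    return de


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
for e, v in sorted(degree_entropy(edges).items(), key=lambda kv: -kv[1]):
    print(e, round(v, 4))
```

Every term $-DI \ln DI$ is positive, so an edge with a larger neighborhood of influential edges accumulates a larger entropy.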

Importance Ranking Based on Information Entropy of Iterative Factors
In the K-shell algorithm, the network is decomposed to obtain the KS value of each node, which evaluates how central the node is in the network [12]. However, K-shell decomposition does not measure accurately the influence of nodes that have large degree values but lie on the periphery of the network. The IE+ algorithm therefore proposes a new decomposition method that yields an iteration factor IT for each node, which measures the node's core influence in the network more accurately [13]. The algorithm steps are as follows:
(i) Let IT = 1.
(ii) Find the nodes with the smallest degree in the network and record their iteration factor as the current value of IT.
(iii) Remove these nodes with the smallest degree from the network.
(iv) If the number of nodes in the network is 0, end the algorithm; otherwise, let IT = IT + 1 and return to step (ii).
After the iteration factors of the nodes are obtained, the IT value of edge $e_{ij}$ is taken as the smaller of the iteration factors of its two end nodes:
$$IT_{ij} = \min(IT_i, IT_j).$$
$IT_{ij}$ encodes the edge's position in the network: the larger its value, the closer the edge is to the core. Building on the information entropy of degree values, this paper combines the iteration factor IT to calculate the core degree $ITI_{ij}$ of edge $e_{ij}$ in the network (Eq. (1.15)). Similarly, the information entropy of iterative factors (ITE) of each edge is obtained by substituting Eq. (1.15) into the information entropy formula, and the importance of edges in the road network is evaluated as
$$ITE_{ij} = -\sum_{e_{pq} \in \Phi_{ij}} ITI_{pq} \ln ITI_{pq},$$
where $ITI_{pq}$ denotes the core degree of edge $e_{pq}$ in the whole network, $\Phi_{ij}$ denotes the set of neighboring edges of edge $e_{ij}$ including the edge itself, and $e_{pq}$ is an edge in $\Phi_{ij}$. The larger the resulting entropy, the more important the edge.
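Steps (i) to (iv) and the edge-level rule $IT_{ij} = \min(IT_i, IT_j)$ can be sketched in Python as below. The function names are hypothetical; the sketch assumes an undirected simple graph given as an adjacency dictionary and removes all nodes that share the current minimum degree in the same round. It stops at $IT_{ij}$ and does not compute the core degree $ITI_{ij}$.

```python
from typing import Dict, Hashable, Set, Tuple


def node_iteration_factors(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Iteration factor IT of every node, following steps (i)-(iv).

    `adj` maps each node to its neighbours in an undirected simple graph.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    it_of: Dict[Hashable, int] = {}
    it = 1                                    # step (i)
    while remaining:                          # step (iv): stop when no node is left
        min_deg = min(len(ns) for ns in remaining.values())
        layer = {v for v, ns in remaining.items() if len(ns) == min_deg}
        for v in layer:                       # step (ii): record the current IT
            it_of[v] = it
        for v in layer:                       # step (iii): detach the layer ...
            for u in remaining[v] - layer:
                remaining[u].discard(v)
        for v in layer:                       # ... and delete it
            del remaining[v]
        it += 1
    return it_of


def edge_iteration_factor(edge: Tuple[Hashable, Hashable], it_of: Dict[Hashable, int]) -> int:
    """IT_ij = min(IT_i, IT_j)."""
    i, j = edge
    return min(it_of[i], it_of[j])


# Triangle A-B-C with a pendant node D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
it_of = node_iteration_factors(adj)
print(dict(sorted(it_of.items())))  # {'A': 2, 'B': 2, 'C': 2, 'D': 1}
print(edge_iteration_factor(("A", "D"), it_of), edge_iteration_factor(("A", "B"), it_of))  # 1 2
```

Unlike K-shell decomposition, each round removes only the nodes that currently have the minimum degree, so nodes whose degree drops after a removal receive a larger IT in a later round.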

Two-dimensional Importance Ranking Based on Pareto Non-dominated Set
Above, the information entropy of degree values and the information entropy of iterative factors were calculated with the information entropy formula from the degree k and the iteration factor IT, respectively; the two metrics measure edge importance in terms of influence and coreness, respectively. In most cases, however, a single metric may not reflect the influence of an edge on the network well. Therefore, this paper ranks edge importance based on the Pareto non-dominated set by combining the information entropy of degree values and the information entropy of iterative factors; the advantage of this method is that it assesses edges comprehensively through the Pareto non-dominated set instead of measuring them with a single metric [14].
For the subsequent calculations, the information entropy of degree values (DE) and the information entropy of iterative factors (ITE) of all edges in the network are normalized to [0, 1]. A two-dimensional orthogonal coordinate system is then established with DE on the horizontal axis and ITE on the vertical axis, so that each edge corresponds to a point: edge $e_{ij}$ is located at $(DE_{ij}, ITE_{ij})$ with $DE, ITE \in [0, 1]$. The point (1, 1) corresponds to the optimal value in this coordinate system, and the smaller the distance between edge $e_{ij}$ and the optimal point, the higher its importance; accordingly, this paper defines the two-dimensional performance value (TPV) of edge $e_{ij}$ by Eq. (1.19). Treating edge $e_{ij}$ as a point in the plane, horizontal and vertical lines drawn through it divide the plane into four regions, and the points in the upper-right region, whose DE and ITE are both at least as large, are the Pareto points that dominate $e_{ij}$. The PNPS of $e_{ij}$ is computed from $TPV_{pq}$, the TPV of each Pareto point $e_{pq}$ that dominates $e_{ij}$. If more edges with high TPV dominate an edge, its PNPS is larger and the edge is less important, so its ranking is lower. Thus, the edges are ranked from most to least important according to their PNPS values.
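The two-dimensional ranking can be illustrated with a Python sketch. Because the normalization formula, the TPV of Eq. (1.19), and the exact PNPS aggregation are not written out in this section, all three are assumptions here: min-max normalization, a hypothetical TPV equal to $\sqrt{2}$ minus the distance to (1, 1), and PNPS as the sum of the TPVs of all edges that dominate the given edge. Breaking PNPS ties by larger TPV is also an assumption, and the function names and input values are illustrative.

```python
import math
from typing import Dict, Hashable, List, Tuple


def min_max(values: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """ASSUMED normalisation to [0, 1]; a constant metric maps every edge to 1.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {e: (v - lo) / span if span > 0 else 1.0 for e, v in values.items()}


def pnps_ranking(de: Dict[Hashable, float],
                 ite: Dict[Hashable, float]) -> Tuple[List[Hashable], Dict[Hashable, float]]:
    x, y = min_max(de), min_max(ite)
    # HYPOTHETICAL TPV: grows as the point (DE, ITE) approaches the ideal point (1, 1).
    tpv = {e: math.sqrt(2) - math.hypot(1 - x[e], 1 - y[e]) for e in de}

    def dominates(p: Hashable, q: Hashable) -> bool:
        """p lies in the upper-right region of q: no worse on both axes, better on one."""
        return x[p] >= x[q] and y[p] >= y[q] and (x[p] > x[q] or y[p] > y[q])

    # ASSUMED aggregation: PNPS_q is the summed TPV of every edge that dominates q.
    pnps = {q: sum(tpv[p] for p in de if dominates(p, q)) for q in de}
    # Smaller PNPS = more important; ties (e.g. on the Pareto front) broken by larger TPV.
    order = sorted(de, key=lambda e: (pnps[e], -tpv[e]))
    return order, pnps


order, pnps = pnps_ranking({"e1": 0.9, "e2": 0.5, "e3": 0.1},
                           {"e1": 0.8, "e2": 0.9, "e3": 0.2})
print(order)  # ['e1', 'e2', 'e3']
```

Edges e1 and e2 are not dominated by any other edge, so both have a PNPS of 0; e3 is dominated by both and therefore ranks last.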

Building Road Network
As one of the Chinese provinces most frequently struck by earthquakes, Sichuan Province contains eight small seismic zones and is an area of active crustal movement. On September 5, 2022, a magnitude 6.8 earthquake struck Luding County, Sichuan Province, causing considerable damage to the surrounding traffic and roads. Therefore, this paper takes the highway network of Sichuan Province as an example to construct a road network model: the highway network is obtained with ArcGIS, cities are taken as stations and inter-city highways as connecting edges, and the traffic network is topologized to obtain a road network with 204 nodes and 322 edges.
Based on the node degree distribution, the load of each road section is calculated using Eq. (1.1), and the capacity of each road section is then calculated from its load using Eq. (1.2). Figure 3 shows part of the schematic diagram of the road network.

Simulation Parameter
Damage to different edges of a road network can lead to very different losses, and this paper focuses on the changes in road network performance caused by damage to different edges. In the following analysis, the initial load factor is set to a = 15 and the load capacity factor to b = 0.5, and each experiment is repeated n = 30 times to reduce experimental error.
The edges in the road network are ranked from largest to smallest by the network performance loss (NP) that each causes after cascading failure, and are also ranked in importance by each of the three importance evaluation metrics. The ranking results (partially displayed) are shown in Table 1.1:

Simulation Results
After ranking the edges of the Sichuan highway network by the three importance evaluation metrics, this paper applies the chi-squared test to examine the difference between the ranking by actual network performance loss and the ranking by each of the three metrics. For the subsequent calculation, the original rankings are grouped into intervals of 5, giving a total of 66 intervals, and the ranking frequencies $R_{DE}$, $R_{ITE}$, $R_{PNPS}$ and $R_{NP}$ of the edges are obtained from the frequency of each ranking interval under each metric. The null hypothesis is that there is no significant difference between the edge ranking produced by an importance evaluation metric and the edge ranking by network performance loss. Using the chi-square goodness-of-fit formula, the chi-square value between each pair of rankings is
$$\chi^2 = \sum \frac{(R_{Metric} - R_{NP})^2}{R_{NP}},$$
where Metric = DE, ITE or PNPS, $R_{Metric}$ denotes the ranking frequency under the corresponding metric, $R_{NP}$ denotes the ranking frequency based on network performance loss, and the sum runs over the ranking intervals.
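The goodness-of-fit statistic can be computed with a short Python sketch that treats the network-performance-loss frequencies $R_{NP}$ as the expected frequencies and $R_{Metric}$ as the observed ones. The function name and the example frequencies are hypothetical, the grouping into ranking intervals is not reproduced, and requiring every expected frequency to be positive (merging empty intervals beforehand) is an assumption of the sketch.

```python
from typing import Sequence


def chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic: sum over intervals of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("both frequency lists must cover the same intervals")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive; merge empty intervals first")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies over three ranking intervals (R_Metric vs R_NP).
print(chi_square([10, 20, 30], [15, 15, 30]))  # about 3.333
```

The resulting statistic is then compared with the critical value of the chi-square distribution at the chosen degrees of freedom and significance level.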
To check the correlation between the ranking by network performance and the rankings by the three importance identification metrics, the Kendall tau coefficient is used to measure the correlation between each metric's ranking and the ranking derived from the cascading failure simulations. The tau-b formula is
$$\tau_b = \frac{N_{consistent} - N_{inconsistent}}{\sqrt{(N_{Combination} - N_{Metric})(N_{Combination} - N_{NP})}},$$
where $N_{consistent}$ and $N_{inconsistent}$ are the numbers of concordant and discordant pairs in the two rankings, respectively, $N_{Combination} = M(M-1)/2$ is the number of pairwise combinations of the M ranked edges, and $N_{Metric}$ and $N_{NP}$ are the tie corrections of the two rankings. Taking $N_{Metric}$ as an example ($N_{NP}$ is calculated in the same way), tied elements in the ranking are grouped into small sets, s is the number of such sets, and $U_i$ is the number of elements in the i-th set:
$$N_{Metric} = \sum_{i=1}^{s} \frac{U_i(U_i - 1)}{2}.$$
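The tau-b formula with its tie corrections can be sketched in Python as follows. The function name is hypothetical, and the two arguments are assumed to be the ranks that the two methods assign to the same edges, listed in the same order. The pairwise loop is O(M²), which is adequate for a network of a few hundred edges; SciPy's `scipy.stats.kendalltau` computes the same tau-b by default and can serve as a cross-check.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-b between two rankings of the same items, with tie correction."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("need two rankings of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(m), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1      # N_consistent
        elif s < 0:
            discordant += 1      # N_inconsistent; pairs tied in x or y count as neither
    n_comb = m * (m - 1) // 2                                      # N_Combination
    ties_x = sum(u * (u - 1) // 2 for u in Counter(x).values())   # N_Metric
    ties_y = sum(u * (u - 1) // 2 for u in Counter(y).values())   # N_NP
    denom = math.sqrt((n_comb - ties_x) * (n_comb - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a ranking is constant")
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(kendall_tau_b([1, 2, 3, 4], [1, 2, 2, 3]))  # tie-corrected, about 0.913
```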
The edges of the original network were ranked by each metric, and the correlation was verified by comparing each ranking with the network performance loss ranking. The final results are as follows:

Results Discussion
According to the chi-square distribution table, the critical value at n = 66 degrees of freedom and significance level α = 0.05 is $\chi^2_{0.05}(66) = 85.965$. All three importance evaluation metrics satisfy $\chi^2 < \chi^2_{0.05}(66)$, so the null hypothesis that the edge rankings produced by the importance identification metrics do not differ significantly from the ranking by network performance is not rejected. The chi-square value of PNPS is the smallest of the three, so its ranking differs least from the network performance loss ranking. In addition, τ(DE, RS), τ(ITE, RS) and τ(PNPS, RS) denote the Kendall τ rank correlation coefficients between the ranking by network performance loss and the rankings by the information entropy of degree values, the information entropy of iterative factors, and the two-dimensional evaluation metric, respectively. The two-dimensional evaluation metric ranking has the highest correlation with the network performance loss ranking, indicating that this metric performs best in evaluating the importance of edges for network performance.
Based on the above conclusions, the edge cascading failure process in the modified road network is simulated and analyzed according to Eq. (1.10), Eq. (1.12), Eq. (1.15) and Eq. (1.19): the top 4 edges in each metric's ranking are attacked, and the cascading failure results are shown in Figure 4.
As Figure 4 shows, under every importance evaluation metric the number of failed edges increases rapidly once the cascading failure time reaches the same threshold. Therefore, when the road network is exposed to the risk of cascading failure, the cascade process should be kept as short as possible, or the disaster should be controlled early.

Conclusion and Future Research
Because the influence mechanism and role of edges in the cascading failure of road networks have rarely been considered, this paper proposes three new metrics for assessing edge importance in road networks: the information entropy of degree values, the information entropy of iterative factors, and a two-dimensional importance ranking based on the Pareto non-dominated set. A case study of the highway network in Sichuan Province shows that, compared with the single rankings by the information entropy of degree values and the information entropy of iterative factors, the two-dimensional ranking based on the Pareto non-dominated set better measures the impact of edges on road network performance. For governments, therefore, a reasonable metric for evaluating the road network can support better protection strategies for critical roads, and prioritizing the recovery of critical road sections can help network performance recover quickly after a disaster. Based on these conclusions, this study proposes the following directions for future research: (1) real road networks form very complex combined networks, whereas the case study simplifies them and simulates only a single-layer road network; future research can simulate more realistic coupled networks to deepen the analysis; (2) this paper considers only the failure and diffusion stages of cascading failure in road networks and does not analyze the recovery process; recovery strategies can be added to the cascading failure process in the future to evaluate network performance more fully.