Intelligent Exploration of Construction Accidents Based on Knowledge Graph

The construction industry is characterized by long production cycles, poor mobility of workers, various kinds of outdoor operations and complex construction processes, leading to frequent safety accidents. To explore the occurrence patterns of accidents in building construction, this paper applies knowledge graph technology from the field of artificial intelligence to analyze construction accidents. Firstly, the conceptual architecture of the domain knowledge graph is defined. Secondly, key knowledge elements are extracted from construction accident data. The knowledge graph of construction accidents is then established using the Neo4j graph database. Further, a construction accident analysis process based on the knowledge graph is proposed, and intelligent analyses of accident information, namely query, statistical analysis and correlation path analysis, are conducted. The results show that, based on knowledge graph technology, construction accidents can be visualized as graphics or tables, and that accident information can be saved in the form of knowledge and queried quickly. The study can provide knowledge support for accident prevention and improve the efficiency of accident analysis. Besides, it can provide innovative ideas as well as decision support for safety management.


Introduction
The construction industry is a high-risk industry with various construction accidents. In the United States, there were 23057 fall from height accidents during 2000-2020 [1]. In Malaysia, there were 1143 construction accidents during 2015-2019 [2]. In China, there were 6299 safety accidents in housing and municipal construction with 7562 fatalities during 2012-2021 [3]. A total of 488 construction accident cases of seven types, namely falls from height, collapses, object strikes, lifting injuries, mechanical injuries, fires and other accidents, are considered in this research. The Ministry of Housing and Urban-Rural Development of China issued the "14th Five-Year Plan" for the development of the construction industry, which states that construction should follow the principle of "safety and quality first, safety-oriented". On the one hand, construction accidents cause losses to people's

Literature Review

Construction Accidents Analysis
Construction accident analysis has been the focus of much research. Gulgun Mistikoglu et al. [12] used the C5.0 and CHAID algorithms to construct decision trees representing the relationship between the input and output variables (properties) of roofer falls, and found that the mortality rate increased with falling distance. Saeed Reza Mohande et al. [13] analyzed the contingent factors causing construction accidents in developing countries with a hybrid of the fuzzy Delphi method and DEMATEL; the results showed that organization, workplace and environment are the most influential factors. Fan Zhang et al. [14] adopted text mining and natural language processing (NLP) techniques to analyze construction accident reports, using five baseline models, namely support vector machine (SVM), linear regression (LR), K-nearest neighbor (KNN), decision tree (DT) and naive Bayes (NB), together with an ensemble model. Wei Zhang et al. [6] used accident causality theory and systematic thinking to build the construction accident causality system (CACS) model and identify the key accident causes. Na Xu et al. [15] proposed a text mining framework with an improved method that extracts safety risk factors from construction accident reports effectively and efficiently, laying a foundation for the analysis of construction accidents.

Knowledge Graph
A knowledge graph [16] is a description of the various concepts or entities that exist in the real world and of the relationships among them. As a semantic network with powerful expressive capabilities and modeling flexibility, a knowledge graph can model entities, properties and relationships as a network. Knowledge graphs show great development potential in knowledge question answering [17], knowledge recommendation [18], knowledge visualization [19] and other fields [20], and have achieved remarkable results in several application domains. In the field of hazard identification [21], a knowledge graph built by combining computer vision algorithms with ontology models can automatically identify hazards. In the field of geological disasters [22], a knowledge graph of geological disaster literature can improve the utilization of the literature and provides knowledge services and a knowledge base for disaster prevention and response. In the field of urban rail transit, an accident knowledge graph can be applied to risk prevention and control, risk level management, and risk prediction and reasoning, so as to provide decision support for rail transit safety management [23] [24].
In summary, most construction accident analyses focus on accident causation and mainly use traditional accident analysis methods, which makes it difficult to analyze accidents accurately and quickly. In addition, most accident case information in existing research is stored as ordinary documents, with a low degree of structure and a low level of visualization. This research introduces knowledge graph technology, an intelligent and efficient way to organize knowledge. Through entity information extraction and association path establishment, it can extract structured knowledge from massive data within seconds, improve the degree of structure of accident case information, and display the results visually as graphics, tables, etc. [25]. Accordingly, knowledge graph technology is adopted here to analyze construction accidents, with the aim of obtaining accurate and rapid accident analysis and better visualization.

Knowledge Graph Establishment Framework
The process of establishing the construction accident knowledge graph includes four steps: selecting the data source of construction accidents, defining the conceptual architecture, knowledge acquisition, and knowledge storage, as shown in figure 1. Defining the conceptual architecture determines the types of entities and relationships in the knowledge network. Knowledge acquisition is the process of extracting knowledge from different data sources. Knowledge storage is the process of storing the acquired knowledge in a specific structure.

Data Sources and Data Collection
(1) Data source The construction accident cases in this research are all Chinese cases. The accident data were selected from Safehoo.com and from accident investigation reports issued by the Government Emergency Management Agency (GEMA) of China. Safehoo.com is a professional Chinese website for safety production and emergency management. The website hosts various kinds of data, including safety management, safety technology, accident cases, legal standards, safety knowledge and emergency management. The search scope for accidents from the GEMA covers 32 first-level administrative regions in China, namely 4 municipalities, 23 provinces and 5 autonomous regions.
(2) Data collection The accident cases on Safehoo.com were collected in two steps: data crawling and data filtering.
A web crawler was used to collect the accidents on Safehoo.com and the accident investigation reports. A total of 9160 accident cases were crawled and saved in Excel format.
Firstly, the 2028 accident cases that occurred between 2010 and 2021 were retained. Secondly, the non-building construction accident cases were removed by using the key words "railroad", "boiler", "tunnel", "bridge", "old house renovation", and "house demolition", which yielded 710 accident cases. Finally, cases with missing information and the remaining non-building construction cases were removed manually, and 389 accident cases were obtained.
The accident cases in the emergency management websites of 32 first-level administrative districts were traversed, and the cases in GEMA were filtered item by item, resulting in 99 accident cases.
In total, 488 accident cases were collected. The accidents were divided by cause of occurrence into object strikes, falls from height, lifting injuries, fires, collapses, mechanical injuries and other accidents. The other accidents mainly included explosions, electrocution, and poisoning. The accident type distribution is shown in figure 2. According to the Regulations on the Reporting and Investigation of Production Safety Accidents of China, the accident name, time, accident location, site and other relevant information were extracted to establish a table of construction accident cases.
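The automatic part of the filtering described above (the year range and the exclusion key words) can be sketched as follows. This is only an illustration under stated assumptions: the record fields `title` and `year` and the sample cases are hypothetical, and the key words are matched in English although the source data are Chinese. Only the key word list and the 2010-2021 range come from the text.

```python
from dataclasses import dataclass

# Exclusion key words and year range are taken from the text; everything else
# (record fields, sample cases) is hypothetical.
EXCLUDED_KEYWORDS = ("railroad", "boiler", "tunnel", "bridge",
                     "old house renovation", "house demolition")
YEAR_RANGE = range(2010, 2022)  # 2010 to 2021 inclusive


@dataclass(frozen=True)
class AccidentCase:
    title: str
    year: int


def in_study_period(case: AccidentCase) -> bool:
    """Keep only cases that occurred between 2010 and 2021."""
    return case.year in YEAR_RANGE


def is_building_construction(case: AccidentCase) -> bool:
    """Reject cases whose title contains any exclusion key word."""
    title = case.title.lower()
    return not any(keyword in title for keyword in EXCLUDED_KEYWORDS)


def filter_cases(cases):
    """Apply the year filter, then the key word filter; manual review follows."""
    return [c for c in cases if in_study_period(c) and is_building_construction(c)]


sample = [
    AccidentCase("Fall from scaffolding at a residential site", 2018),
    AccidentCase("Tunnel collapse during excavation", 2016),
    AccidentCase("Object strike at an office building site", 2008),
    AccidentCase("Boiler explosion in a factory", 2020),
]
kept = filter_cases(sample)
print([c.title for c in kept])
```

Key word matching of this kind only narrows the candidate set, which is why the text follows it with a manual pass.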

Define Conceptual Architecture
Defining the conceptual architecture is the basis of knowledge graph establishment, and provides a standardized and unified representation for describing construction accident knowledge and analyzing associations. The construction accident entities, properties and the relationships between different types of entities were defined according to the Regulations on Production Safety Reporting and Investigation and Handling of China. The existing concepts of accident cases and their relationships were analyzed, key knowledge elements were defined as concepts, properties and relationships, and arrows were used to indicate the direction of relationships. An E-R diagram (Entity Relationship Diagram) was adopted to present the main concepts and relationship patterns, as shown in figure 3.

Entity, Relationship and Property Extraction
Knowledge extraction is a key step in establishing a knowledge graph.
First, entity extraction. Entity identification is the fundamental task in knowledge extraction. Entity extraction covers the extraction and categorization of meaningful entities from unstructured text datasets. The main methods of entity extraction are based on rules and dictionaries, on statistical models, on deep learning, or on combinations of these [26]. This research combines natural language processing technology with manual correction to identify entities. In total, 1165 entities in 15 categories were extracted, including time, location, position, reason of the accident, and others.
Second, relationship extraction. Relationship extraction refers to the extraction of specific relationships among entities from unstructured textual datasets. The methods include template-based, supervised learning-based, and weakly supervised learning-based relationship extraction [27]. Because relationship extraction here is entity relationship extraction in a limited domain, the template-based method was adopted. The entity relationships in this paper comprise 13 types, including accident-type, accident-structure, accident-reason and accident-level, and a total of 7438 relationships were extracted.
Third, property extraction. Property extraction collects property information for specific entities. In this research, content whose property values have weak commonality is used as entity properties. In total, 12 types of property information were extracted, including accident name, accident participant, construction site, casualty, economic loss, etc.
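As a rough illustration of template-based relationship extraction, the sketch below applies regular-expression templates to a report sentence and emits (head, relationship, tail) triples. The templates, the English sample report and the accident identifier are all hypothetical, since the paper does not list its templates and the source reports are in Chinese. Only the relationship names come from the text.

```python
import re

# Hypothetical templates: each maps a relationship name (as used in the text)
# to a pattern whose named group "tail" captures the tail entity.
TEMPLATES = {
    "accident-type": re.compile(r"classified as an? (?P<tail>[\w\s]+?) accident"),
    "accident-reason": re.compile(r"caused by (?P<tail>[\w\s]+?)(?:[.,;]|$)"),
    "accident-level": re.compile(r"rated as an? (?P<tail>\w+) accident"),
}


def extract_relations(accident_name, text):
    """Return (head, relationship, tail) triples found by the templates."""
    triples = []
    for relation, pattern in TEMPLATES.items():
        for match in pattern.finditer(text):
            triples.append((accident_name, relation, match.group("tail").strip()))
    return triples


report = ("The event was classified as a fall from height accident, "
          "caused by poor safety awareness. It was rated as a general accident.")
for triple in extract_relations("Accident 001", report):
    print(triple)
```

Templates of this kind are practical in a limited domain because investigation reports tend to use a small number of recurring phrasings.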

Knowledge Storage and Visualization Based on Neo4j Graph Database
In this section, the 1,165 entities and 7,438 relationships of the construction accidents are stored in the Neo4j graph database by using the Cypher language. In addition, the construction accident knowledge is visualized as a graph on the Neo4j platform.
(1) Knowledge storage The knowledge storage methods mainly include relational databases, RDF triples, and graph databases. Neo4j is a graph database organized around three elements, namely node labels, property keys and relationship types, which can store and visualize knowledge in a structured way [28].
(2) Visualization Neo4j and the Cypher language were adopted in this research to store the knowledge of construction accidents. The 1165 entities are coded by numbers. The LOAD CSV command was applied to bulk import the construction accident data and establish a knowledge graph of construction accidents, part of which is shown in figure 4.
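The bulk import step can be sketched as follows: entities are coded by numbers, written to CSV, and loaded with one LOAD CSV statement per label or relationship type. The sample entities, the relationship names `accident_type` and `accident_city`, the CSV column names and the file names are assumptions made for illustration. The paper does not give its actual import script.

```python
import csv
import io

# Hypothetical sample: (label, name) entities and (head, relationship, tail) triples.
entities = [("accident", "Accident 001"), ("type", "fall from height"), ("city", "Shenzhen")]
triples = [("Accident 001", "accident_type", "fall from height"),
           ("Accident 001", "accident_city", "Shenzhen")]

# Code entities by number, as the text describes.
entity_ids = {name: idx for idx, (_, name) in enumerate(entities, start=1)}


def to_csv(header, rows):
    """Serialize rows to CSV text (kept in memory here instead of a file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(["id", "label", "name"],
                   [(entity_ids[name], label, name) for label, name in entities])
rels_csv = to_csv(["start_id", "type", "end_id"],
                  [(entity_ids[h], r, entity_ids[t]) for h, r, t in triples])


def node_import_statement(label, filename="nodes.csv"):
    """Labels are written literally, so one statement is built per label."""
    return (f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
            f"WITH row WHERE row.label = '{label}'\n"
            f"MERGE (n:{label} {{id: toInteger(row.id)}})\n"
            f"SET n.name = row.name")


def relationship_import_statement(rel_type, filename="relationships.csv"):
    """Connect already-imported nodes by their numeric codes."""
    return (f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
            f"WITH row WHERE row.type = '{rel_type}'\n"
            f"MATCH (a {{id: toInteger(row.start_id)}}), (b {{id: toInteger(row.end_id)}})\n"
            f"MERGE (a)-[:{rel_type}]->(b)")


print(nodes_csv)
print(node_import_statement("city"))
print(relationship_import_statement("accident_city"))
```

Using MERGE rather than CREATE keeps a repeated import from duplicating nodes and relationships.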

Intelligent Analysis of Construction Accidents
The intelligent analysis of construction accidents includes the accident information query, multi-dimensional statistics of the number of entities, and association path analysis. The accident information query comprises entity information query, relationship information query and category query. The multi-dimensional statistics count entities along different entity dimensions. The association path analysis includes single-layer, two-layer and multi-layer association path analysis, and covers three types of questions: statistical questions, query questions and query-statistical hybrid questions.

Accident Analysis Process
Based on the accident knowledge graph, the accident-related knowledge in the Neo4j graph database was retrieved by using the Cypher language, which provides data support for accident-related information query, statistical analysis and correlation analysis. Multi-dimensional and multi-level accident analysis was thereby achieved. The analysis results were visualized to provide decision support for construction engineering safety management. The accident analysis process is shown in figure 5.

Accident Information Query
Based on the knowledge graph, the interrelationship among entities and entity properties could be obtained quickly and accurately.
(1) Entity information query Entity information query is the query of node-related information. Because the data originate from accident investigation reports, the detailed description of the characteristics of the entity objects in the knowledge graph can realize an "accident portrait", which describes the location, time, type, reason, economic loss, result and other attributes of an accident, as shown in figure 6. In safety management, the "accident portrait" gives a comprehensive grasp of accident-related information, and the accident entity and property information can be visualized more intuitively and clearly.
(2) Relationship information query Relationship information query is the query of the relationships among entities. Related accidents can be synthesized to achieve a deeper analysis of accidents. According to the content of the entities, 3 types of relationships can be summarized: accident-city, accident-time, and accident-reason. The accident-city relationship is taken as an example to illustrate the relationship information query. [...] the total accidents. The four cities have experienced rapid economic development and more investment and construction in recent years. Taking Shenzhen as an example, from 2018 to 2021, the total investment in Shenzhen's construction rose steadily. In 2018, the year with the most accidents, the total investment in real estate development in Guangdong accounted for 12.0% of the total investment in China, with a growth rate of 19.3%. The total investment in Shenzhen accounted for 18.3% of the total investment in Guangdong Province, which ranks in the top 3 in Guangdong.
(3) Category query The category query counts the entities in each category, so that the distribution of accidents can be read from the density of nodes or the number of relationships. The command "MATCH p=(m)-[r1:accident_category]->(n)-[r2:belongto]->(s) RETURN p" is used to obtain the different types of accidents, as shown in figure 8. The fall from height accident nodes are clearly denser and have significantly more relationships than the others, which indicates that this type accounts for the largest number of accidents.
The number of fall from height accidents between 2010 and 2021 is shown in figure 9. From 2010 to 2018, the number of accidents showed an overall increasing trend, and from 2018 to 2021 it showed a decreasing trend. According to the statistics, the reasons for the increase in accidents from 2010 to 2018 include failure to wear

Multi-dimensional Statistics of the Number of Entities
Accident statistics is an important part of safety management, which can analyze the trend of accidents and give a comprehensive and accurate grasp of the number of accidents, casualties and property losses [29]. Based on the knowledge graph, multi-level and multi-dimensional statistical analysis of accidents meeting specific conditions can be conducted through the Cypher language, with the advantages of speed and visualization. At the same time, the information can be managed effectively in structured form.
Accident entities belong to the basic accident information. According to the radar map of the number of entities (figure 10), 11 types of entities such as accident level, type, category, building structure and use show clustering characteristics across the 488 accidents. For example, accidents concentrate in 7 types: fall from height, collapse, object strike, fire, mechanical injury, lifting injury and other. Three types of entities, namely accident city, position and reason, show scattered characteristics. For example, the accidents are scattered across 123 cities of China, including Dongguan, Shenzhen, Chengdu, Beijing, Fuzhou, etc.

Table 1 (recovered rows). Multi-dimensional accident statistics.
Structure. Accident extreme: the maximum number of mixed structure accidents is 291. Entities: mixed structure, steel structure, brick and wood structure, steel, reinforced concrete structure, other structures.
City. Accident extreme: the maximum number of accidents, in Dongguan, is 82. Entities: Dongguan, Shenzhen, Chengdu, Chongqing, Zhengzhou...
Result. Accident extreme: the maximum number of accidents subject to administrative punishment is 359.
Purpose. Accident extreme: up to 216 residential accidents. Entities: residential, factory, commercial office, complex.

Multi-dimensional accident statistics refers to the statistics of accident occurrence from the perspective of an entity or a certain type of entity. The statistics give a comprehensive picture of accident occurrence patterns, among which the "accident extremes" can support management decisions. Based on the knowledge graph, the multi-dimensional accident statistics are shown in table 1. For the dimension of accident year, the highest number of accidents, 95, occurred in 2018, as discussed in section 2.2.2. For the dimension of accident season, the numbers of accidents in spring, summer, autumn and winter were 131, 156, 104 and 97, respectively, without pronounced differences; the season therefore had no marked impact on the occurrence of accidents. For the dimension of accident time, 163 accidents occurred in the morning, which is the highest number. For the dimension of the area level of the accident, 416 of the 488 accidents were urban accidents and 72 were rural accidents. For the dimension of accident reason, poor safety awareness (28.5%) is the main reason. For the dimension of accident category, falls from the edge are the most numerous, with 89 accidents (18.2%). For the dimension of accident type, fall from height accidents have the highest incidence, at 51.4%. For the dimension of accident level, accidents of the general level are the most numerous, with 438 accidents (89.7%), and the main accident type among them is fall from height. For the dimension of accident position, the accidents are scattered over 258 positions, including external walls, scaffolding, lifts, operating platforms, etc.
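The shares behind these dimensions can be recomputed directly from the reported counts. The sketch below does so for the season and area-level dimensions, using only numbers stated in the text. The helper names `shares` and `extreme` are introduced here for illustration.

```python
# Counts reported in the text for the 488 accidents.
TOTAL = 488
season_counts = {"spring": 131, "summer": 156, "autumn": 104, "winter": 97}
area_counts = {"urban": 416, "rural": 72}


def shares(counts):
    """Return each category's percentage of the total, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def extreme(counts):
    """Return the category with the most accidents (the 'accident extreme')."""
    return max(counts, key=counts.get)


# Both dimensions should account for every accident exactly once.
assert sum(season_counts.values()) == TOTAL
assert sum(area_counts.values()) == TOTAL

print(extreme(season_counts), shares(season_counts))
print(extreme(area_counts), shares(area_counts))
```

The two internal checks confirm that the seasonal and area-level counts each sum to the 488 collected cases.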

Association Path Analysis
Different entities in the knowledge graph are connected by one or more relationships, and the association path query can be used for intelligent question answering and visual display [30]. The association paths in the knowledge graph can measure the closeness of the connection between two entities. The association path analysis covers single-layer, two-layer and multi-layer association paths. A single-layer association path connects two different types of entities through one kind of relationship. A two-layer association path connects two different types of entities through two kinds of relationships; the simplest two-layer association involves three entities. A multi-layer association path connects two different types of entities through three or more relationships. The questions in association path analysis include statistical questions, query questions and query-statistical hybrid questions, on which semantic search and intelligent question answering can be built. Query questions ask for the relevant information of an entity. Statistical questions count the quantity information related to a certain type of entity. Query-statistical hybrid questions both query the entity-related information and count the quantity information. Examples of each type of question are shown in table 2.

Table 2 (recovered rows). Examples of problems and descriptions of entity relationships.
Statistical category
Single-layer: 1. How many accidents of falling from the edge occurred (category-accident) 2. How many accidents occurred as a result of poor safety awareness (reason-accident)
Two-layer: 1. In how many cities did scaffolding accidents occur (category-accident-city) 2. How many scaffolding fall accidents occurred in the city (category-accident-city)
Multi-layer: 1. How many accidents of falling from height occurred in buildings of mixed structure (structure-accident-category-type) 2. How many object strikes occurred in the spring (season-accident-category-type)
Query-statistical category
Single-layer (number of problems: 13): 1. What are the accidents that occurred in each accident position, and how many of them respectively (position-accident) 2. What are the accidents that occurred in each time period, and how many of them respectively (time-accident)
Two-layer (number of problems: 157): 1. What are the accidents in each accident category in each season, and how many of them respectively (category-accident-season) 2. What are the accidents of each building structure in each year, and how many of them respectively (year-accident-structure)
Multi-layer (number of problems: 12): 1. What are the accidents in those cities for each accident type, and how many of them respectively (city-accident-category-type) 2. What are the accidents of each accident type in each time period, and how many of them respectively (time-accident-category-type)

Table 3 (recovered row). Type: fall from height; city name: Changsha.
Taking the multi-layer association path as an example, consider the query "What are the cities where fall from height accidents occur?". Entering "MATCH (c:city)<-[]-(m:accident)-[r:accident_category]->()-[]->(s:type {name: 'fall from height'}) RETURN DISTINCT s.name AS `fall from height`, c.name AS `city name`" gives the result shown in table 3. For the query "In how many cities have fall from height accidents occurred?", the result of "return distinct count(*)" is 64. For the query "What cities and how many accidents of each type occurred", the results are shown in table 4. Querying and counting accident information through the knowledge graph makes it possible to integrate and manage the information effectively and improves the efficiency of interpretation.
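The multi-layer path in the query above (city, accident, category, type) can be illustrated without a database by walking a small set of triples. The accidents, cities and the relationship name `accident_city` below are hypothetical sample data. `accident_category` and `belongto` are the relationship names that appear in the paper's Cypher commands.

```python
from collections import defaultdict

# Hypothetical triples following the path in the text:
# (city)<-[accident_city]-(accident)-[accident_category]->(category)-[belongto]->(type)
triples = [
    ("A1", "accident_city", "Changsha"),
    ("A1", "accident_category", "fall from the edge"),
    ("A2", "accident_city", "Shenzhen"),
    ("A2", "accident_category", "scaffolding fall"),
    ("A3", "accident_city", "Changsha"),
    ("A3", "accident_category", "falling object"),
    ("fall from the edge", "belongto", "fall from height"),
    ("scaffolding fall", "belongto", "fall from height"),
    ("falling object", "belongto", "object strike"),
]

# Index tails by (head, relationship) so each hop is a dictionary lookup.
outgoing = defaultdict(list)
for head, relation, tail in triples:
    outgoing[(head, relation)].append(tail)


def cities_for_type(accident_type):
    """Follow accident -> category -> type and collect the distinct cities."""
    accidents = {head for head, relation, _ in triples if relation == "accident_category"}
    cities = set()
    for accident in accidents:
        for category in outgoing[(accident, "accident_category")]:
            if accident_type in outgoing[(category, "belongto")]:
                cities.update(outgoing[(accident, "accident_city")])
    return sorted(cities)


result = cities_for_type("fall from height")
print(result, len(result))
```

Collecting cities into a set plays the role of DISTINCT in the Cypher query, and the length of the result corresponds to the city count asked for in the statistical variant of the question.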

Conclusions
The investigation reports of construction accidents are unstructured data, which makes it difficult to present clearly the information about an accident and the reasons leading to it. As an important branch of knowledge engineering, the knowledge graph enables fast and accurate query and analysis of construction accidents with clear visualization. In this paper, a total of 488 construction accidents were collected, and the Neo4j graph database was adopted to construct a knowledge graph of the construction accidents. Multi-dimensional and multi-layer analysis of the accidents was then conducted to realize query, statistical analysis and correlation path analysis of accident-related information. The conclusions are as follows. (1) Knowledge graph technology can effectively integrate and store construction accident knowledge, which provides more comprehensive knowledge support for various types of construction accidents. The introduction of the knowledge graph into the analysis of construction accidents provides a new approach for safety production management. (2) Analyzing construction accidents based on the knowledge graph and displaying the results as visual graphics and tables reflects the accident-related knowledge comprehensively and intuitively and greatly improves the efficiency of interpretation. (3) The knowledge graph of construction accidents can provide knowledge support for semantic search and intelligent question answering, and provides decision support for construction safety management. However, a large amount of information in the accident reports is incomplete, such as information on the accident participants, building structure, and casualties. In future research, knowledge completion and knowledge inference for construction accidents will be conducted.