Determination of the estimated cost based on aggregated unit prices using information modeling (BIM) and text mining technologies

This article discusses a method for determining the cost of a construction project from aggregated unit prices using information modeling and text mining. The features of estimating the cost of investment projects at the pre-project stage, such as the high uncertainty of technological and cost parameters, are outlined, and the advantages of and prerequisites for the transition to pricing in the system of consolidated unit prices are shown. The authors consider several existing text analysis methodologies, their goals, and the algorithms used. The tools most frequently used in research to solve various data mining tasks are identified, and the sequence of actions for modeling the estimated cost at aggregated unit prices using information modeling and text mining technologies is given, since this method offers clear advantages over classical methods.


Introduction
Currently, the most common international standard for the classification of estimates in the Russian Federation is that of AACE International (the Association for the Advancement of Cost Engineering). The key idea of this standard is a system of accuracy classes for cost estimates, with a method for determining the project cost and an expected calculation error recommended for each stage of development of design solutions. This standard formed the basis of the interstate standard "Cost Engineering. Terms and definitions". For the stage of development of design solutions that corresponds to the preparation of project documentation under the legislation of the Russian Federation, the recommended method is "phased cost determination", that is, the use of a price guide by type of work (basic market prices per unit of production).
Following world practice, and partly through the involvement of construction and engineering organizations that operate outside the Russian Federation, many companies in the Russian construction market have recognized the need to develop their own databases of consolidated industry prices per unit of production.
A feature of cost evaluation for investment projects at the pre-project stage is the high uncertainty of technological and cost parameters, combined with the decisive influence of key decisions on project costs. An enlarged but adequate assessment of the project cost helps ensure the effectiveness of capital investments in the project. In this paper, we consider determining an enlarged unit price from an information model, which can make it possible to determine the estimated cost of an object quickly and efficiently as early as the design stage. With this option, the accuracy of the result depends on the attribute completeness of the information model. However, determining an enlarged unit price from the BIM model on the basis of text mining has clear advantages:
- reduction of labor costs in preparing construction cost calculations;
- the possibility of forming a budget from a database of aggregated unit prices at all stages of project implementation, with reference to the information model;
- the ability to take into account actual data on project implementation, to accumulate information about the details of implemented projects directly in the information model, and to update the Database of enlarged unit prices with this information.
Some research has been conducted on the use of text mining in construction projects, but the technology has not yet been investigated or applied to cost determination at aggregated unit rates in combination with information modeling.
The topic is relevant because defining and implementing methods for determining project cost from a BIM model in terms of enlarged unit prices offers opportunities and advantages that are difficult to achieve with other methods that are similarly low in labor intensity and largely automated.
The objectives of the study are:
- determination of the basic information contained in the Database of aggregated unit prices for its further identification or association with information from the BIM model;
- review of the use of text mining in the construction industry, and identification of the goals and implementation algorithms of such studies;
- application of text mining technology to the developed information model of a building or structure using a test database of aggregated unit prices or the current database of a construction company and, based on the results, development of recommendations and suggestions for implementation.
The scientific novelty of the study lies in developing a methodology for determining aggregated unit prices from the information provided in the BIM model.
The theoretical and practical significance lies in the possibility of improving the approach to cost determination in the early stages of information modeling.
Methodology and methods of research. The study plans to use analysis and synthesis of the scientific literature, together with generalization, to identify how text mining is used in the construction sector in conjunction with information modeling technologies. The methodology and methods of the planned study are presented schematically in Figure 4.

Consolidated unit prices
The enlarged norms and prices take into account direct costs, including the cost of basic materials, products, and structures. The consolidated rates are intended for determining the cost of construction at the pre-investment stage, the early design stage, and the stages of preparing tender documentation and contract prices. Enlarged prices are also often used in commercial estimates [1].
The prerequisites for the transition to pricing in the system of consolidated unit prices are associated with the advantages of this pricing method. From the point of view of planning the cost of construction projects, pricing in the system of consolidated unit prices has the following advantages:
1. Because types of work are consolidated in the Database of consolidated unit prices, investment project budgets can be developed in a single fixed pricing structure both at the pre-project stage and when project documentation is available. This makes it possible to control the project budget from initiation to commissioning and to analyze deviations by type of work and structural element, which makes the factor analysis of changes more transparent.
2. The base of consolidated unit prices is formed from real market prices for the purchase of services, i.e., it reflects a price very close to the result of the future purchase of construction and installation works for a project that is just starting.
3. The use of a Database of enlarged unit prices reduces the labor costs of forming the budget of an investment project, since the list of prices is shorter than in the corporate budget and regulatory framework and cost calculations can therefore be prepared quickly. However, preparing a statement of work volumes in terms of the list of enlarged prices still requires a large amount of labor.
4. Applying the estimated regulatory framework implies a significant number of monthly reports of completed works with a large number of positions in each, which significantly increases the labor costs of inspection, acceptance, and accounting of completed works. The base of enlarged unit prices makes it possible to optimize the acceptance of completed works and the preparation of primary accounting documentation, because the acts contain fewer lines than acts formed from estimates in the construction and regulatory framework.
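Advantages 1 and 4 can be illustrated with a minimal Python sketch. It assumes that each detailed estimate line is already tagged with the aggregated position it belongs to. All position names, work items, and amounts are invented for illustration and do not come from any real price database.

```python
from collections import defaultdict

# Hypothetical detailed estimate lines: (aggregated position, work item, cost).
DETAILED_LINES = [
    ("Foundations", "excavation", 1200.0),
    ("Foundations", "formwork", 800.0),
    ("Foundations", "reinforcement", 1500.0),
    ("Foundations", "concrete placement", 2500.0),
    ("Walls", "masonry", 3000.0),
    ("Walls", "plastering", 700.0),
]

# Hypothetical actual costs reported per aggregated position.
ACTUAL = {"Foundations": 6400.0, "Walls": 3500.0}


def consolidate(lines):
    """Roll detailed lines up into one planned cost per aggregated position."""
    planned = defaultdict(float)
    for position, _item, cost in lines:
        planned[position] += cost
    return dict(planned)


def deviations(planned, actual):
    """Actual minus planned cost for each aggregated position."""
    return {pos: actual.get(pos, 0.0) - cost for pos, cost in planned.items()}


if __name__ == "__main__":
    planned = consolidate(DETAILED_LINES)
    print(len(DETAILED_LINES), "detailed lines ->", len(planned), "positions")
    for position, delta in deviations(planned, ACTUAL).items():
        print(position, delta)
```

Six detailed lines collapse into two positions in the act, and the deviation is reported in the same fixed structure that was used for planning.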

Text mining in the construction industry
The information age is characterized by a rapid growth in the volume of data, most of it unstructured. The availability of this data opens up new opportunities as well as new challenges for both researchers and research institutes. In the article [2], several existing text analysis methodologies were considered, and a formal process for applying text mining methods using open-source software (R) was presented. The applied approach to text analysis can be described in several consecutive steps. Given the unstructured nature of text data, assigning a set of meaningful quantitative metrics to this type of data requires a consistent and repeatable approach. This process can be roughly divided into four stages: data selection, data cleaning, information extraction, and analysis of this information.
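The four stages can be sketched as a small pipeline. The article [2] works in R; the sketch below uses Python with the standard library only, and the sample records, the stop-word list, and the function names are all illustrative assumptions rather than part of the cited process.

```python
import re
from collections import Counter

# Hypothetical sample records; real input would be project documents.
RAW_DOCS = [
    "Concrete foundation slab, 200 mm, reinforced.",
    "",  # empty record, dropped at the selection stage
    "Reinforced CONCRETE wall; thickness 300 mm!",
    "Steel column HEB 200",
]

# Minimal illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "of", "and", "mm"}


def select(docs):
    """Stage 1, data selection: keep only non-empty records."""
    return [d for d in docs if d.strip()]


def clean(doc):
    """Stage 2, data cleaning: lowercase, replace non-letters with spaces."""
    return re.sub(r"[^a-z\s]", " ", doc.lower())


def extract(doc):
    """Stage 3, information extraction: tokenize and drop stop words."""
    return [t for t in doc.split() if t not in STOP_WORDS]


def analyze(token_lists):
    """Stage 4, analysis: count term frequencies across all documents."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def run_pipeline(docs):
    """Chain the four stages over a list of raw documents."""
    return analyze(extract(clean(d)) for d in select(docs))


if __name__ == "__main__":
    for term, freq in run_pipeline(RAW_DOCS).most_common(3):
        print(term, freq)
```

Keeping each stage as a separate function makes the process repeatable: the same cleaning and extraction rules are applied to every document.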
The main purpose of text analysis is to capture and analyze all possible values embedded in the text. Text mining involves information retrieval and preprocessing, classification and clustering, as well as more complex processes such as extracting relationships or complex patterns [3].
Currently, data mining is not widely used in construction projects [4]. However, text mining methods can be useful for a variety of goals. Table 1 shows some studies on text mining in the construction field [5].

| Authors | Purpose | Methods and algorithms |
|---|---|---|
| | Obtaining similar examples of risk management in construction projects | NLP, i.e., the vector space model and semantic query expansion |
| X. Lv, N. M. El-Gohary [10] | Development of an information retrieval model to support the environmental assessment of a transport project | Semantic annotation, semantic query processing, and semantic document ranking (SDR) |
| A. J.-P. Tixier, M. R. Hallowell, B. Rajagopalan, and D. Bowman [11] | Analyzing and predicting the occurrence of accidents at a construction site by studying historical accident reports | NLP |
| M. Alsubaey, A. Asadi, and H. Makatsoris [12] | Development of an early warning model that predicts project failure through analysis | Naive Bayes algorithm |
| T. P. Williams, J. Gong [13] | Using numerical and textual data to predict cost overruns | Ridor, K-star, neural network, and stacking |
| M. Al Qady, A. Kandil [14] | Clustering of construction project documents into semantically related groups | One-pass clustering algorithm |
| H. Fan, H. Li [16] | Search for similar cases of dispute resolution using text analysis | NLP |
| J. Hsu [17] | Extraction of CAD documents based on text content | Vector space model |
| | Extraction of regulatory documents on construction | NLP, rule-based approach, and semantic approach |
| N. Ur-Rahman, J. A. Harding [19] | Detection of hidden information in post-project review documents | K-means clustering and Apriori association rule analysis |
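Several of the studies in Table 1 rely on the vector space model. As a minimal sketch of that technique, the Python code below represents each text as a term-frequency vector and ranks documents by cosine similarity to a query. It uses raw term counts without TF-IDF weighting, which is a simplification, and the sample texts are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical project records and query, for illustration only.
DOCS = [
    "delay caused by late delivery of steel",
    "crane accident on site",
    "cost overrun due to steel price increase",
]
QUERY = "steel delivery delay"


def tf_vector(text):
    """Represent a text as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs sorted by decreasing similarity."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, doc in rank(QUERY, DOCS):
        print(round(score, 3), doc)
```

The record that shares the most terms with the query is ranked first, and a record with no common terms receives a score of zero.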
Various computational tools have been used in research for different data mining (DM) tasks. The number of articles per tool, according to the authors' research, is shown in Fig. 1 [20]. DM methods cannot extract knowledge directly from text data; this becomes possible when text mining (TM) technology is used to process the text and extract useful patterns, trends, and rules. Many researchers have applied TM methods to extract potential knowledge from unstructured textual data in the construction industry, as shown in Table 1. Building Information Modeling (BIM) has recently gained popularity in architecture, mechanical engineering, and the construction industry. A new frontier of development is coming in the construction industry. From January 1, 2022, the use of information modeling technologies, or BIM technologies, will become mandatory at all state-ordered facilities in the Russian Federation [21][22][23][24][25][26][27][28][29].
For the estimator, the advantages of information modeling technologies are the time saved on reading drawings, the automatic calculation of work volumes, and the control and speed of processing changes in the project [25][26][27]. Introducing text mining technologies into the BIM process to determine the cost at aggregated unit prices will add another weighty argument in favor of information modeling, since it removes the routine of calculating work volumes and selecting prices from the selection of estimate standards. Examples of information models of an industrial building (architectural and HVAC) are shown in Figures 2 and 3. The integrated database of unit prices is a company's internal database of prices in the structure of consolidated unit prices, which consists of a list of consolidated prices with a technical part and a reference part. The Database of consolidated unit prices provides the necessary and sufficient amount of information to perform tasks related to budgeting, planning the cost of construction projects, and concluding contracts.

Proposed methodology
Figure 4 shows the methodology as a diagram, in which the actions and tasks partially completed in this paper are highlighted in blue, and the remaining colors indicate planned tasks and actions. Figure 5 shows an abstraction of how the "binding" of the enlarged unit prices to the elements of the BIM model will work. It is assumed that information will be exported from the information model and processed, and that it will then be intelligently compared with the price code in the Database of aggregated unit prices. The result is information that has been identified and linked to prices, i.e., the cost of the object is determined. To achieve the objectives of the study, it is first necessary to select or develop an information model of a building or structure and to find the best way to export information from the BIM model for further analysis. This paper does not consider the information content of the model, i.e., the model will be used as is. A test base of enlarged unit prices will be created for the experiment. If the cost is determined incorrectly, another text mining algorithm or other data processing software will be chosen and the experiment will be repeated.
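The binding step can be sketched in Python under strong simplifying assumptions. The study has not yet selected a text mining algorithm, so token overlap (Jaccard similarity) is used here only as a stand-in. The price codes, descriptions, prices, element names, and the 0.5 threshold are all hypothetical, and quantities are assumed to be exported in the unit of the matching price.

```python
import re

# Hypothetical test database of aggregated unit prices:
# code -> (description, unit, price per unit). All values are invented.
PRICE_DB = {
    "A-01": ("reinforced concrete foundation slab", "m3", 120.0),
    "A-02": ("reinforced concrete wall", "m3", 150.0),
    "B-01": ("steel column", "t", 900.0),
}

# Hypothetical export from an information model: (element name, quantity).
ELEMENTS = [
    ("Foundation slab, reinforced concrete", 40.0),
    ("Wall - Reinforced Concrete 300", 25.0),
    ("Column, steel", 3.0),
    ("Ventilation duct", 12.0),
]


def tokens(text):
    """Lowercase a description and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of common tokens between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bind(element_name, threshold=0.5):
    """Return the best-matching price code, or None below the threshold."""
    name_tokens = tokens(element_name)
    best_code, best_score = None, 0.0
    for code, (description, _unit, _price) in PRICE_DB.items():
        score = jaccard(name_tokens, tokens(description))
        if score > best_score:
            best_code, best_score = code, score
    return best_code if best_score >= threshold else None


def estimate(elements):
    """Sum quantity x unit price over bound elements; collect unbound ones."""
    total, unbound = 0.0, []
    for name, quantity in elements:
        code = bind(name)
        if code is None:
            unbound.append(name)
        else:
            total += quantity * PRICE_DB[code][2]
    return total, unbound


if __name__ == "__main__":
    total, unbound = estimate(ELEMENTS)
    print("Estimated cost:", total)
    print("Elements without a price:", unbound)
```

Elements that fall below the threshold are reported separately instead of being priced, which corresponds to the case in which the result is judged incorrect and another algorithm has to be tried.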
The lack of information in the 3D model is the difficulty most frequently mentioned in the existing literature. 3D models may contain only 50% of the necessary information, which means that considerable time is spent managing, viewing, and correcting the 3D model file and adding information later so that a cost estimate can be performed [22]. This would improve if the 3D model were complete and contained reliable data from the very beginning, which still does not happen in most cases. Thus, BIM-based cost estimation supported by text mining will improve the input data, as well as the understanding and knowledge of the 3D model, which in turn will lead to a more reliable cost estimate [28].
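One way to quantify this difficulty before estimating is to measure how many of the attributes needed for costing are actually filled in. The Python sketch below is an illustration only: the list of required attributes and the sample elements are assumptions, not a schema taken from any BIM tool or from the cited studies.

```python
# Hypothetical attributes a cost estimate would need from each element.
REQUIRED = ("category", "material", "quantity", "unit")

# Hypothetical exported elements; None or "" marks a missing value.
ELEMENTS = [
    {"category": "wall", "material": "concrete", "quantity": 25.0, "unit": "m3"},
    {"category": "column", "material": "", "quantity": 3.0, "unit": None},
    {"category": "duct", "material": None, "quantity": None, "unit": None},
]


def is_filled(value):
    """Treat None and the empty string as missing values."""
    return value is not None and value != ""


def completeness(element, required=REQUIRED):
    """Share of required attributes that are filled for one element."""
    filled = sum(1 for key in required if is_filled(element.get(key)))
    return filled / len(required)


def model_completeness(elements, required=REQUIRED):
    """Average attribute completeness over all exported elements."""
    if not elements:
        return 0.0
    return sum(completeness(e, required) for e in elements) / len(elements)


if __name__ == "__main__":
    for element in ELEMENTS:
        print(element["category"], completeness(element))
    print("Model:", round(model_completeness(ELEMENTS), 2))
```

A low score for an element indicates where information has to be added to the model before that element can be linked to a price.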
In summary, this work identified and analyzed the advantages of aggregated unit rates and of using them together with BIM, reviewed the use of text mining in the construction industry, including the algorithms and goals, and identified the software and tools used in research to solve various data mining (DM) tasks. According to [20], the most common tools are MATLAB and R. The methodology of the planned research has been developed, and the specific actions and their sequence for achieving the goals have been determined.

Table 1. Text mining in the construction industry.