Quality and accuracy of digital twin models for the neighbourhood level building energy performance calculations

Abstract. To scale up renovation, the energy performance of buildings needs to be assessed at the neighborhood level. Traditional methods for assessing individual buildings are manual and time-consuming, and are therefore not sufficient for neighborhood-level assessment of the energy performance of buildings. Instead, new methods based on existing data in national registries and building typologies are required. The aim of this article is to develop methods for obtaining the initial building geometry information necessary for energy performance calculations from the Estonian national Building Registry (EBR), including its Digital Twin (LOD) geometric models of buildings, and to assess the quality and accuracy of these data. Altogether, 417 representative buildings were used for qualitative and quantitative analysis. A sub-sample of 41 buildings was selected for more detailed analysis and development of methods. Two methods were developed to extract and enrich initial building geometry information for energy performance calculations: (a) the method combining the EBR and building reference data; and (b) the method combining the EBR and building typology data and LOD models of apartment buildings. The estimated accuracy of the first method (a) is around 98% and that of the second method (b) around 94%. Both methods underestimate the actual envelope area and thermal bridge lengths.


Introduction
In the European Green Deal [1] strategy, the European Union has set targets to decarbonize the EU's energy system and improve the energy performance of buildings by 2050. However, the 'single-building-at-a-time' approach to renovating and modernizing the building stock is not enough to meet those targets. The average annual renovation rate must increase from the current 1% to beyond 3% to meet the targets [2].
In the RESTO (Renovation Strategy Development Tool) project [3] that this research is a part of, the focus is on scaling the assessment of energy performance from a single building to multiple buildings. The objective is to enable rapid assessment of the energy consumption and saving potential of existing buildings at the level of multiple buildings. The digital solution is expected to support the local governments and large-scale real estate owners that are responsible for planning and developing the built environment and renovating buildings. This paper focuses on apartment buildings that make up a significant share of the Estonian building stock. Based on 2020 statistics, 58 % of Estonian citizens live in apartment buildings [4]. The floor area of apartment buildings (26 million m²) is 53 % of the total dwellings' floor area [5]. 18 million m² of this floor area was built before 2000. That is, 70 % of apartment buildings need to be renovated by 2050 [6].
* Corresponding author: ergo.pikas@taltech.ee
These days, general building information is typically available in the national building databases. For example, the Estonian Building Registry (EBR) contains information about planned, under-construction and constructed structures, for both buildings and infrastructure objects. This platform is used by building owners and local governments for processing construction documents [7]. The EBR also contains digital twin models of buildings at the level of detail (LOD) 0, 1 and 2 (see Section 0 for additional details). Successive LOD levels represent gradually more information about buildings [8].
The EBR and LOD models, together with the building-specific reference information [9], create a unique possibility to speed up the assessment of building energy performance at the district level. However, data availability and quality per building in the EBR and LOD models vary significantly. That is, it is possible to use these sources to obtain the necessary information for building performance calculations at the neighborhood level, but the limitations need to be studied.
This paper aims to develop methods to extract and enrich necessary building geometry information for assessing the energy performance of typical Estonian apartment buildings at the neighborhood level. Data availability and quality in the EBR and LOD models are evaluated to develop methods for obtaining envelope areas and linear thermal bridge lengths of connections, and the accuracy of obtained information is assessed.

Research Methods and Materials
For developing methods and assessing the accuracy of information, representative apartment buildings were selected for visual qualitative and quantitative inspection and assessment. Two datasets were utilized in the study: (a) 417 buildings with EBR and ground-truth data; and (b) a sub-sample of 41 of these 417 buildings with additional detailed project information and documentation. In the qualitative inspection, data availability and quality in the EBR and LOD2 models were assessed and methods M1 and M2 were developed. In the quantitative analysis, the information accuracy was estimated. The main phases of the research were (see Fig. 1): (1) defining information needs; (2) selection of representative apartment buildings; (3) data gathering and organization; (4) qualitative analysis; and (5) quantitative analysis.

Defining Information Needs
For heat loss calculations, envelope areas and thermal bridge lengths are needed, as presented in Table 1. Energy performance calculations require the areas of the envelope parts in contact with the spaces in which indoor climate conditions need to be maintained. These include the interior dimensions of external walls (blue in Table 1), roof (cyan in Table 1) and windows (transparent in Table 1). As the basement is commonly an unheated space, the basement ceiling (yellow in Table 1) is taken as an external surface and the basement walls are excluded. Window and external door areas are not available in the EBR and LOD models; they were derived from the detailed analysis of 41 buildings. To simplify the calculations, external door areas were considered part of the window areas, as they make up a very small portion of the envelope area.
Linear thermal bridge lengths are needed for the external envelope connections (see Table 1), including external wall to wall (EW-EW, red), external wall to window (EW-Window, white), external wall to roof (EW-Roof, blue) and external wall to basement ceiling (EW-Basement, orange) connections. The internal wall and ceiling connections with the external wall, roof and basement are considered as an additional area of envelope surfaces. Thermal bridges at balcony to external wall connections are not considered, because it is not possible to reliably identify from the digital data sources whether buildings have balconies (see Section 3.1). In addition, most of the selected apartment buildings did not have balconies (see Table 2).

Selection of Representative Apartment Buildings
In total, 417 apartment buildings and their EBR IDs were obtained from the database of renovation projects supported by the Estonian national funding agency. These represent typical apartment buildings with different external wall construction material types, including brick, concrete panel, block and wood structures. For all 417 buildings, EBR and ground-truth data were gathered from different data sources to assess the availability and quality of building geometry related information. Project-specific information was collected for a sub-sample of 41 buildings out of the 417 (see Section 2.3) to develop methods and assess the accuracy of information from the EBR and LOD models. An overview of the characteristics of the selected buildings is given in Table 2.

Data Sources
In this research, four main data sources were used to develop methods and to estimate the accuracy of calculations: the Estonian Building Registry (EBR), digital twin models (specifically LOD2 models), energy performance certificates (EPC), and project information. Additionally, Estonian Land Board aerial images, Google Street View images and site visits were used to gather data. In the following, the four main data sources are described.
Digital Twin Models (LOD2): The EBR also includes digital twin models of buildings at the LOD0, LOD1 and LOD2 levels of detail. These models are created by automated processes combining data from the Estonian Topographic Database and airborne laser scanning data. LOD0 represents the building footprint, LOD1 is the building geometry where the roof has been simplified to a flat surface, and LOD2 is where the roof shape is close to the actual shape. An example of the Estonian LOD models is shown in Fig. 2 [10]. In this work only LOD2 models were used. LOD2 model dimensions and areas were measured and calculated using an IFC-viewer and CAD software.
Energy Performance Certificates (EPC): The EPC describes how much energy a building consumes annually per heated floor area [11]. The energy consumption is determined by the building function, shape, construction materials of structures and technical systems. The external wall and window areas presented in the building input data document [12] for energy performance certificate calculations were collected for 396 of the 417 buildings. For the remaining 21 buildings, no EPC was available in the EBR. These data were used to obtain the typical window to wall ratios and to assess the accuracy of the developed geometry calculation methods.
Building renovation project documents: The building renovation projects and documents were collected from the EBR for the sub-sample of 41 buildings. Renovation project information was used as "reference data" against which the geometric information in the EBR and LOD2 models was compared. Additionally, project information was used to define the typical room heights and ceiling thicknesses based on the building external wall material type. This information was also used to define the ratio between window and wall areas, the ratio between window area and window to wall linear thermal bridge length, and whether buildings have a basement, balconies, and recessed balconies.

Fig. 2. Examples of LOD2 models of selected Estonian apartment buildings.

Qualitative Analysis
The qualitative analysis was conducted in several steps. First, the availability and quality of data in the EBR were evaluated to identify whether information from the EBR can be reliably used in the development of calculation methods. Second, the LOD geometry was compared to Land Board aerial and street view images to identify which features were or were not present in the LOD2 models. The following building geometry aspects were addressed: (a) shape of the building; (b) whether the building has balconies; (c) linear thermal bridge types; (d) whether the building has a flat or sloped roof; and (e) unheated building parts. Third, the findings of the first two steps were used to develop two methods, M1 and M2, to calculate envelope areas and thermal bridge lengths as shown in Table 3.
Additionally, the building's heated floor area, heated height, and internal perimeter were calculated. Heated floor area is crucial because the energy performance is measured per heated floor area. The overview and development of the methods are described in Section 3.2.

Quantitative Analysis
The main purpose of the quantitative analysis was to assess the accuracy of the developed methods and to identify a suitable combination of information sources for retrieving the envelope areas and thermal bridge lengths needed for the energy performance calculations. To estimate and compare the accuracy of information, the results of the two developed methods were compared to the project information of 41 buildings (see Section 4). The façade area accuracy of method M2 was also tested on 355 apartment buildings. Testing method M1 on large-scale building data was not yet possible at this stage of the project.

Availability and Quality of EBR Data
Basement and number of floors: Most apartment buildings built between the 1950s and 1990s have a basement. While the number of floors was always available in the EBR, the number of floors below ground was not. In the EBR, a basement was defined for only 32% of the 417 buildings. However, based on the ground-truth data, 94% of the buildings had a basement. That is, the EBR database cannot be relied on for this information. In the calculations, it was therefore assumed that apartment buildings have a basement.
Roof type and building height: Another crucial part of the external envelope is the roof, which can be flat or sloped. The roof type (sloped roofs mostly have a cold attic) has an impact on the total height of the building. For energy performance calculations, only the total height of the heated rooms is needed. The EBR does not define the roof type of a building, although it can be determined with some level of confidence from the roof material. To ensure the reliability of calculations, it is proposed that building reference data be used to calculate building and heated room heights.
Height, length and width: In the EBR, the building height was defined for 66% of the 41 apartment buildings, and the building length and width for 41%. All three dimensions together were available for 41% of the 41 buildings. Because height, length and width information was often not available in the EBR, information from the LOD2 models should be used instead.
Building shape and area: The most common building floor plan has a rectangular layout and the shape of a cuboid. There are variations to this basic shape (see Fig. 3). The building net area was present in the EBR for all selected buildings, but the net heated area was available only for 51% of the 41 selected buildings. All in all, only the building net area and the number of floors above ground could be reliably used for developing calculation methods.
Fig. 3. Examples of floor plan layouts from the 41 selected buildings.
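The availability shares reported in this section can be reproduced with a simple completeness count. The following Python sketch is only an illustration: the field names (`height_m`, `length_m`, `width_m`, `net_area_m2`) and the four records are hypothetical and do not come from the EBR.

```python
from typing import Optional, Sequence

Record = dict[str, Optional[float]]

# Hypothetical registry extract; None marks a value missing from the registry.
records: list[Record] = [
    {"height_m": 15.2, "length_m": 48.0, "width_m": 12.0, "net_area_m2": 2800.0},
    {"height_m": 14.8, "length_m": None, "width_m": None, "net_area_m2": 2650.0},
    {"height_m": None, "length_m": None, "width_m": None, "net_area_m2": 3100.0},
    {"height_m": 9.1, "length_m": 30.5, "width_m": 11.5, "net_area_m2": 980.0},
]


def availability(rows: Sequence[Record], fields: Sequence[str]) -> float:
    """Return the share (0..1) of rows in which every listed field is present."""
    if not rows:
        return 0.0
    complete = sum(1 for r in rows if all(r.get(f) is not None for f in fields))
    return complete / len(rows)


height_share = availability(records, ["height_m"])
all_dims_share = availability(records, ["height_m", "length_m", "width_m"])
net_area_share = availability(records, ["net_area_m2"])

print(f"height: {height_share:.0%}, all dimensions: {all_dims_share:.0%}, "
      f"net area: {net_area_share:.0%}")
```

Checking several fields at once mirrors the "all three together" figure above, which can be lower than the share of any single field.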

Analysis of LOD2 Model
The LOD2 models represent the overall shape of the building (see Fig. 3 and Fig. 4). The most common building shape has a rectangular floor plan with a flat or sloped roof. If the building shape and floor plan are more complex (e.g., the building has protruding walls), the LOD2 model is often simpler than the actual geometry of the building. The LOD2 models do not contain information about the building floors or windows, nor is it possible to tell whether the building has a basement. There are also instances in which the LOD2 model and actual building widths differ. This is common for buildings with complex geometry and an overhanging roof. If a building has an overhanging sloped roof, the façade in the LOD1 and LOD2 models starts from the edge of the sloped roof, which makes the modelled building envelope larger than the actual one (see Fig. 4, example A). Balconies are not included in the LOD2 models (see […]). […] the roof or large ventilation shafts often have them partly represented in the LOD2 roof geometry. Altogether, these limitations and features can influence the automated calculations of building geometry.

Development of Methods
Two methods, M1 and M2, were developed to obtain the necessary building geometry information. In method M1, the LOD2 model is used as a data source, whereas in method M2 only reference building data are used. The M2 method is regarded as a statistical and simplified approach. An overview of the developed methods is given in Table 3. The calculation steps for the initial parameters (see Table 3) were the same for both methods, and these parameters were used in the further steps to calculate envelope areas and linear thermal bridge lengths.

Initial Parameters
Heated floor area: The apartment building's heated floor area is not always available in the EBR database and is never available in the LOD2 models. For both methods, it was therefore derived from the building net area and the number of floors given in the EBR. In this calculation, it is assumed that apartment buildings have a basement, whose net area needs to be subtracted from the building net floor area. Therefore, the heated area was calculated as follows:
$$A_{heat} = A_{net} \cdot \frac{Fl_{heat}}{Fl_{heat} + Fl_{unheat}}$$
where: $A_{heat}$ Heated floor area (m²), $A_{net}$ Net floor area (m²), $Fl_{heat}$ Number of heated floors above ground, $Fl_{unheat}$ Number of unheated floors below ground.
This calculation is a simplification and there are several assumptions influencing the calculation accuracy. It is assumed that the basement is as large as the typical floors of a building. Buildings may also have unheated corridors, storage and technical spaces that affect the actual heated floor area. However, these were not accounted for, as there is no reliable source of information for identifying these types of spaces. It is expected that this calculation of heated areas will converge to the mean value when many buildings are included in the calculation sample.
Heated height: The heated building height describes the height of the heated volume from the first floor's floor surface to the last floor's ceiling inner surface. In this research, the LOD2 models' height was not used in the calculations because the basement and unheated roof space heights varied significantly. Therefore, in both methods, the heated height was calculated from the typical room height and ceiling thickness of the apartment buildings. Following the definition above, this gives:
$$h_{heat} = Fl_{heat} \cdot h_{room} + (Fl_{heat} - 1) \cdot h_{ceil}$$
where: $h_{heat}$ Building heated height (m), $h_{room}$ Space head height (m), $h_{ceil}$ Thickness of heated ceiling (m), $Fl_{heat}$ Number of heated floors above ground.
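A minimal Python sketch of the two initial parameters described above. It assumes that the basement has the same net area as a typical floor, and it reads the heated height definition as one room height per floor plus one intermediate ceiling fewer than the number of floors. The function names and the example values (room height 2.5 m, ceiling thickness 0.3 m) are illustrative assumptions and are not taken from Table 4.

```python
def heated_floor_area(net_area_m2: float, floors_above: int, floors_below: int = 1) -> float:
    """Scale the net area by the share of heated floors.

    Assumes every floor, including the unheated basement, has the same net area.
    """
    if floors_above < 1 or floors_below < 0:
        raise ValueError("need at least one heated floor and a non-negative basement count")
    return net_area_m2 * floors_above / (floors_above + floors_below)


def heated_height(floors_above: int, room_height_m: float, ceiling_thickness_m: float) -> float:
    """Height from the first floor's floor surface to the top floor's inner ceiling surface.

    Assumption: n room heights plus (n - 1) intermediate ceilings.
    """
    if floors_above < 1:
        raise ValueError("need at least one heated floor")
    return floors_above * room_height_m + (floors_above - 1) * ceiling_thickness_m


# Hypothetical five-storey building with one basement floor.
area = heated_floor_area(net_area_m2=3000.0, floors_above=5, floors_below=1)
height = heated_height(floors_above=5, room_height_m=2.5, ceiling_thickness_m=0.3)
print(f"heated floor area: {area:.1f} m2, heated height: {height:.2f} m")
```

Keeping the number of basement floors as a parameter makes the basement assumption explicit, so it can be changed for buildings known to have no basement.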
The typical heated height and ceiling thicknesses were derived from the renovation projects of 41 buildings. The mean values for all buildings except those with wooden structures are presented in Table 4. The wooden buildings were excluded because there were not enough wooden apartment buildings available in the dataset (see Table 2). The general mean value was used for them instead. In the future, a validated dataset needs to be created for wooden apartment buildings to test the M1 and M2 methods.
Building's internal perimeter: The building internal perimeter is required as an input value for the façade area calculation. Two methods were developed and compared.
Method M1: The external perimeter length was obtained from the LOD2 building model. The external perimeter was multiplied by the average ratio of 0.97 (derived from the detailed analysis of 41 apartment buildings) to obtain the building internal perimeter length. The actual wall thickness was disregarded because it is currently not possible to reliably identify from the LOD2 model whether the external wall has its original construction or has been renovated and insulated.
Method M2: The perimeter was calculated from the heated floor area (Equation 4) by simplifying the building floor plan into a rectangle and fixing the width of the shorter side of the building. The fixed width was chosen to minimize the difference between the calculated façade area and the value taken from the EPC. The width was also chosen as the fixed dimension because it varies less (shown in Table 1). This resulted in a width of 9 meters, which was the mean minimum width. With the width fixed, the inner length of the building was calculated from the heated floor area of one floor:
$$b = \frac{A_{heat}}{Fl_{heat} \cdot a} \qquad (4)$$
where: $a$ Building façade inner width (m), $b$ Building façade inner length (m), $A_{heat}$ Heated floor area (m²), $Fl_{heat}$ Number of heated floors above ground. The internal perimeter of the simplified rectangle is then $2(a + b)$.
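The two perimeter calculations can be sketched in Python as follows. The ratio 0.97 and the fixed width of 9 m come from the text; the function names and the example building are hypothetical, and the M2 function assumes the rectangle simplification described above.

```python
PERIMETER_RATIO_M1 = 0.97  # internal / external perimeter, mean of the 41 buildings
FIXED_WIDTH_M2 = 9.0       # fixed inner width of the shorter side (m)


def internal_perimeter_m1(external_perimeter_m: float) -> float:
    """M1: scale the LOD2 external perimeter to an internal perimeter."""
    return PERIMETER_RATIO_M1 * external_perimeter_m


def internal_perimeter_m2(heated_area_m2: float, floors_above: int,
                          width_m: float = FIXED_WIDTH_M2) -> float:
    """M2: treat one floor as a rectangle with a fixed inner width."""
    if floors_above < 1 or width_m <= 0:
        raise ValueError("need at least one floor and a positive width")
    floor_area_m2 = heated_area_m2 / floors_above
    length_m = floor_area_m2 / width_m
    return 2 * (width_m + length_m)


# Hypothetical building: 120 m external perimeter in the LOD2 model,
# 2700 m2 heated floor area over 5 floors.
p_m1 = internal_perimeter_m1(120.0)
p_m2 = internal_perimeter_m2(2700.0, 5)
print(f"M1: {p_m1:.1f} m, M2: {p_m2:.1f} m")
```

Because M2 forces every floor plan into a 9 m wide rectangle, buildings with C, Z or L shaped plans can be expected to deviate most from this estimate.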
The window to wall ratio (WWR) (Table 3) was calculated for each external wall type from the window and external wall areas gathered from the EPCs of the 396 apartment buildings. The average WWR is 0.23 for brick, 0.24 for concrete panel, 0.23 for block and 0.17 for wood buildings.
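Averaging the WWR per wall type can be sketched as below. The records are hypothetical, and the ratio is assumed here to be window area divided by the total façade area (window plus external wall), which is consistent with multiplying the WWR by the façade area in the envelope area calculation; the source does not state the definition explicitly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical EPC extract: (wall type, window area m2, external wall area m2).
epc_records = [
    ("brick", 230.0, 770.0),
    ("brick", 250.0, 750.0),
    ("concrete panel", 480.0, 1520.0),
]


def mean_wwr_by_wall_type(records):
    """Average the per-building window share of the facade for each wall type."""
    ratios = defaultdict(list)
    for wall_type, window_m2, wall_m2 in records:
        # Assumption: WWR = window area / (window area + external wall area).
        ratios[wall_type].append(window_m2 / (window_m2 + wall_m2))
    return {wall_type: mean(values) for wall_type, values in ratios.items()}


wwr = mean_wwr_by_wall_type(epc_records)
for wall_type, value in wwr.items():
    print(f"{wall_type}: {value:.2f}")
```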

Envelope Areas
The façade area was calculated by multiplying the heated height by the building internal perimeter length of the given method (M1 or M2). The window area was calculated by multiplying the façade area by the window to wall ratio. The external wall area was calculated by subtracting the calculated window area from the façade area. The roof area and the area of the first floor above a cold basement (or of the floor on ground) were calculated using two different methods.
Method M1: The roof area and the area of the first floor above a cold basement (or of the floor on ground) were extracted from the LOD2 model, and the external wall thickness was accounted for by multiplying the LOD2 area by 0.89. This was the mean ratio of internal to external floor area based on the analysis of 41 apartment buildings.
Method M2: The area was calculated in the same way as the floor area in Equation 4, by dividing the heated floor area by the number of floors above ground.
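The envelope area steps above translate into a few lines of Python. The WWR values and the factor 0.89 are taken from the text; the function names and the example inputs are hypothetical.

```python
WWR_BY_WALL_TYPE = {"brick": 0.23, "concrete panel": 0.24, "block": 0.23, "wood": 0.17}
FLOOR_AREA_RATIO_M1 = 0.89  # internal / external floor area, mean of the 41 buildings


def facade_areas(heated_height_m: float, internal_perimeter_m: float, wall_type: str):
    """Return (facade, window, external wall) areas in m2."""
    facade = heated_height_m * internal_perimeter_m
    window = facade * WWR_BY_WALL_TYPE[wall_type]
    wall = facade - window
    return facade, window, wall


def horizontal_area_m1(lod2_area_m2: float) -> float:
    """M1: roof or basement ceiling area from the LOD2 external area."""
    return FLOOR_AREA_RATIO_M1 * lod2_area_m2


def horizontal_area_m2(heated_area_m2: float, floors_above: int) -> float:
    """M2: roof or basement ceiling area as the heated area of one floor."""
    return heated_area_m2 / floors_above


# Hypothetical brick building: 14 m heated height, 120 m internal perimeter.
facade, window, wall = facade_areas(14.0, 120.0, "brick")
roof_m1 = horizontal_area_m1(600.0)
roof_m2 = horizontal_area_m2(2500.0, 5)
print(f"facade {facade:.1f}, window {window:.1f}, wall {wall:.1f} m2")
print(f"roof M1 {roof_m1:.1f}, roof M2 {roof_m2:.1f} m2")
```

The same horizontal area is used for both the roof and the basement ceiling, which holds only when the heated floors share one footprint.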

Linear Thermal Bridge Lengths
External wall outer corner: The EW-EW outer corner thermal bridge length was calculated by multiplying the building heated height by the number of outer corners. Method M1: The EW-EW outer corner count was extracted from the LOD2 model. Method M2: The EW-EW outer corner count was taken from the average corner count based on building type. The average corner count was derived from the information of 417 apartment buildings.
External wall inner corner: The EW-EW inner corner thermal bridge length was calculated by multiplying the building heated height and the number of inner corners. Method M1: The EW-EW inner corner count was extracted from the LOD2 model. Method M2: The EW-EW inner corner count per building type was taken as an average corner count from the analysis of 417 buildings.
External wall to roof and basement ceiling: The EW-Roof and the EW-Basement ceiling linear thermal bridge lengths were each considered equal to the building's internal perimeter.
External wall to window: The EW-window linear thermal bridge length was calculated from the total window area. To obtain the window connection linear thermal bridge length the window area was divided by 0.37. The ratio was calculated based on 41 apartment building projects' window area and linear thermal bridge length data.
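The thermal bridge rules of this section can be collected in one function. This is a sketch only: the divisor 0.37 comes from the text, while the function name, dictionary keys and example inputs are hypothetical. For M1 the corner counts would come from the LOD2 model, and for M2 from the average counts per building type.

```python
WINDOW_AREA_PER_BRIDGE_LENGTH = 0.37  # divisor from the analysis of the 41 building projects


def thermal_bridge_lengths(heated_height_m: float, internal_perimeter_m: float,
                           window_area_m2: float, outer_corners: int,
                           inner_corners: int) -> dict[str, float]:
    """Return the linear thermal bridge lengths (m) per connection type."""
    return {
        "EW-EW outer corner": heated_height_m * outer_corners,
        "EW-EW inner corner": heated_height_m * inner_corners,
        "EW-Roof": internal_perimeter_m,
        "EW-Basement ceiling": internal_perimeter_m,
        "EW-Window": window_area_m2 / WINDOW_AREA_PER_BRIDGE_LENGTH,
    }


# Hypothetical rectangular building: four outer corners, no inner corners.
bridges = thermal_bridge_lengths(
    heated_height_m=14.0,
    internal_perimeter_m=120.0,
    window_area_m2=370.0,
    outer_corners=4,
    inner_corners=0,
)
for name, length in bridges.items():
    print(f"{name}: {length:.1f} m")
```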

Building Envelope Area Accuracy
The main parameter for calculating building envelope areas and thermal bridge lengths was the heated perimeter length. The comparison of results presented in Fig. 5 shows that both methods M1 and M2 underestimate the perimeter length. M1 had a mean difference of -1.7% and M2 of -1.3% when compared to the project data. M1 showed less variation and was therefore more accurate, with an interquartile range of 6%, while the M2 interquartile range was 19%. The mean accuracy of the LOD2 model perimeter for the 41 buildings was 98% when compared to the project information. An M1 outlier was due to simplification of the actual building façade geometry in the LOD2 model.
The comparison of M1 and M2 façade, window and external wall areas is presented in Fig. 6. In the M2 method, the interquartile range for façade area was larger. Because of this, the total façade area was larger for M2 than for M1. The standard deviation of the difference from the project information was 4% for M1 and 11% for M2. That is, M1 was more accurate, but differences occurred when the LOD2 geometry and the model's floor plan had been simplified, especially when the LOD2 models included recessed balconies as part of the building façade geometry or when buildings had an overhanging roof (see Section 3.2).
The window area results showed that both methods have similar accuracy. For M1, the standard deviation was 17.3% and the median inaccuracy 7%; for M2, the standard deviation was 17.0% and the median inaccuracy 8%. The M1 variability was caused by buildings with recessed balconies and overhanging roofs. Overall, M1 was more accurate and reliable in most cases for calculating the perimeter length and the areas of building envelope parts.
Fig. 6. Difference between envelope areas for methods M1 and M2 when compared to project data (BC stands for basement ceiling).
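The accuracy measures used in this section (mean and median difference, standard deviation and interquartile range of the relative difference from project data) can be computed as in the following Python sketch. The input values are hypothetical, and the quartiles use the default method of the standard library, which may differ from the one used to draw the box plots.

```python
from statistics import mean, median, quantiles, stdev


def relative_differences(estimates, references):
    """Relative difference (%) of each estimate from its reference value."""
    if len(estimates) != len(references):
        raise ValueError("estimates and references must have the same length")
    return [(e - r) / r * 100 for e, r in zip(estimates, references)]


def summarise(diffs):
    """Mean, median, sample standard deviation and interquartile range."""
    q1, _, q3 = quantiles(diffs, n=4)
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "stdev": stdev(diffs),
        "iqr": q3 - q1,
    }


# Hypothetical perimeter lengths (m): method estimate vs. project documentation.
project = [100.0, 200.0, 300.0, 400.0, 500.0]
estimate = [98.0, 196.0, 303.0, 392.0, 495.0]

diffs = relative_differences(estimate, project)
summary = summarise(diffs)
print({name: round(value, 2) for name, value in summary.items()})
```

A negative mean indicates underestimation, which is the sign convention used for the differences reported above.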

Building Linear Thermal Bridge Length Accuracy
Fig. 7 shows that the main differences in linear thermal bridge calculations between the M1 and M2 methods were caused by the number of external wall corners. This illustrates the simplified nature of the LOD2 models used in the M1 method and the statistical approach used in the M2 method. Because of this, M1 underestimated and M2 overestimated the outer corner length. The external wall inner corner total length was inaccurate for both methods. M1 showed less variation and was more accurate because the information was obtained from the LOD2 models instead of the statistical information used in M2. The roof to external wall and the basement ceiling to external wall linear thermal bridge lengths were the same for these types of apartment buildings. While the results for both methods were relatively accurate, M1 was more accurate.
The external wall to window linear thermal bridge lengths were relatively accurate when compared to the actual project data. These lengths were calculated by dividing the total window area by 0.37, a ratio derived from the detailed analysis of 41 buildings. This accuracy indicates that the selected apartment buildings have relatively standardized window sizes.
Fig. 7. Accuracy of methods based on external wall (EW) thermal bridge type (BC stands for basement ceiling).
[…] Fig. 8 in order to improve the readability of the chart. Total envelope area is the sum of the roof, external wall, window and basement ceiling areas. The M1 and M2 mean envelope areas were 98% and 94% of the total project envelope area, respectively. M1 underestimates the window (mean -8%) and external wall (mean -1%) areas (see Fig. 6). M2 underestimates the roof (-14%), basement ceiling (-14%) and window (-5%) areas and generally has larger variation in the areas. The method also overestimates the external wall area by 1%. M2 had more outliers than M1 because of the statistical approach used to assess the length and width of the building.
E3S Web of Conferences 396, 04021 (2023) https://doi.org/10.1051/e3sconf/202339604021 IAQVEC2023
[…] better readability of the graph. The roof and basement ceiling areas were disregarded because this information was not available. The results show generally good accuracy for the M2 method. The M2 method mean area difference was -9%. The outliers were caused mostly by complex buildings with a C, Z or L shape.

Discussion and Limitations
Methods M1 and M2 were developed to extract the building geometry information necessary for scaling up the assessment of the energy performance of typical Estonian apartment buildings. Information availability and quality from the EBR and LOD2 models were assessed to identify the reliability and limitations of the developed methods. Overall, the results highlight that the methods M1 and M2 have relatively good accuracy. The method M1, in which information from the LOD2 models was used, was more accurate. This demonstrates that it is possible to scale the energy performance assessment to multiple buildings.
There are limitations to using the LOD2 models, especially when buildings in reality have more complex shapes. For example, when buildings have recessed balconies or sloped roofs, the area calculations are less reliable. Also, LOD2 models do not contain balconies, which influences the thermal bridge length calculations. These were the main causes of outliers in the M1 method calculations. Unfortunately, this information is available in neither the EBR nor the LOD2 models.
Future research is required, and technologies (such as laser scanning, photogrammetry and computer vision) ought to be developed to better capture the condition of buildings at scale. These technologies could be used to develop LOD3 models that, in addition to roof structures, also include information about windows, balconies, external doors and recessed balconies. The same technologies could also be used to identify whether buildings have been altered, for example, whether external walls have been insulated.

Conclusions
This article aimed to develop methods based on the data from the EBR and LOD2 models for obtaining the building geometry information necessary for energy performance assessment at the neighbourhood level. For that purpose, information availability and quality were evaluated and the accuracy of calculations was estimated. The accuracy was around 98% for the M1 method and 94% for the M2 method. Overall, although there are limitations and future research is required, the results demonstrated that methods can be developed to scale the energy performance assessment to multiple buildings.