From white-box to grey-box modelling of the heat dynamics of buildings

Identifying the parameters of grey-box models requires enough data, collected over a sufficiently long period from sensors installed inside and outside the building. Consequently, this process is time-consuming and costly, especially in large buildings that require more sensors, and it can only be conducted after the building is constructed. This paper introduces a procedure for identifying grey-box models from white-box models. Following this procedure, grey-box models can be identified using data generated by a white-box model, without the need to mount sensors in a building. This reduces the cost and time of modelling, control design and prediction. The introduced procedure is used to find a grey-box model of the heat dynamics of a four-floor building in Spain. Simulation results demonstrate the effectiveness of this procedure.


Introduction
High-quality models that can predict the future evolution of the thermal dynamics are required for advanced model-based control implementations. In particular, the model should capture the nonlinear behaviour of the thermal dynamics in the presence of process noise, due to approximation errors or unmodelled inputs, and measurement noise, due to imperfect measurements. Grey-box models, which consist of a set of stochastic differential equations (SDEs) describing the dynamics of the system in continuous time and a set of discrete-time measurement equations, allow the incorporation of prior physical knowledge and the use of statistical methods for parameter estimation. Physical knowledge of the building and the information embedded in data collected from it are the two main requirements for establishing a grey-box model. Once the data are measured, the physical knowledge can be formulated as a set of first-order stochastic differential equations. Since the goal of finding a grey-box model is to design an advanced controller, such as a model predictive controller (MPC), it is meaningful to investigate grey-box models that consist of a set of first-order linear stochastic differential equations. However, collecting data from sensors installed inside and outside the building over a sufficiently long period is time-consuming and costly, especially in large buildings that require more sensors, and can only be done after the building is constructed.
In this paper, we describe the process of building a white-box model of a building. Then, we use the data generated by the white-box model to estimate the parameters and find the best grey-box model. Statistical analyses are used to establish the model selection procedure introduced by Bacher and Madsen (2011). The contribution of this paper is an inexpensive and comparatively fast procedure for finding the optimal grey-box model, in which the information needed to find the grey-box model is generated by a realistic white-box model.
White-box modelling is a well-established method for building performance modelling. Several well-known software tools, such as TRNSYS, IDA ICE and EnergyPlus, allow dynamic modelling of buildings. White-box models are fundamentally based on the conservation of mass, energy and momentum. The common approach to making a white-box model is bottom-up: building envelope, occupants, schedules, heating, ventilation and air conditioning (HVAC) systems and their parameters, and so on. The modelling is time-consuming because many more model parameters are required, but the simulation results are more accurate than those of other models, as reported by Li et al. (2021). A common practice in white-box models is to introduce the occupancy patterns as a schedule, i.e., as deterministic rather than stochastic behaviour. The current work introduces stochastic occupancy behaviour into a TRNSYS white-box model, with the objective of reproducing a realistic use of the dynamic systems of the building, such as domestic hot water usage, heating demand and setpoints, solar shading, window opening and ventilation. The present paper implements a methodology to obtain an optimal grey-box model based on realistic white-box models, which are developed in the framework of the European project syn.ikia. The work focuses on the Mediterranean demonstration site building. The syn.ikia concept relies on the interplay between novel technologies at the neighbourhood scale, energy efficiency and flexibility of the buildings, good architectural and spatial qualities, sustainable behaviour and citizen engagement.

Case study
The Mediterranean demonstration site is located in Santa Coloma de Gramenet, in a neighbourhood that is undergoing an urban regeneration process. The demonstration site includes 38 dwellings, 2 commercial premises and 38 parking spaces. The building has 5 residential floors and 2 blocks, with an open patio between them (Figure 1). The design of the building follows an integral design process to achieve a positive energy building, with local energy production from photovoltaic panels placed on the roofs of both the north-west and the south-east blocks. Details of the final building design can be found in Tamm et al. (2021).
The building has a centralized heating and Domestic Hot Water (DHW) system consisting of three air-to-water heat pumps. There are two centralized water tanks, one for heating and the other for DHW, which distribute the heat to the households through two separate distribution systems, allowing them to operate at different temperatures. A heat exchanger substation provides the energy to each household. The heating emitters are low-temperature radiators, and the air temperature of each household is controlled by a thermostat located in the living room.

White-box model
The white-box model is developed in TRNSYS 18 and has 32 thermal zones. In general, there is one zone per household, but for detailed comfort analysis there are 4 dwellings with 4 zones each, one zone per room (Figure 1, circled in orange). For computational simplicity, two of the five residential floors are simulated as shading objects. Two of the detailed dwellings are on the typical floor and two on the attic floor, in order to observe the critically behaving zones. On both floors, one detailed dwelling is in the north-west block and one in the south-east block, to closely monitor the effects of orientation. In order to reduce the gap between the white-box model and reality, the model includes stochastic occupancy profiles, appliance energy consumption and DHW needs for each individual dwelling, generated by the high-resolution stochastic model of Tejero et al. (2018). This model is based on Markov chain theory, in which the current state depends only on the state at the previous time step. The Spanish Time Use Data (TUD) were used to develop the model, creating the transition probability matrices for switching between three possible states: out, passive and active. These probabilities vary with the time of day and the number of occupants of the household. Under the "active" state, several activities are included, and each activity is related to the use of one or more electric devices and/or DHW needs, as Table 1 shows. The activities are defined through a set of "activity probabilities" and their "time duration", which determine at every moment which activity an "active" occupant performs. Once the activity is known, the activation of the appliance is modelled using the "appliance use probabilities", which determine which equipment is switched on if more than one is possible. The stochastic model has a time resolution of 3 minutes and is implemented as a Type in TRNSYS.
Next, the household characterization determines the number of occupants and the appliance stock. In the current study, 4 levels of occupancy (1 to 4 people) and 3 appliance stocks (Dev1, Dev2 and Dev3) are used. At every time step (3 minutes), the submodel generates a set of random numbers, which are compared with the different probabilities to determine the occupants' state, their activity, the use of the appliances and the activity duration.
The appliance stock varies with the level of occupancy: dwellings with 1 or 2 occupants can have the appliance stock Dev1 or Dev2, and dwellings with 3 or 4 occupants can have Dev2 or Dev3, as further explained in Tejero et al. (2018).
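The Markov-chain mechanism described above, in which one random number per 3-minute step is compared with the cumulative transition probabilities of the current state, can be sketched in Python as follows. This is a minimal illustration, not the model of Tejero et al. (2018): the transition matrices and the day/night split are hypothetical placeholders (the real matrices come from the Spanish TUD and also depend on the number of occupants), and the activity and appliance layers are omitted.

```python
import random

STATES = ("out", "passive", "active")
STEP_MIN = 3                          # time resolution of the stochastic model
STEPS_PER_DAY = 24 * 60 // STEP_MIN   # 480 steps per day


def make_transition_table():
    """Hypothetical hour-of-day transition matrices (rows: from, columns: to).

    The real matrices are derived from the Spanish TUD and depend on the
    number of occupants; these numbers are placeholders.
    """
    night = [[0.98, 0.02, 0.00],
             [0.01, 0.97, 0.02],
             [0.01, 0.09, 0.90]]
    day = [[0.95, 0.02, 0.03],
           [0.03, 0.90, 0.07],
           [0.03, 0.05, 0.92]]
    return [night if hour < 7 or hour >= 23 else day for hour in range(24)]


def simulate_occupancy(days, transitions, start="passive", seed=0):
    """Simulate one occupant's state at every 3-minute step with a first-order Markov chain."""
    rng = random.Random(seed)
    state = STATES.index(start)
    trajectory = []
    for step in range(days * STEPS_PER_DAY):
        hour = (step * STEP_MIN // 60) % 24
        row = transitions[hour][state]
        # One random number is compared with the cumulative probabilities of the row.
        r = rng.random()
        cumulative = 0.0
        for j, p in enumerate(row):
            cumulative += p
            if r < cumulative:
                state = j
                break
        else:
            state = len(row) - 1  # guard against rounding in the cumulative sum
        trajectory.append(STATES[state])
    return trajectory
```

The next state depends only on the current state and the hour of the day, which is the Markov property the text relies on; occupancy-dependent matrices would simply be a different table per household size.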
The white-box model uses the stochastic occupancy as the main driver of all the building systems. The building has operative-temperature-driven ventilation control that imitates the occupants' behaviour when opening and closing the windows: the windows are opened only if the limit temperature is exceeded and the occupant(s) are present. The solar shading control likewise reproduces the occupancy behaviour: the shading is drawn if the radiation exceeds the maximum value and the occupant(s) are present. The lighting consumption is connected to the occupancy profiles: the lights in the household are switched on only if the illuminance drops below the set value and an occupant is present. The heating system is also linked to the occupancy behaviour of each household: if the occupant(s) are not present, the room thermostat setpoint (21 °C) is dropped to the set-back value (18 °C), which is also used at night. Finally, the appliance consumption and the DHW needs are direct outputs of the high-resolution stochastic model and are linked to the activity profile of the occupant(s).
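The occupancy-gated rules above can be summarized as one decision function per zone and time step. In this sketch, only the 21 °C setpoint and the 18 °C set-back come from the text; the window, shading and lighting thresholds are hypothetical placeholders, and the function name and signature are illustrative, not part of TRNSYS.

```python
from dataclasses import dataclass

SETPOINT_C = 21.0  # thermostat setpoint when occupied (from the text)
SETBACK_C = 18.0   # set-back when absent or at night (from the text)


@dataclass(frozen=True)
class Limits:
    """Hypothetical thresholds; the paper does not state their values."""
    window_temp_c: float = 25.0            # operative temperature limit for opening windows
    shading_radiation_w_m2: float = 300.0  # radiation limit for drawing the shading
    lighting_lux: float = 300.0            # illuminance set value for the lights


def zone_controls(occupied, night, operative_temp_c, radiation_w_m2,
                  illuminance_lux, limits=Limits()):
    """Occupancy-gated control actions for one zone and one time step."""
    return {
        "heating_setpoint_c": SETPOINT_C if occupied and not night else SETBACK_C,
        # Each action requires presence AND its own physical trigger.
        "window_open": occupied and operative_temp_c > limits.window_temp_c,
        "shading_drawn": occupied and radiation_w_m2 > limits.shading_radiation_w_m2,
        "lights_on": occupied and illuminance_lux < limits.lighting_lux,
    }
```

Because every action is conjoined with `occupied`, an empty dwelling never opens windows, draws shading or switches on lights, and its heating falls back to the set-back value.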
The occupancy, appliance consumption and DHW need profiles vary from day to day and between households. Figure 2 shows the power profile of one week as an example of this stochasticity, where the grey lines represent the time-step power consumption of each household and the black line represents the average. There are large differences between the households and a large variability between days. Figure 3 shows an example of the power and occupancy profile of one household during one winter week, where the stochasticity is clearly visible in the occupancy behaviour (grey lines), which in turn affects the power consumption of the thermal demand and the electrical devices. Table 2 presents the main results of the white-

Grey-box model
High-quality models that can predict the future evolution of the thermal dynamics are required for advanced model-based control implementations. In particular, the model should capture the nonlinear behaviour of the thermal dynamics in the presence of process noise, due to approximation errors or unmodelled inputs, and measurement noise, due to imperfect measurements. Although white-box modelling is a common method to simulate and describe building thermal dynamics in detail, it is not a suitable candidate for heat-dynamics controller design, owing to its complexity and its need for extensive information about the building structure and the heating, ventilation and air conditioning (HVAC) systems. In this section, we describe the mathematical background of grey-box modelling. Grey-box models, which consist of a set of stochastic differential equations (SDEs) describing the dynamics of the system in continuous time and a set of discrete-time measurement equations, allow the incorporation of prior physical knowledge and the use of statistical methods for parameter estimation. Their parameters often have a physical meaning, which makes these models a suitable choice for control purposes. In general, a grey-box model can be written in the following form:
$$
\begin{aligned}
dx_t &= f(x_t, u_t, t, \theta)\,dt + \sigma(u_t, t, \theta)\,d\omega_t, \\
y_k &= h(x_{t_k}, u_{t_k}, t_k, \theta) + e_k,
\end{aligned} \tag{1}
$$
where t ∈ ℝ is the time variable, t_k ∈ ℤ⁺ is the sampling instant, x_t ∈ ℝ^n is the state vector, u_t ∈ ℝ^m is the input vector, y_k ∈ ℝ^ℓ is the output vector, and θ ∈ ℝ^p is the parameter vector to be identified. Also, f and h are nonlinear continuous functions, σ is the diffusion term, ω_t is a standard Wiener process and e_k is a white noise process.
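To make the roles of f, σ, h, ω_t and e_k concrete, the sketch below simulates a model of the form (1) with the Euler-Maruyama scheme. It is an illustration under simplifying assumptions (diagonal diffusion, θ captured inside the callables, one observation per integration step); the function name is hypothetical, and this is not how CTSM-R works internally.

```python
import math
import random


def simulate_grey_box(f, sigma, h, x0, inputs, dt, meas_std, seed=0):
    """Euler-Maruyama simulation of dx = f(x, u, t) dt + sigma(u, t) dw with
    observations y_k = h(x, u, t_k) + e_k, where e_k ~ N(0, meas_std**2).

    Assumptions: sigma returns one diffusion coefficient per state (diagonal
    diffusion), theta is captured inside f, sigma and h, and one observation
    is taken per step of length dt.
    """
    rng = random.Random(seed)
    x = list(x0)
    ys = []
    for k, u in enumerate(inputs):
        t = k * dt
        ys.append(h(x, u, t) + rng.gauss(0.0, meas_std))  # discrete measurement
        drift = f(x, u, t)
        diffusion = sigma(u, t)
        # A Wiener increment over dt is Gaussian with variance dt.
        x = [xi + fi * dt + si * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for xi, fi, si in zip(x, drift, diffusion)]
    return ys


# Hypothetical one-state example: T relaxes towards the input u with a 4 h time constant.
y = simulate_grey_box(
    f=lambda x, u, t: [(u - x[0]) / 4.0],
    sigma=lambda u, t: [0.1],
    h=lambda x, u, t: x[0],
    x0=[20.0], inputs=[5.0] * 96, dt=0.25, meas_std=0.05)
```

The continuous-time dynamics and the discrete-time measurements are kept separate, mirroring the two lines of (1).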
Physical knowledge of the building and the information embedded in data collected from it are the two main requirements for establishing a grey-box model. The physical knowledge can be formulated as a set of first-order stochastic differential equations. Since the goal of finding a grey-box model is to design a controller, the desired simplified grey-box model consists of a set of first-order linear stochastic differential equations, i.e., f and h are linear continuous functions. In the following, we address two questions: 1) How can the best structure of the linear functions f and h be found? 2) How can the parameters of these functions be found in an optimal manner?

Model structure
Finding a suitable structure for the linear continuous functions f and h is not straightforward. However, physical knowledge about buildings and thermal systems makes the task considerably easier.
Bacher and Madsen (2011) give a list of electric circuits resembling the heat dynamics of buildings. This list provides a set of structures for the deterministic part of grey-box models. The elements of these circuits are equivalent to the elements of thermal systems: heat flow rate, thermal capacitance, thermal resistance and temperature are analogous to current, electrical capacitance, electrical resistance and voltage, respectively. The models are named after the main nodes of the electric circuits; for example, a model with interior, heater and sensor nodes is denoted TiThTs. Although this list is not exhaustive, various controller designs for heat dynamics in the literature employ one of its circuits. For example, Thilker et al. (2021) use one of these model structures for the water-based heat dynamics of a three-floor school.
As an example, consider Figure 4, which shows the electric circuit equivalent to the thermal dynamics of a house. This model is denoted TiTe by Bacher and Madsen (2011). Dashed lines divide the circuit into parts to show how it resembles a thermal system, namely its interior, heater, solar, envelope and ambient elements. The states T_i and T_e represent the interior and envelope temperatures, C_i and C_e are the thermal capacities of the interior and envelope, and R_ea and R_ie are the thermal resistances between the ambient and the envelope, and between the envelope and the interior, respectively. Also, T_a is the ambient temperature, Φ_h is the total heat input, Φ_s is the solar irradiance, and A_w is the effective window area.
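Applying Kirchhoff's current law at the two capacitor nodes of the circuit in Figure 4 gives the deterministic part of the TiTe model: C_i dT_i/dt = (T_e − T_i)/R_ie + Φ_h + A_w Φ_s and C_e dT_e/dt = (T_i − T_e)/R_ie + (T_a − T_e)/R_ea. The sketch below writes this as dx/dt = A x + B u with x = [T_i, T_e] and u = [T_a, Φ_h, Φ_s], and checks a property that follows from the circuit: at steady state without sun, the heat input Φ_h flows through R_ie and R_ea in series, so T_i − T_a = Φ_h (R_ie + R_ea). The function names and parameter values are hypothetical.

```python
def tite_matrices(Ci, Ce, Rie, Rea, Aw):
    """Deterministic TiTe model dx/dt = A x + B u, x = [Ti, Te], u = [Ta, Phi_h, Phi_s]."""
    A = [[-1.0 / (Rie * Ci), 1.0 / (Rie * Ci)],
         [1.0 / (Rie * Ce), -1.0 / (Rie * Ce) - 1.0 / (Rea * Ce)]]
    B = [[0.0, 1.0 / Ci, Aw / Ci],     # heater and solar gains enter the interior node
         [1.0 / (Rea * Ce), 0.0, 0.0]]  # ambient temperature drives the envelope node
    return A, B


def steady_state(A, B, u):
    """Solve A x + B u = 0 for the 2x2 system with Cramer's rule."""
    bu = [sum(bij * uj for bij, uj in zip(row, u)) for row in B]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    ti = (-bu[0] * A[1][1] + A[0][1] * bu[1]) / det
    te = (-A[0][0] * bu[1] + bu[0] * A[1][0]) / det
    return ti, te


# Hypothetical values: C in J/K, R in K/W, 1 kW heating at 5 °C outdoors, no sun.
A, B = tite_matrices(Ci=1e6, Ce=1e7, Rie=0.01, Rea=0.02, Aw=5.0)
ti, te = steady_state(A, B, [5.0, 1000.0, 0.0])  # about 35.0 and 25.0
```

The steady state does not depend on C_i or C_e; the capacitances only shape the transient response, which is why dynamic data are needed to identify them.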
Using the schematic in Figure 4, which represents the deterministic part of a thermal system as an electric circuit, and adding a stochastic part to it, we obtain the following grey-box model:
$$
\begin{aligned}
dT_i &= \frac{1}{R_{ie} C_i}(T_e - T_i)\,dt + \frac{1}{C_i}\Phi_h\,dt + \frac{1}{C_i} A_w \Phi_s\,dt + \sigma_i\,d\omega_i, \\
dT_e &= \frac{1}{R_{ie} C_e}(T_i - T_e)\,dt + \frac{1}{R_{ea} C_e}(T_a - T_e)\,dt + \sigma_e\,d\omega_e, \\
y_k &= T_{i,k} + e_k,
\end{aligned} \tag{2}
$$
where y_k is the interior temperature generated by the white-box model, ω_i and ω_e are standard Wiener processes, σ_i² and σ_e² are the incremental variances of the Wiener processes, and e_k is the measurement noise. The parameters of (2), i.e., C_i, C_e, R_ea, R_ie, A_w, σ_e and σ_i, should be found so that the grey-box model behaves as closely as possible to the data generated by the white-box model. To this end, the parameters are obtained by solving the maximum likelihood problem
$$
\hat{\theta} = \arg\max_{\theta} L(\theta; \mathcal{Y}_N), \qquad L(\theta; \mathcal{Y}_N) = \left( \prod_{k=1}^{N} p(y_k \mid \mathcal{Y}_{k-1}, \theta) \right) p(y_0 \mid \theta), \tag{3}
$$
where Y_k = [y_k, y_{k−1}, …, y_1, y_0], p(y_k | Y_{k−1}, θ) is a conditional density denoting the probability of observing y_k given the previous observations and the parameters θ, and p(y_0 | θ) is a parameterization of the starting conditions. A Kalman filter is used to calculate the likelihood function, and an optimization algorithm can then be applied to maximize it. The software CTSM-R can be used to calculate the maximum likelihood and estimate the parameters simultaneously. Detailed discussions of the optimization problem and methods for solving it can be found in Madsen (2007), Juhl et al. (2016) and Kristensen et al. (2004).
Note that this paper considers an apartment as one thermal zone. A similar procedure can be applied to a building, or an apartment, with more thermal zones: one can either follow the procedure and find a grey-box model for each zone, or classify similar zones and find one grey-box model for each class. In this paper, data generated by the white-box model are used to find the grey-box model of an apartment on the south-east (SE) side of the first floor of the Mediterranean demo site building (see Figure 1). Using CTSM-R and the generated data, the parameters of (2) are estimated; their values are given in Table 3. The optimization problem (3) should be solved for all of the model structures provided by Bacher and Madsen (2011), and the estimated parameter values should be recorded for each model structure for later use.
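The likelihood in (3) is evaluated with a Kalman filter through the prediction-error decomposition: each observation contributes a Gaussian term built from its innovation and innovation variance. The pure-Python sketch below does this for the TiTe model (2) on synthetic data. Unlike CTSM-R, which discretizes the linear SDE exactly, it uses a simple Euler discretization over the sampling interval; the initial state and covariance are hypothetical choices, and all function names and numbers are illustrative.

```python
import math
import random


def _mat2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def _euler_transition(theta, dt):
    """Euler discretization of the TiTe drift, x_{k+1} = F x_k + G u_k (a simplification)."""
    Ci, Ce, Rie, Rea = theta[:4]
    a, b, c = 1.0 / (Rie * Ci), 1.0 / (Rie * Ce), 1.0 / (Rea * Ce)
    F = [[1.0 - a * dt, a * dt], [b * dt, 1.0 - (b + c) * dt]]
    return F, c


def simulate_tite(theta, ta, phi_h, phi_s, dt, x0, seed=0):
    """Synthetic observations y_k = Ti + e_k from the Euler-discretized TiTe model."""
    Ci, Ce, Rie, Rea, Aw, sig_i, sig_e, s_meas = theta
    F, c = _euler_transition(theta, dt)
    rng = random.Random(seed)
    ti, te = x0
    y = []
    for k in range(len(ta)):
        y.append(ti + rng.gauss(0.0, s_meas))
        heat = (phi_h[k] + Aw * phi_s[k]) / Ci * dt
        ti, te = (F[0][0] * ti + F[0][1] * te + heat + sig_i * math.sqrt(dt) * rng.gauss(0.0, 1.0),
                  F[1][0] * ti + F[1][1] * te + c * dt * ta[k] + sig_e * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return y


def tite_loglik(theta, ta, phi_h, phi_s, y, dt):
    """Gaussian log-likelihood of y from the innovations of a Kalman filter."""
    Ci, Ce, Rie, Rea, Aw, sig_i, sig_e, s_meas = theta
    F, c = _euler_transition(theta, dt)
    Ft = [[F[0][0], F[1][0]], [F[0][1], F[1][1]]]
    x = [y[0], y[0]]              # hypothetical initial state: both nodes at the first reading
    P = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical initial state covariance
    ll = 0.0
    for k in range(len(y)):
        # Measurement update; only Ti is observed (H = [1, 0]).
        S = P[0][0] + s_meas ** 2        # innovation variance
        v = y[k] - x[0]                  # innovation
        ll -= 0.5 * (math.log(2.0 * math.pi * S) + v * v / S)
        K = [P[0][0] / S, P[1][0] / S]   # Kalman gain
        x = [x[0] + K[0] * v, x[1] + K[1] * v]
        P = [[(1.0 - K[0]) * P[0][0], (1.0 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
        # Time update to the next sample: x <- F x + G u, P <- F P F^T + Q.
        heat = (phi_h[k] + Aw * phi_s[k]) / Ci * dt
        x = [F[0][0] * x[0] + F[0][1] * x[1] + heat,
             F[1][0] * x[0] + F[1][1] * x[1] + c * dt * ta[k]]
        P = _mat2(_mat2(F, P), Ft)
        P[0][0] += sig_i ** 2 * dt
        P[1][1] += sig_e ** 2 * dt
    return ll
```

This only evaluates the log-likelihood for a given θ; estimation would maximize it numerically (for example, by minimizing its negative with a general-purpose optimizer such as `scipy.optimize.minimize`), in the same spirit as the CTSM-R estimation used in the paper. On data simulated from a known θ, the true parameters should score higher than, for instance, a badly wrong R_ea.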

Model selection
After the parameters of the various structures have been estimated, the best model is selected among them. Three statistical tests are employed for this purpose: the likelihood ratio test, Akaike's criterion and the Bayesian criterion.

Likelihood ratio test
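Consider two models with parameter spaces Ω_0 ⊂ ℝ^{r_0} and Ω_1 ⊂ ℝ^{r_1}, with r_0 < r_1, that are both able to represent the thermal dynamics of a building. In the sequel, we review a statistical method for choosing the better model. Assume that Ω_0 ⊂ Ω_1, i.e., the model Ω_0 is a sub-model of Ω_1. We define the hypothesis test
$$
H_0:\ \theta \in \Omega_0 \qquad \text{versus} \qquad H_1:\ \theta \in \Omega_1,
$$
where H_0 is the null hypothesis, H_1 is the alternative hypothesis, and θ denotes the parameters. Statistical analysis helps us decide whether to retain or reject the null hypothesis H_0, that is, whether to select Ω_1 or discard it. The likelihood ratio test is based on the ratio of the suprema of the likelihood under the two hypotheses,
$$
\lambda(\mathcal{Y}_N) = \frac{\sup_{\theta \in \Omega_0} L(\theta; \mathcal{Y}_N)}{\sup_{\theta \in \Omega_1} L(\theta; \mathcal{Y}_N)},
$$
where Y_N denotes the observed values and L is the likelihood function. Under H_0, the statistic −2 ln λ(Y_N) is asymptotically χ²-distributed with r_1 − r_0 degrees of freedom, which yields the p-values used to decide whether H_0 is rejected.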
If H_0 is rejected, the likelihood of the larger model, Ω_1, is significantly higher than the likelihood of the sub-model, Ω_0. In this case, it can also be concluded that Y_N is more likely to be observed with the larger model. Therefore, the larger model should be selected over the sub-model to describe the information embedded in the data.
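In practice, the test only needs the maximum log-likelihoods l_0 and l_1 of the two nested models: the statistic is −2 ln λ = 2(l_1 − l_0), compared with a χ² distribution with r_1 − r_0 degrees of freedom. The standard-library sketch below computes the resulting p-value using the closed-form χ² survival function for integer degrees of freedom; the helper names and the log-likelihood values in the usage line are hypothetical, not the values of Table 4.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable X with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_j x^(j-1) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(loglik_small, loglik_large, r0, r1):
    """Return (-2 ln lambda, p-value) for a sub-model with r0 parameters nested in one with r1."""
    stat = 2.0 * (loglik_large - loglik_small)
    return stat, chi2_sf(stat, r1 - r0)


stat, p = likelihood_ratio_test(100.0, 102.0, r0=4, r1=6)  # stat = 4.0, p = exp(-2) ≈ 0.135
```

With p ≈ 0.135 > 0.05 in this hypothetical case, H_0 would be retained and the sub-model kept.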
To compare the models, the maximum of the logarithm of the likelihood function (log-likelihood) must be calculated for all of the models introduced by Bacher and Madsen (2011). Among the models with the same number of parameters, the one with the highest maximum log-likelihood is listed in Table 4. Details of the models Ti, TiThTs, TiTeThTsAeRia and TiTmTeThTs can be found in Bacher and Madsen (2011), and the TiTe model is described in the Model structure section.
Applying the likelihood ratio test to each pair of nested models gives the results reported in Table 5.
Table 7: BIC values allocated to the candidate models.
Model name | BIC
Ti | -696140
Akaike's criterion
Another commonly used method for model selection is Akaike's criterion, proposed by Akaike (1974). Unlike the likelihood ratio test, which compares two nested models of different dimensions, Akaike's criterion allocates a number (AIC) to each model, and the model with the smallest AIC should be selected. The AIC of each model is given by
$$
\mathrm{AIC} = 2k - 2\ln L,
$$
where k is the number of model parameters to be estimated and L is the maximum likelihood. Applying Akaike's criterion to the models with maximum log-likelihood, one can calculate the AIC values, which are reported in Table 6. Comparing the AICs, the model TiTe has the smallest value; therefore, Akaike's criterion suggests that the model TiTe should be selected.
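The AIC rule needs only each model's maximum log-likelihood ln L and parameter count k. The sketch below computes AIC = 2k − 2 ln L and returns the model with the smallest value; the model names and numbers in the usage line are hypothetical, not the entries of Tables 4 and 6.

```python
def aic(loglik, k):
    """Akaike's information criterion 2k - 2 ln(L), with loglik = ln(L)."""
    return 2.0 * k - 2.0 * loglik


def select_by_aic(models):
    """models maps a name to (maximum log-likelihood, number of parameters)."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical values: model_b gains 3 log-likelihood units for 2 extra parameters.
best, scores = select_by_aic({"model_a": (100.0, 3), "model_b": (103.0, 5)})
# scores: model_a -> -194.0, model_b -> -196.0, so best == "model_b"
```

Each extra parameter costs 2 units of AIC, so a larger model is preferred only if it raises the log-likelihood by more than one unit per added parameter.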

Bayesian criterion
The Bayesian criterion is another alternative for model selection. Similar to Akaike's criterion, it allocates a number (BIC) to each model, and the model with the smallest BIC should be selected. Compared with Akaike's criterion, the BIC depends on the data sample size. For more information about the Bayesian criterion, see Konishi and Kitagawa (2008). The BIC is given by
$$
\mathrm{BIC} = k\ln z - 2\ln L,
$$
where k is the number of model parameters to be estimated, L is the maximum likelihood, and z is the data sample size. A sufficiently large sample size is required for using this criterion. Applying the Bayesian criterion to the candidate models, one can calculate the BIC of each model; these results are provided in Table 7. Comparing the BICs, the model TiTe has the smallest BIC; that is, the model TiTe should be selected.
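Because the BIC penalty k ln z grows with the sample size z, it exceeds the AIC penalty 2k whenever z > e² ≈ 7.4, so BIC tends to favour smaller models on long data sets. The sketch below computes BIC = k ln z − 2 ln L and selects the smallest value; the model names, log-likelihoods and sample sizes are hypothetical.

```python
import math


def bic(loglik, k, z):
    """Bayesian information criterion k ln(z) - 2 ln(L), with loglik = ln(L)."""
    return k * math.log(z) - 2.0 * loglik


def select_by_bic(models, z):
    """models maps a name to (maximum log-likelihood, number of parameters)."""
    scores = {name: bic(ll, k, z) for name, (ll, k) in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical values: model_b gains 3 log-likelihood units for 2 extra parameters.
candidates = {"model_a": (100.0, 3), "model_b": (103.0, 5)}
best, scores = select_by_bic(candidates, z=1000)  # best == "model_a"
```

For comparison, AIC would give −194 for model_a and −196 for model_b and pick model_b; with z = 1000 samples the stronger BIC penalty reverses that choice, while with a very small sample (z = 5) BIC also picks model_b.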

Figure 1: Architectural drawings of the Mediterranean demo site building: floor plan of the attic floor (top) and elevation section (bottom).

Figure 3: Weekly power and occupancy profile of one household: thermal demand (top) and electrical consumption (bottom).

Figure 4: Electric circuit equivalent to the thermal dynamics of a building.

Figure 5: Input, output and residual signals of the grey-box model.
This paper is organized as follows. A description of the Mediterranean demonstration site building is provided in the Case study Section. The white-box model, including the stochastic occupancy behaviour and the data generation, is introduced in the White-box model Section. The background needed to familiarize the reader with grey-box modelling, an introduction to the various model structures, and the model selection procedure are given in the Model structure and Model selection Sections. Simulation results are presented, and a summary is given, in the subsequent sections.

Table 1: States and activities derived from the TUD, with the assigned electric devices and the corresponding DHW extractions (ext.).

Table 4: Models with maximum log-likelihood.

Table 5: Results of the likelihood ratio test, where the models Ti, TiTe, TiThTs, TiTeThTsAeRia and TiTmTeThTs are denoted as M1, M2, M3, M4 and M5, respectively.
The p-value indicates statistical significance: a p-value below 5% is taken as evidence, at the 5% significance level, that the null hypothesis should be rejected. Hence, considering the results given in Table 5, the model M2: TiTe or M4: TiTeThTsAeRia should be selected.

Table 6: AIC values allocated to the candidate models.