A Machine Learning approach for personal thermal comfort perception evaluation: experimental campaign under real and virtual scenarios

Personal Thermal Comfort models differ from steady-state methods because they use personal user feedback as the target value. Today, the availability of integrated “smart” devices following the Internet of Things concept, together with Machine Learning (ML) techniques, allows the development of frameworks that reach optimized indoor thermal comfort conditions. The article investigates the potential of such an approach through an experimental campaign in a test cell, involving 25 participants in a Real (R) and a Virtual (VR) scenario, aiming to evaluate the effect of external stimuli, such as variations of the colours and images of the environment, on personal thermal perception. A dataset with environmental parameters, biometric data and the perceived comfort feedback of the participants is defined and processed with ML algorithms in order to identify the most suitable algorithm and the most influential variables for predicting the Personal Thermal Comfort Perception (PTCP). The results identify the Extra Trees classifier as the best algorithm. In both the R and VR scenarios, a different group of variables allows predicting PTCP with high accuracy.


Introduction
Users spend much of their time indoors; therefore, the quality of the environments inside buildings and the occupants' satisfaction and well-being are topical today [1]. Indoor Environmental Quality (IEQ) is a holistic approach that includes thermal and visual comfort, indoor air quality and acoustic quality. Thermal Comfort (TC) is by far the most analysed aspect [2], and it is often correlated with the energy consumption of buildings [3].
Thermal comfort is defined as the condition of mind that expresses satisfaction with the thermal environment. The classical approach to TC is based on the studies of Fanger [4], who found that TC is influenced by six factors: four related to the environment, namely air temperature (AT), relative humidity (RH), air velocity (AV) and mean radiant temperature (RT), and two related to the users' conditions, namely metabolic rate (MET) and thermal insulation of clothing (Iclo). Fanger's model established two indexes to quantify the thermal comfort of occupants based on the heat balance approach: the Predicted Mean Vote (PMV) and the Predicted Percentage of Dissatisfied (PPD).
Starting from Fanger's experience, the increasing use of new technological solutions has allowed the classical assessment to be improved, overcoming the limits of the steady-state model and of the approaches inspired by it, and paving the way for the development of adaptive models where people are actively involved in the control of the thermal environment [5,6].
In recent years, the resources made available by IoT and ML techniques have shifted the focus from an average assessment of a population's TC perception to the prediction of an individual's thermal comfort response, i.e. Personal TC Perception (PTCP). Personal thermal comfort models are built on the pervasive collection and analysis of large amounts of environmental and personal data and on their relationships with the users' thermal perception of the indoor climate. This research domain is investigated by several authors through field studies involving occupants in different thermal configurations [7], [8], [9].
Today VR is reaching almost all sectors. Building Information Modelling is an area where VR can be widely used. Researchers have investigated the potential of this technology in specific fields such as energy efficiency [10] and TC [11], finding interesting perspectives for future developments. In the present article, TC is evaluated from an overall perspective, considering the thermal perception of the users and the effect of internal and external variables.
Considering the complexity of the domain of PTCP models, the article presents the results of a research project aimed at testing the thermal perception of users in real and virtual environments. The test involves 25 participants placed within a test chamber and exposed to an R and a VR scenario. Each participant is required to provide his or her PTCP by answering a questionnaire in distinct phases of each scenario, characterized by different colours, while biophysical parameters and indoor environmental variables are detected using IoT devices.
The environmental data and the answers to the questionnaire are used to assess the TC level expressed by the PMV. Then, PTCP is considered as the target feature in the ML approach.

Test setup
The research was carried out in a test cell located in San Giuliano Milanese, near Milan, and involved 25 participants who were asked to answer a questionnaire about their thermal perception inside the cell. Each participant was immersed in an R and a VR scenario; the goal of the research is to compare the thermal perception in the two scenarios, investigating the potential of virtual reality in thermal comfort studies. ML techniques were applied for this purpose to manage the large amount of data collected during the experimental campaign.
The test was performed over 8 days and each participant alternated between the R and VR scenarios within the test cell. In each scenario, the participants were asked to watch a video reproducing red and blue coloured lights. In the R scenario, this effect was reinforced with an LED strip. In the VR scenario, a few dynamic point lights were created, placed at the main lighting sources, without baked lighting; the deferred rendering method was used for this purpose. The .ies files of the LEDs increased the consistency of the virtual lights with the bulbs used in the room, and the transitions of the RGB LEDs in the various phases of the experience were synchronized with the video through the Unreal editing tools.
The test in the R and VR scenarios was similar; only the devices used to reproduce the scenarios changed. In the R scenario the video was shown on a desktop monitor and the users' feedback was recorded through a smartphone. In the VR scenario the whole scene took place through a VR headset and full-tracking controllers. The R and VR experiences were separated by a period of 45 minutes. Much attention was paid to realizing a virtual scenario as close as possible to the real conditions, avoiding performance drops in order to convince the participant of being immersed in a "real" environment. Some expedients were used: high-detail modelling dedicated only to near objects; a frame rate of 90 fps; few material textures, avoiding small elements like LED strips and distant details, since they are in any case not visible to the end user. Figure 1 shows four frames of the experiment for R and VR, considering two distinct participants involved in the test.
The participants filled out a web-based questionnaire submitted via Google Forms, defined according to the guidelines provided by Standard ASHRAE 55 [13].
The test began when the participant entered the cell. The first period, of about 20 minutes, was dedicated to the acclimatization of the individual. In this time, the participant filled in the first part of the questionnaire, related to generic data that could affect the perception of thermal comfort (Table 1, Part I). Afterwards, the participant was asked to watch a 17-minute video divided into two parts: in the former, the video reproduced heat-related scenes, such as volcanoes, while the LED strips emitted red light; in the latter, the video reproduced cold-related scenes, such as snow and glaciers, while the LED strips emitted blue light. The video was off between part one and part two and, at the end of the video, the LED lights were set to white, allowing the participant to fill in Part II of the questionnaire, about thermal perception (Table 1, Part II). The answers about the thermal sensation of each participant represented the PTCP.
More details about the experiment can be found in [14].

Monitoring system
The test cell was equipped with a monitoring system for the collection of the environmental variables affecting thermal comfort. This system consisted of:
• two thermo-hygrometric sensors, with a measuring range of 0-100% and accuracy of ±2% for relative humidity, and a range of −40 to +60 °C and accuracy of ±0.1 °C for air temperature;
• a black globe thermometer, with a measuring range of −40 to +60 °C and accuracy of ±0.1 °C;
• two hot-wire anemometers, with a measuring range of 0-5 m/s and accuracy of ±0.02 m/s.
The sensors were connected to a 32-bit ARM data logger recording the monitored values with a 5-s detection frequency. The biometric data of the participants were measured with two Empatica E4 smart wristbands, placed on both arms. The device integrates a PhotoPlethysmoGraphy (PPG) sensor for the detection of the Heart Rate, an EDA sensor for the measurement of the electrical properties of the skin, and an infrared thermopile for the measurement of the skin temperature.
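A practical step in building the dataset is aligning the 5-s environmental log with the higher-rate wristband stream on a common time base. The following is a minimal sketch with pandas; the column names, timestamps and values are purely illustrative, not the study's actual data:

```python
import pandas as pd

# Hypothetical environmental log, sampled every 5 s by the data logger.
env = pd.DataFrame(
    {"AT3": [21.9, 22.0, 22.1], "RH3": [35.0, 35.2, 35.1]},
    index=pd.date_range("2020-01-15 10:00:00", periods=3, freq="5s"),
)
# Hypothetical wristband stream at a higher sampling rate (every 3 s).
bio = pd.DataFrame(
    {"Tskin": [33.1, 33.2, 33.2, 33.3, 33.1]},
    index=pd.date_range("2020-01-15 10:00:00", periods=5, freq="3s"),
)
# Resample the biometric stream to the shared 5-s grid and join the streams.
dataset = env.join(bio.resample("5s").mean(), how="inner")
print(dataset)
```

Averaging within each 5-s bin is one simple choice; interpolation or nearest-timestamp matching would work equally well depending on the sensor characteristics.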
In the test cell, a desk with a monitor and a chair were located (Figure 2). A smartphone was placed on the desk and used only in the real scenario to record the feedback of the users. The equipment was completed by an RGB LED strip installed on the rear of the monitor and on the upper edge of the desk, an HTC VIVE headset, and a tripod with a camera for recording each experience. The test cell was equipped with a DC-inverter mono-split unit with a heating capacity of 2800 W, connected to a Vemer HT NiPT-1 thermoregulator accessible through the electrical panel.

Air conditioning settings
The test was performed in heating mode. The operating conditions of the test cell were analysed through Computational Fluid Dynamics (CFD) to assess the presence of local discomfort areas caused by the air speed levels due to the air-conditioning system, and to find the best configuration of the installed DC mono-split unit. The device could operate at 3 different levels of air flow rate (560, 430 and 330 m³/h) and at different angles of the deflector of the split. Figure 3 shows the horizontal (at 1.1 m from the floor) and vertical control planes at the position of the participant. The mesh of the CFD model had a cell size of 0.07 × 0.05 × 0.05 m; near the split and the suction inlet the mesh was refined in order to capture the smallest details of the model. The SimpleFoam solver (the OpenFOAM steady-state solver) was used [15], [16].
The optimum flow rate (330 m³/h) and the best direction of the deflector were identified through the CFD simulations, keeping the air velocity below the threshold value required by EN ISO 7730 for a single office with standard performance (category B), equal to 0.16 m/s. Figure 3 shows the velocity vectors in the optimized configuration.

Environmental data and PMV
The environmental data collected by the monitoring system during the test show that the thermal characteristics of the test cell, both in the R and VR scenarios, are in general compliant with category B for office buildings defined by EN ISO 7730. Table 2 shows the thresholds required by the standard and the variables monitored during the test. The operative temperature, To, calculated according to the international standard, has values in the range 22 ± 2 °C; the air velocity, calculated as the mean value between AV1 and AV2, is constantly lower than 0.16 m/s; and the vertical temperature difference between head and ankle is lower than 3 °C. The monitored average RH between RH3 and RH4 is in the range 30-40%, in accordance with [17] for the winter period. With these values, the calculated PMV is in the range -0.5 < PMV < 0.5.
Table 3 shows the dataset defined by collecting all biometric and environmental data with a Binary Label value equal to 1 (filtered data), used to apply the ML algorithms. The importance of the individual features in determining the target values PTCP_R and PTCP_VR was assessed using the Extremely Randomized Trees technique. Figure 4 shows that in the R scenario, among the biometric variables, Tskin exceeds the feature importance threshold, set equal to 0.052 so as to retain four input variables for PTCP_R and PTCP_VR prediction. In both scenarios, data related to the 3D accelerations are not relevant, due to the sedentary activity pattern. At the same time, EDA and HR have a low importance in predicting PTCP_R. In the VR scenario no biometric variable reaches the threshold but, as in the R scenario, Tskin has the highest value among the biometric data.
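The feature-ranking step can be sketched with scikit-learn's Extra Trees implementation. The snippet below uses synthetic data and the study's variable names only as illustrative labels; it shows the mechanism (fit, read the normalized importances, apply the 0.052 cut-off), not the study's actual results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the filtered dataset; names are illustrative.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)
names = ["Tskin", "Colour", "RH3", "RH4", "AT3", "AT4", "EDA", "HR"]

# Extremely Randomized Trees: importances are averaged over the ensemble
# and sum to 1 across all features.
model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)

THRESHOLD = 0.052  # importance cut-off used in the study
selected = [n for n, imp in zip(names, model.feature_importances_)
            if imp > THRESHOLD]
print(selected)
```

On the real dataset, the same procedure is run separately with PTCP_R and PTCP_VR as target, yielding the two feature subsets reported below.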

Comparison of real and virtual scenarios
The variable Colour has the highest importance in predicting PTCP in both scenarios. Given a threshold feature importance value (dotted line), the following features are identified as having the major impact in predicting PTCP:
• PTCP_R prediction through ML, most important features: Tskin, Colour, RH4, AT4;
• PTCP_VR prediction through ML, most important features: Colour, RH3, RH4, AT3.

Fig. 4. Feature importance selection for PTCP_R and PTCP_VR prediction.
The dataset is split into two subsets: 60% for training and 40% for testing. Six different algorithms are compared to identify the one with the best performance in predicting PTCP in both scenarios: Linear Discriminant Analysis, Logistic Regression, Classification and Regression Trees, Extra Trees Classifier, Linear Support Vector Classifier and Random Forest Classifier. The 'accuracy' metric, defined as the number of correctly predicted instances divided by the total number of instances in the dataset, is used to evaluate the reliability of the models in predicting PTCP. K-fold cross-validation (k = 10) is used to evaluate the performance of the different algorithms, also considering the tuning of hyperparameters. ETC obtains the highest average accuracy (R = 0.997, VR = 0.996) and the lowest standard deviation (R = 0.003, VR = 0.004). The high performance in both scenarios is confirmed also on the validation set.
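The comparison protocol can be sketched as follows with scikit-learn: a 60/40 split, then 10-fold cross-validated accuracy on the training subset for each of the six classifier families. Synthetic data and default hyperparameters are used here for illustration; they are not the tuned settings or the results of the study:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the PTCP dataset (four selected input features).
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
# 60% training / 40% test split, as in the experimental protocol.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6,
                                          random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(random_state=0),
    "ETC": ExtraTreesClassifier(random_state=0),
    "SVC": LinearSVC(max_iter=5000),
    "RF": RandomForestClassifier(random_state=0),
}
# 10-fold cross-validated accuracy on the training subset.
for name, model in models.items():
    scores = cross_val_score(model, X_tr, y_tr, cv=10, scoring="accuracy")
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

The winning model would then be refit on the full training subset and checked once on the held-out 40%.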

Conclusion
The proposed approach allows verifying differences in terms of PTCP prediction for participants monitored in R and VR scenarios. As described above, in both the VR and R scenarios, the Colour feature is the most important one for the prediction of PTCP. Among the other useful features, Tskin is the only biometric variable with a relevant importance in PTCP_R prediction. On the contrary, in VR, even if Tskin emerges among the other biometric data, it does not reach a value above the threshold. This paper also highlights how VR can be used in the design of comfortable environments, since it speeds up the whole process, reducing the resources required to build traditional real settings. Introducing VR environments also increases the richness of the collectible data, allowing a deeper analysis of the correlations between PTCP, the provided sensorial content and the users' behaviour. In this early stage, it is important to collect as much data as possible to compare participants' feedback in R and VR and to develop useful models. The extension of the application of VR from Indoor Lighting Quality assessment to Thermal Comfort permits the analysis of different possible scenarios. Future research can develop a holistic methodology also including other aspects of Indoor Environmental Quality, such as Acoustic Quality and Indoor Air Quality.

Ethic Committee Approval
The protocol used in this work was approved by CNR Ethics and Research Integrity Commission (Protocol n. 0005610/2020).