Review of modern demand control solutions and technologies for HVAC operation

HVAC systems that use traditional control strategies with fixed ventilation rates or ventilation rate schedules do not adjust to the required IAQ and thermal comfort, so building spaces end up over- or under-ventilated. In this paper, the latest modern solutions for demand-controlled HVAC system operation are analyzed based on a review of existing studies. Modern technologies such as human detection systems, computer vision, and neural network applications are examined. Different types of human presence detection are presented according to the applied technology. The most common are indirect detection based on usage data from existing IT equipment, and direct detection using passive infrared sensors, wearable tags, and vision sensors. Potential solutions for human activity monitoring, skin temperature, and clothing level detection are also examined. The studies discussed in this paper show real application examples and demonstrate the benefits of using these technologies to control ventilation systems in various building types. Research has shown that such technologies have a favorable effect on both indoor air quality and system energy consumption. In the future, ventilation systems should be equipped with cameras for a more accurate analysis of the room and its occupancy, and should consider occupant behavior, activity, and other information that can be used to improve indoor environment quality. Based on the gained knowledge, a sensor capable of human detection, counting, and location marking is developed.


Introduction
HVAC systems in public buildings are responsible for a large share of consumed energy, up to 50% of the total [1]. To improve the situation, these systems must be adjusted and controlled so that they heat, cool, or ventilate the premises only when, and only as much as, necessary. To optimize the ventilation of naturally ventilated classrooms, automatic control of window opening based on pupil location and IAQ parameters has been proposed [2]. One potential way to achieve this is to use human-locating sensors. Occupancy sensing is an enabling technology for the smart buildings of the future; knowing where and how many people are in a building is key for saving energy, space management, and security [3].
There are several different approaches that can be used to detect human presence and location for HVAC control. These include the following:
- Human detection systems using IT equipment: these systems use computers or other IT equipment to monitor the presence and location of people in a building. For example, they may use sensors, cameras, or other devices to track the movement of people or to detect the heat and other emissions generated by their bodies. [4,5]
- Human detection systems using passive infrared (PIR) sensors: PIR sensors are commonly used to detect the presence of people in a building. They work by detecting the infrared radiation emitted by the human body, which is invisible to the naked eye. They are typically used in security systems but can also be used for HVAC control. [6,7]
- Human detection systems using wearable sensors or tags: people are equipped with sensors or tags that transmit information about their location and movements. This information can be used to track the presence and location of people in a building and to adjust the HVAC system accordingly. [8,9]
- Human detection systems using vision sensors: vision sensors, such as cameras, capture images or videos of the environment, which can then be processed with computer vision algorithms to identify and track people. [10,11]
- Human detection using neural network systems: neural network-based systems use machine learning algorithms to learn to recognize the features of a human body in an image or video and to track movement over time. [12,13]

Each of these approaches has its advantages and disadvantages, and the best one for a particular application depends on the specific requirements and constraints of the system.
However, one of the more promising solutions for human sensing is the use of video cameras coupled with computer vision algorithms, as this can provide high accuracy and does not require any active human involvement. An existing review article [14] on such systems states that the most successful approach appears to be the combination of deep learning with classical machine learning models, since it offers high accuracy with less computation than hand-designed features and classification. Another study [15] gives guidelines for algorithm selection based on the intended application, covering the variation of human poses and viewpoints, occlusion, the size of the human object, full or partial detection (e.g. when only the head and shoulders are of interest), real-time requirements, and the input format (static image, video, stereo, etc.). However, for such a system to work, a wide-angle camera should be mounted near the ceiling, and for larger premises multiple cameras must be used. To avoid complicating the system, a single fisheye camera with 360° vision can be used instead. In that situation, standard people detection algorithms do not provide good results due to the distorted image and unique radial geometry [3].

Convolutional neural networks (CNNs) are a type of deep learning algorithm particularly well suited to image classification tasks. CNNs automatically learn spatial hierarchies of features from input images, making them effective at detecting objects in images and frames. By dividing an image or frame into small parts, or patches, and processing each patch with the same learned filters, CNNs can perform their calculations in parallel, allowing them to quickly and accurately classify objects in the image or frame, no matter where they are located.
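The patch-wise filtering described above can be illustrated with a minimal sketch: a single hand-written 3×3 filter (a Sobel-like edge detector standing in for learned CNN weights) is slid over every patch of a small image, producing the same response wherever the feature occurs.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over every patch of `image` (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]   # one local patch
            out[y, x] = np.sum(patch * kernel)  # same weights at every location
    return out

# A vertical step edge and a horizontal-gradient (Sobel-like) kernel:
img = np.zeros((5, 5))
img[:, 3:] = 1.0
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
response = conv2d(img, sobel_x)
# The filter responds strongly in the columns containing the edge,
# regardless of where in the image that edge sits.
```

In a real CNN these filters are learned from data and stacked into many layers, but the location-independence shown here is what makes convolutional detectors effective at finding people anywhere in a frame.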
The most commonly used human recognition algorithms, such as YOLO [16], SSD [17], or R-CNN [18], work well when people appear upright. When the images are taken by a fisheye camera, these algorithms perform poorly [19].
At the same time, CNN-based people detection methods have been proposed for overhead fisheye images. In research [20], a variant of the YOLO object detection algorithm was used to detect people in real time on embedded devices. The researchers modified the YOLO algorithm to take grayscale rather than color images as input, which simplified the network structure and improved processing speed. Additionally, they used background subtraction based on an adaptive Gaussian mixture model to further improve the accuracy of the people detection. With these changes, they achieved the high processing speeds needed for real-time people detection on embedded devices.

A different study [21] used a data augmentation technique called rotation-invariant training to improve the performance of the YOLO algorithm on rotated perspective images. Rotated versions of standard images were added to the training dataset to simulate the various poses and orientations that people can have in fisheye images, improving the algorithm's ability to detect people even when they are not perfectly upright. The researchers also proposed a clustering-based method to refine the bounding boxes produced by YOLO, as an alternative to the commonly used non-maximum suppression technique. Applying these techniques improved the accuracy and robustness of the YOLO algorithm on rotated perspective images.

In another YOLO-based people detection method [22], the researchers extract highly overlapping windows from the input images, to avoid misses, and then dewarp these windows using an omnidirectional-to-perspective image mapping. This allows the YOLO algorithm to be applied directly to the dewarped windows, which are in a standard perspective format.
To further improve the accuracy of the people detection, the researchers also proposed several variants of non-maximum suppression as a post-processing step. By applying these techniques, the researchers were able to improve the performance of the YOLO algorithm on people detection tasks.
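Non-maximum suppression, mentioned here as the standard post-processing step, keeps only the highest-scoring detection among heavily overlapping boxes. A minimal sketch of the classic algorithm for axis-aligned boxes, in plain NumPy:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

# Two near-duplicate detections of one person plus one separate detection:
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # the two overlapping boxes collapse into one
```

The variants proposed in [22] refine exactly this step; the sketch above shows only the baseline greedy form they start from.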
Research [23] proposed a novel neural network architecture for bounding box regression, a common task in object detection algorithms. Unlike many existing algorithms, which reuse networks designed for perspective images, the proposed architecture is designed specifically for this task and can outperform existing methods on some benchmarks; the details of the architecture and its benchmark performance are described in the paper.
The research [24] used orientation-aware convolutional layers and orientation-aware regression in a novel neural network architecture for bounding box regression. Orientation-aware convolutional layers are convolutional layers modified to take into account the orientation of the input data, which helps the network better capture the spatial relationship between different parts of the input; this is particularly important for bounding box regression. Orientation-aware regression, on the other hand, predicts the bounding box coordinates while taking into account the orientation of the bounding box in the input image, improving the accuracy of the predictions, especially for objects that are not upright in the image. By using both together, the researchers improved the performance of their bounding box regression network.
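Orientation-aware regression outputs an angle alongside the usual centre and size. As a sketch of one common convention (the exact parameterization varies between papers), a five-parameter box (cx, cy, w, h, angle) can be converted to its four corner points like this:

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corner coordinates of a box given by centre, size, and rotation angle.
    Rotation-aware detectors regress this 5-parameter form instead of the
    usual axis-aligned (x, y, w, h)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)]:
        # Rotate the offset around the centre, then translate to the centre.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners

# A 4x2 box rotated by 90 degrees: width and height effectively swap,
# which an axis-aligned box cannot represent without growing larger.
corners = rotated_box_corners(10, 10, 4, 2, 90)
```

This is why angle-aware boxes fit people in overhead fisheye images more tightly: bodies radiate outward from the image centre at arbitrary angles.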
Researchers [25,26] have investigated HVAC control systems that determine the clothing level of people in a building and adjust the indoor temperature accordingly. The studies found that such systems could improve occupant comfort and reduce energy consumption. Two different approaches were tested: one using vision sensors and a pre-trained convolutional neural network (CNN) based on the MobileNetV2 architecture, and another using computer vision and a CNN. Both approaches achieved similar clothing-level determination accuracy, with the vision-sensor-based system achieving 86% and the computer-vision-based system achieving 90%. In terms of occupant satisfaction, the vision-sensor-based system was preferred by 90% of the people it served, while the computer-vision-based system was preferred by 38%. Additionally, the computer-vision-based system reduced the number of people who felt "cooler" by 81%. These results suggest that HVAC control systems that determine clothing levels and adjust the indoor temperature accordingly can be effective in improving occupant comfort and reducing energy consumption.

E3S Web of Conferences 396, 02020 (2023) https://doi.org/10.1051/e3sconf/202339602020 IAQVEC2023
There are several different approaches that can be used for determining human motion, each with its advantages and disadvantages. Strategies studied in existing research include the following:
- Wireless wearable sensors: sensors worn by people and equipped with heart rate sensors and accelerometers track the movement and activity of the wearer and transmit the data wirelessly to a central system. The accuracy of this approach can be quite high (e.g. 85%), but its practical use is limited because people must wear the sensors in order to be detected. [27-29]
- Cameras and deep learning: cameras capture images or video of the environment, and deep learning algorithms process the data to determine human activity. A commonly used algorithm is the long short-term memory (LSTM) recurrent neural network, which is well suited to processing time-series data. This approach can achieve good accuracy (e.g. 83%) in distinguishing between different activities, but it requires powerful computing resources. [12]
- Heat emissions: the heat emitted by the human body can be used to infer the activity of the person, either with sensors sensitive to infrared radiation or by analyzing the heat gain profiles of different activities. The accuracy of this approach can be quite high (e.g. 81%), but it may be limited by the need for accurate heat gain profiles and by the sensitivity of the sensors. [30]

The use of modern technologies in HVAC systems, such as occupancy and activity detection, can improve the IAQ and reduce energy consumption. These modernized control systems can automatically adjust the heating, ventilation, and air conditioning settings based on the number and type of people in a room, as well as their activities.
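As a toy illustration of the wearable-sensor approach (not the classifiers used in the cited studies, which rely on richer features and learned models such as LSTMs), a window of accelerometer samples can be labelled by the variability of its acceleration magnitude:

```python
import math
import statistics

def activity_from_accel(samples, threshold=0.5):
    """Crude illustrative classifier: label a window of 3-axis accelerometer
    samples (in g) as 'active' when the standard deviation of the acceleration
    magnitude exceeds `threshold`. The threshold is an assumed value chosen
    for this sketch, not one taken from the reviewed research."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return "active" if statistics.pstdev(magnitudes) > threshold else "sedentary"

sitting = [(0.0, 0.0, 1.0)] * 20                   # near-constant gravity vector
walking = [(0.0, 0.0, 1.0), (0.5, 0.3, 2.0)] * 10  # oscillating magnitude
```

Even this crude rule separates stillness from motion, which hints at why accelerometer-based wearables can reach the reported accuracies once proper models are applied.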
This can lead to more efficient use of energy, as the HVAC system runs at full capacity only when needed rather than constantly maintaining a set temperature. It can also improve IAQ, as the system can provide more fresh air and better air circulation when, for example, there are more people in a room.
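The occupancy-driven adjustment described above is often expressed through a breathing-zone outdoor-air equation of the form Vbz = Rp·Pz + Ra·Az, as in ASHRAE Standard 62.1, where one term scales with the detected people count and the other with floor area. The rate values below are illustrative assumptions, not figures from the standard or the reviewed studies:

```python
def breathing_zone_airflow(people, area_m2, rp=2.5, ra=0.3):
    """Demand-controlled outdoor airflow setpoint in L/s, following the
    general form Vbz = Rp*Pz + Ra*Az. The defaults rp (L/s per person) and
    ra (L/s per m2) are illustrative; real values depend on the occupancy
    category and the applicable ventilation standard."""
    return rp * people + ra * area_m2

# As the detected occupant count changes, the ventilation setpoint follows:
empty = breathing_zone_airflow(0, 60.0)    # base rate for the space itself
full = breathing_zone_airflow(36, 60.0)    # fully occupied classroom
```

With an occupancy sensor feeding the `people` argument, the ventilation rate tracks demand instead of running at the worst-case fixed schedule.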

Methods
In the scope of the project, the first prototype of a human location sensor coupled with IAQ sensors is being developed. A Jetson Nano (reComputer Jetson-10-1-A0: Nvidia Jetson Nano, 4 GB RAM + 16 GB eMMC) is used as the main computer of the sensor. It is coupled with a KNX router which connects the CO2, RH, and temperature sensors. For IAQ measurement, a Jung KNX CO2 multi-sensor is used, with a CO2 measuring range of 0 to 2000 ppm, an RH range of 10 to 95%, and a temperature range of −5 to +45 °C. The camera is an Arducam IMX219 8 Mpx wide-angle camera module with a type 1/4″ image sensor format and a pixel size of 1.12 μm × 1.12 μm. The principal scheme of the sensor is shown in Fig. 1.

Fig. 1 Principal scheme of the human location and IAQ sensor

All of the sensor elements were connected, and a bracket for mounting in false ceilings was added. A separate cover for the IAQ sensor was also 3D printed to the necessary size. The finished prototype is shown in Fig. 2. Afterward, RAPiD, a Rotation-Aware People Detector in Overhead Fisheye Images, was installed on the device's computer [3]. For testing purposes, output images from RAPiD with marked detections and confidence values can be saved on the sensor's computer. The RAPiD detector is implemented in PyTorch and runs on the Jetson Nano GPU at approximately 0.2 fps with 1024×1024 images.

A web-based user interface was developed to control the sensor. It allows recording to be started and stopped, as well as shutting down the computer, and shows a real-time camera picture that refreshes every 30 seconds. The IAQ data and a diagram of detected person locations relative to the image are also shown in the browser, together with more detailed information on the sensor status, such as the IP address, person locations logged as points with X and Y coordinates, and the time the last data was received. Communication between the UI web page and the sensor is done via a Google Firebase real-time database.
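The logged person locations are points in image coordinates. As a simplified sketch (ignoring the fisheye lens distortion, which a real deployment would correct through calibration before scaling), a detection centre can be mapped from the 1024×1024 frame to approximate floor coordinates of a 5.1 m × 11.7 m room, the classroom dimensions used in the test:

```python
def image_to_room(x_px, y_px, img_size=1024, room_w=5.1, room_l=11.7):
    """Map a detection centre from pixel coordinates in the overhead image to
    approximate floor coordinates in metres. Plain linear scaling is used
    here for illustration only; the fisheye projection makes the true mapping
    non-linear, so a calibrated dewarping step would precede this in practice."""
    return x_px / img_size * room_w, y_px / img_size * room_l

# A detection in the middle of the frame maps to the centre of the room:
x_m, y_m = image_to_room(512, 512)
```

Converting detections to room coordinates is what allows the control logic to reason about where occupants are relative to windows and diffusers, not just how many there are.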
For the sensor to work in a real environment, it must be connected to a 220 V mains supply and a Wi-Fi internet connection.

Results
As an initial test, the developed sensor was placed in a university classroom (Fig. 3). The dimensions of the analyzed classroom are 5.1 m × 11.7 m × 2.7 m (height). The classroom holds a maximum of 36 pupils and a lecturer. The data provided by the developed sensor were compared with a visual analysis method. A first simple result can be seen in Fig. 4: the sensor works in the given situation and is capable of detecting human presence across the whole area of the classroom. The sensor was also tested during a typical lesson, and the results showed that in this case as well the sensor copes with human detection and counting.
As the next step in ventilation system control development, a window actuator controlled by a driver must be added. The principal scheme is shown in Fig. 5. The actuator runs on a 24 V electrical connection. The logic of how the windows should be operated must be determined by an algorithm that takes into account the IAQ parameters and the location of people in the room.
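A minimal sketch of what such a window-control algorithm could look like: a hysteresis rule on the measured CO2 concentration, gated by detected occupancy so the window is not opened into an empty room. The thresholds are illustrative assumptions, not values determined in this study:

```python
def window_command(co2_ppm, occupied, state_open,
                   open_above=1000, close_below=800):
    """Hysteresis sketch of the proposed natural-ventilation logic: open the
    window when CO2 rises above `open_above` ppm while the room is occupied,
    and close it once CO2 falls below `close_below` or the room empties.
    Thresholds are assumed values for illustration only."""
    if not occupied:
        return False
    if state_open:
        return co2_ppm > close_below   # stay open until CO2 has recovered
    return co2_ppm > open_above        # open only on a clear exceedance

# Simulate a lesson: CO2 rises, the window opens, CO2 recovers, room empties.
history = []
state = False
for co2, occ in [(600, True), (1100, True), (900, True), (700, True), (900, False)]:
    state = window_command(co2, occ, state)
    history.append(state)
```

The gap between the two thresholds prevents the actuator from chattering open and closed around a single setpoint, which matters for the lifetime of a 24 V window drive.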

Conclusions
The use of modern technologies, such as computer vision and neural networks, in HVAC control systems can improve their efficiency and effectiveness. By using these technologies, HVAC systems can automatically adjust their settings based on the number and type of people in a room, as well as their activities, leading to better air quality and more efficient use of energy. Additionally, studying the metabolic rate and its impact on human comfort, as well as the influence of clothing on an occupant's sense of temperature, can provide valuable insights for designing effective HVAC control strategies.
The use of convolutional neural networks (CNNs) in HVAC control systems can improve their ability to recognize patterns, such as the number and type of people in a room, as well as their activities. This can be accomplished through the use of computer vision algorithms, such as YOLO and SSD, which are based on CNNs and are effective for image and video recognition. These systems can be used for occupancy and activity detection, which can provide valuable information for automatically adjusting the settings of HVAC systems. The use of these technologies can lead to more efficient and effective control of HVAC systems, improving indoor air quality and reducing energy consumption.
In the scope of the study, the first version of the prototype human detection sensor is presented. It consists of a fisheye camera connected to a Jetson Nano computer, together with additional IAQ sensors. For human detection and recognition, the Rotation-Aware People Detector in Overhead Fisheye Images (RAPiD) is used. The first testing results show that the sensor is capable of accurate human detection and counting in a university classroom.
In future work, further testing of the sensor must be performed under different conditions and human positions. The IAQ parameters measured by the sensor will also be compared with those from additional devices placed in the classroom to determine whether there is a difference. Afterward, the sensor will be coupled with window actuators to control natural ventilation depending on the IAQ parameters and the location of people in the room.