Enhancing IoT Data Integrity and Effectiveness through Hybrid Compression Method: A Step Towards Energy Efficiency

The expansion of the Internet of Things (IoT) has magnified the challenge of managing data generated by IoT devices, notably in meteorological applications like temperature and humidity monitoring. This research addresses the imperative of efficiently reducing IoT data volume while preserving data integrity and underscores the significant implications for energy consumption. Our approach involved a two-fold strategy, employing the DHT11 sensor and ESP32 microcontroller for data collection, followed by an exploration of various data compression algorithms: delta encoding, run-length encoding (RLE), variable-length integer encoding (VLI), and bit-packing. The strategic combination of RLE and delta encoding yielded an exceptional compression rate of 98%. Beyond data reduction, this methodology offers energy savings by minimizing data transmission times, evidenced by the swift 133-microsecond compression process. Furthermore, the seamless transmission of compressed IoT data to Azure Cloud not only reduced cloud storage costs but also optimized storage space, contributing to energy efficiency. This research illuminates the significance of data compression in mitigating the environmental impact of IoT technologies, fostering a greener, more energy-conscious future.


Introduction
The evolution of the Internet of Things (IoT) [1] has transformed data acquisition across a multitude of domains. Meteorology in particular has changed markedly with the adoption of IoT technologies, which have substantially improved our ability to monitor and understand real-time atmospheric conditions. As a result, the collection rate of meteorological data, especially for crucial parameters such as temperature and humidity, has surged. State-of-the-art meteorological sensors, exemplified by the DHT11 sensor [2] in conjunction with the ESP32 microcontroller [3,4], now generate large volumes of real-time data. This abundance poses a formidable challenge: how to efficiently store, manage, and process such extensive datasets. IoT meteorological data is characterized by high volume, diverse nature, and rapid generation, all of which demand careful consideration in data handling.
Beyond our own work, a substantial body of studies has enriched the discourse on data compression, offering diverse perspectives and approaches [5]. The authors of [6] investigated a spectrum of compression techniques, systematically categorizing them by their characteristics, and carried out a comparative analysis of various lossless compression methods. The authors of [7,8] examined both lossy and lossless data compression methodologies, assessing metrics such as compression ratio, compression factor, compression gain, savings percentage, and compression time. In parallel, Amandeep Singh and their team introduced a hybrid data compression technique notable for its reduced compression times in comparison to existing methodologies [9]. In [10], lossless and lossy data compression strategies were compared, culminating in the proposal of a bit reduction algorithm that uses number theory and file differential techniques to compress text data while significantly reducing time complexity. In the domain of image compression, [11] studied an array of lossless techniques encompassing Huffman coding, arithmetic coding, lossless predictive coding, and lossless JPEG, and concluded that lossless JPEG was the most effective, with superior compression ratios and reduced processing times. Turning to the compression of Tamil documents, [12] centered their investigation on the Huffman compression technique. Their research highlighted the performance of this method, particularly concerning compression ratios and peak signal-to-noise ratios in the context of image compression. Compression techniques have also been applied to reduce response sizes and response delays for a significant subset of HTTP content types [13]. Furthermore, [14] described the implementation of a data compression algorithm within an FPGA, using the Xilinx Embedded Development Kit.
Their work highlighted the advantages of this implementation, including ease of hardware updates and faster compression times. In parallel, another study examined the implementation of sparse matrix and vector operations on the PEZY-SC processor, contributing insights into performance optimization in that specific context. Collectively, these diverse research efforts enrich our understanding of the challenges and innovations in data handling, compression, and storage.
Our research into tailored data compression solutions for IoT meteorological data has the potential to significantly reduce energy consumption. The energy efficiency of data storage, transmission, and processing is of paramount importance in IoT, where devices often operate on limited power sources. By reducing the volume of data that needs to be transmitted and stored, our approach minimizes the energy demands associated with these processes. In particular, the combination of Run-Length Encoding (RLE) and delta encoding that we employ results in a compression rate of 98%. This reduction in data size translates directly into energy savings during data transmission and storage. IoT devices that collect meteorological data can therefore operate more efficiently, making them suitable for deployment in remote or resource-constrained environments. Furthermore, when data is transmitted to cloud-based storage systems, as is often the case in IoT applications, our compression technique significantly reduces cloud storage costs. By optimizing storage space and reducing data volume, we contribute to the overall reduction in energy consumption related to data handling in cloud environments.
In summary, our research not only addresses the challenges of managing vast volumes of IoT meteorological data but also aligns with the broader objectives of energy-efficient data management and storage. As the IoT ecosystem continues to grow, our work provides valuable insights into mitigating energy consumption, ultimately contributing to more sustainable and environmentally conscious practices.
E3S Web of Conferences 477, 00042 (2024) https://doi.org/10.1051/e3sconf/202447700042 STAR'2023

Data compression techniques
In IoT data management and storage, the choice of data compression methodology is a primary consideration. These methodologies address the dual imperative of reducing data volume while upholding data integrity. Data compression methods can be broadly classified into two principal categories, lossless compression and lossy compression, as shown in Figure 1. Each category offers distinct merits and demerits, so the choice between them depends on the requirements of the data and the intended application. This section examines the two approaches and their characteristics.
Lossy data compression accepts limited data loss for specific applications, which allows real-time transmission. It is used primarily for media such as audio, video, and images, and is unsuitable where all of the data must be preserved. Lossless data compression, on the other hand, reduces file size while maintaining data integrity. It is suitable for text-based documents, software, and other data where preserving all information is essential. This article focuses on lossless data compression techniques and evaluates their effectiveness in reducing data size without compromising data integrity.

Run-Length Encoding
Run-Length Encoding (RLE) is a data compression method that identifies sequences of identical, repeated bytes and replaces each with a count of its occurrences. RLE reduces data size by representing these continuous sequences, referred to as 'runs', in a more concise format. Each run is encoded into two bytes: the first byte is the 'run count', indicating the length of the run, and the second is the original character. Single characters are represented as runs of 1, and individual blank spaces are often disregarded, especially in text data. RLE is particularly advantageous for highly redundant data. While it may not achieve the same compression ratios as more advanced techniques, its simplicity of implementation and quick execution make it a pragmatic choice for straightforward compression tasks.
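The two-byte run format described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study; the function names and the cap of 255 per run (so that the count fits in a single byte) are assumptions made for this example.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as two bytes: (run count, value)."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Assumption: cap a run at 255 so the count fits in one byte;
        # longer runs are simply split into several (count, value) pairs.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (run count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)
```

With this sketch the 8-byte input b"AAAABBBC" becomes the 6 bytes 04 41 03 42 01 43, whereas an input with no repetition, such as b"ABCD", doubles to 8 bytes. This is why RLE pays off only on highly redundant data.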

Delta Encoding
Delta Encoding, also referred to as delta compression, is a data compression method that represents data as the differences between sequential values. Its primary objective is to reduce data size by encoding the change from one data point to the next instead of storing the absolute values. The technique is most useful when consecutive values within a dataset show only minor fluctuations.
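As a hedged illustration of this idea, the following Python sketch stores the first value as-is and every later value as its difference from the previous one. The function names and the sample temperature readings are hypothetical and are not taken from the study's dataset.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each change from the previous value."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum of the deltas."""
    return list(accumulate(deltas))


# Hypothetical temperature readings in whole degrees Celsius.
readings = [24, 24, 25, 25, 25, 24]
encoded = delta_encode(readings)  # [24, 0, 1, 0, 0, -1]
```

When readings change slowly, most deltas are 0 or close to it, so they need fewer bits than the absolute values and tend to repeat.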

Bit-Packing
Bit-Packing is a data compression technique that stores data efficiently by packing multiple values into a predetermined number of bits. It is particularly effective when the values fall within a constrained range and can therefore be represented within a fixed bit limit.
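A minimal Python sketch of fixed-width bit-packing follows. It assumes non-negative whole-number readings; the function names, the most-significant-bit-first layout, and the zero padding of the final byte are choices made for this example, not details reported in the study.

```python
def pack_bits(values, width):
    """Pack non-negative integers into bytes, using `width` bits per value."""
    acc = 0      # bit accumulator
    nbits = 0    # number of bits currently held in the accumulator
    out = bytearray()
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc = (acc << width) | v
        nbits += width
        while nbits >= 8:            # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1      # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # pad the last byte with zeros
    return bytes(out)


def unpack_bits(data, width, count):
    """Read back `count` values of `width` bits each."""
    acc = 0
    nbits = 0
    values = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width and len(values) < count:
            nbits -= width
            values.append((acc >> nbits) & ((1 << width) - 1))
        acc &= (1 << nbits) - 1
    return values
```

For example, four readings below 64 fit in 6 bits each, so pack_bits([24, 25, 26, 27], 6) occupies 3 bytes instead of the 4 bytes needed at one byte per reading. The decoder must be told how many values to expect, because the padding bits in the last byte are otherwise indistinguishable from data.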

Variable-Length Integer Encoding (VLI)
Variable-Length Integer Encoding (VLI) is a data compression technique that encodes integer values efficiently by using a variable number of bits. It is most useful for integer values of diverse magnitudes that must be represented in a condensed bit structure. The method offers substantial advantages in encoding integers of varying magnitudes while minimizing storage demands.

Methodology

Data Collection
The data used in this study was acquired with a DHT11 sensor, an instrument designed for the measurement of temperature and humidity. The sensor was integrated into a data collection apparatus built around the ESP32 microcontroller, which acquired real-time temperature and humidity readings from the DHT11. The DHT11 sensor was positioned within a meteorological environment and deliberately exposed to fluctuations in both temperature and humidity. Readings from the sensor were recorded at regular intervals, producing a coherent dataset. Data collection ran continuously over a predefined period to ensure that the collected data was representative. The raw dataset comprises a series of paired temperature and humidity measurements.
The data collection configuration centered on two principal components: the DHT11 sensor and the ESP32 microcontroller. The initial phase involved establishing a secure connection between the DHT11 sensor and the ESP32, which required aligning three pins dedicated to power supply, data transfer, and grounding. The stability of this connection was essential to avoid disruptions. To guarantee uninterrupted data collection, an external power source was used to power the ESP32, which removed concerns about power interruptions that could compromise the integrity of the data. We also took care in positioning the DHT11 sensor within the target environment, ensuring unobstructed exposure to the ambient air for precise data collection.
On the software side, the Arduino Integrated Development Environment (IDE) was used in conjunction with the DHT library (DHT.h). This facilitated communication between the ESP32 and the DHT11 sensor and streamlined data collection at predetermined intervals. The ESP32 microcontroller was programmed to gather temperature and humidity data autonomously, ensuring the consistent and dependable generation of the dataset that serves as the foundation for our subsequent analysis.

Delta-RLE Encoding Method
The hybrid compression method employed in this project combines two distinct algorithms: Run-Length Encoding (RLE) and Delta Encoding. The approach is designed to maximize data reduction while preserving data integrity for meteorological data collected from a DHT11 sensor interfaced with an ESP32 microcontroller. The Delta-RLE hybrid method begins with delta encoding, which computes the differences in temperature and humidity values between sequential readings. Where the data fluctuates little from one reading to the next, the resulting delta values are small and frequently repeated. The delta-encoded dataset is then passed to the Run-Length Encoding (RLE) algorithm, which is highly efficient at compressing sequences of such changes in temperature and humidity. RLE captures recurring delta values and records the count of occurrences for each run. This approach attains an exceptional compression rate, which translates into a significant reduction in the storage requirements for meteorological data within Internet of Things (IoT) applications. The speed and efficiency of the hybrid approach make it a compelling option for scenarios where constraints on storage capacity and cost are of paramount importance.
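The two stages can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' code: the function names are hypothetical, runs are kept as (count, delta) tuples rather than packed bytes, and each series (temperature or humidity) is assumed to be encoded separately.

```python
from itertools import accumulate, groupby


def delta_rle_encode(values):
    """Stage 1: delta-encode the series. Stage 2: run-length encode the deltas."""
    if not values:
        return []
    deltas = [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]
    # groupby collects consecutive equal deltas into one run.
    return [(len(list(group)), delta) for delta, group in groupby(deltas)]


def delta_rle_decode(runs):
    """Expand the runs back into deltas, then take a running sum."""
    deltas = [delta for count, delta in runs for _ in range(count)]
    return list(accumulate(deltas))


# Hypothetical slowly changing temperature series.
readings = [25, 25, 25, 25, 25, 26, 26, 26]
runs = delta_rle_encode(readings)  # [(1, 25), (4, 0), (1, 1), (2, 0)]
```

Delta encoding turns a stable series into long stretches of zeros, which is the kind of redundancy RLE compresses well. Here 8 readings reduce to 4 runs, and decoding restores the readings exactly, so the scheme is lossless.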

Compression Ratio
The compression ratio (CR) is a metric that quantifies the reduction in data size achieved by a particular compression algorithm. It is obtained by comparing the size of the compressed file with the size of the original file. CR can be expressed as follows:
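Taking the verbal definition above at face value, and writing $S_o$ for the size of the original file and $S_c$ for the size of the compressed file (symbols assumed here for illustration), the ratio can be sketched as:

```latex
\[
  CR = \frac{S_c}{S_o}
\]
```

Under this reading, a smaller CR indicates stronger compression.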

Compression Factor
The term "Compression Factor" is formally defined as the quotient obtained by dividing the size of the original file by the size of the compressed file, in contrast to the Compression Ratio. The Compression Factor (CF) can be expressed as follows:
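Following the definition in the text, and writing $S_o$ for the size of the original file and $S_c$ for the size of the compressed file (symbols assumed here for illustration), the factor takes the form:

```latex
\[
  CF = \frac{S_o}{S_c}
\]
```

A larger CF indicates stronger compression. For instance, a file reduced to half its original size has CF = 2.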

Space Saving
The term "Space Saving" denotes the reduction in storage space achieved through data compression. It quantifies the difference between the storage initially occupied by the data and the storage required after a compression algorithm has been applied. The Space Saving (SS) metric is often stated as the exact volume of space saved, typically measured in bytes or other pertinent units. Such quantification offers insight into the effectiveness and efficiency of compression methods.
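The three metrics can be computed together, as in the Python sketch below. It reads the compression ratio as compressed size over original size, the compression factor as its inverse, and space saving as the difference in bytes. The function name, the dictionary keys, and the byte sizes in the example are hypothetical.

```python
def compression_metrics(original_size: int, compressed_size: int) -> dict:
    """Return CR, CF, and SS for file sizes given in bytes."""
    if original_size <= 0 or compressed_size <= 0:
        raise ValueError("sizes must be positive")
    return {
        # CR: compressed size relative to original size (assumed convention).
        "compression_ratio": compressed_size / original_size,
        # CF: original size divided by compressed size.
        "compression_factor": original_size / compressed_size,
        # SS: absolute number of bytes saved.
        "space_saving_bytes": original_size - compressed_size,
    }


# Illustrative sizes only: an 8000-byte file compressed to 4000 bytes.
metrics = compression_metrics(8000, 4000)
```

For these illustrative sizes the sketch gives a compression ratio of 0.5, a compression factor of 2.0, and a space saving of 4000 bytes.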

Results and discussions
The results of the data compression experiments are presented in Table 1, which reports data size reduction, compression ratio, space saved, and compression time in microseconds for each algorithm. Delta Encoding achieved a compression ratio of 87.5%, for a space saving of 7500 bytes, and completed compression in 133 microseconds. Run-Length Encoding (RLE) surpassed Delta Encoding with a compression ratio of 89.10%. RLE reduced the data size by 10622 bytes and operated swiftly, with a compression time of 105 microseconds. These findings affirm the effectiveness of RLE as a compression method in this context. In contrast, Bit-Packing offered a compression ratio of 50%, for a space saving of 4000 bytes, and required a longer compression time of 360 microseconds. Similarly, Variable-Length Integer Encoding (VLI) achieved a 50% compression ratio and saved 4000 bytes, with a compression time of 363 microseconds. Both Bit-Packing and VLI delivered moderate compression ratios, making them suitable for applications where a trade-off between compression ratio and compression time is acceptable.
The most compelling results were obtained by combining Delta Encoding and Run-Length Encoding, which produced a compression ratio of 99.25%. This hybrid approach reduced the data size by 7040 bytes with a relatively fast compression time of 139 microseconds. The hybrid Delta-RLE Encoding approach therefore stands out as the most efficient method, offering the highest compression ratio and space saving, which makes it particularly well suited to applications where data size reduction and preservation of data integrity are paramount.

Conclusion
In addressing the challenges of efficiently managing and storing meteorological data from a DHT11 sensor, this study has made significant strides. The evaluation of various data compression techniques, including Delta Encoding, RLE, Bit-Packing, and VLI, has revealed the potential of a hybrid approach that combines Delta Encoding and RLE to achieve a compression rate of 99.25%. This solution effectively reduces the size of the data, yielding substantial cost savings during data transfer to Azure cloud storage. The findings underscore the practicality and economic benefits of adopting the hybrid compression approach. While the study focused on meteorological data, the implications extend beyond this domain. The hybrid compression method has the potential to address data storage and transfer challenges in a broader spectrum of applications, particularly in the Internet of Things (IoT) and cloud-based solutions. Furthermore, efficient data compression techniques such as the hybrid approach developed in this study align directly with efforts to reduce energy consumption in data handling and transmission. By significantly decreasing data size, the method reduces the energy needed for data transfer and storage. In a world increasingly concerned with sustainable practices and energy efficiency, these findings not only enhance data management but also contribute to the broader goal of reducing the carbon footprint associated with data-intensive applications.
In summary, this project has not only contributed to our understanding of data management in the context of IoT and cloud-based applications but has also emphasized the significance of selecting the most suitable compression methods tailored to specific applications. As we move forward in the era of burgeoning data generation, such research endeavors become increasingly vital in optimizing data handling processes, reducing energy consumption, and unlocking the full potential of data-driven technologies.

Fig. 2. A sample of collected data which includes a set of temperature and humidity pairs.

Fig. 3. Setup environment using two key components: the DHT11 sensor and the ESP32 microcontroller.

Table 1. Performance of Data Compression Algorithms