Study on applicability of data collection frequency for heavy-duty vehicles based on remote monitoring

Abstract. The remote online monitoring system for heavy-duty vehicles consists of a vehicle terminal and a monitoring platform that follow a specified communication protocol, data format and data transmission frequency, and it is an important application of the Internet of Vehicles to heavy-duty vehicle pollution control. This paper studies the influence of different collection frequencies and compute cycles on the calculation of vehicle energy consumption and emissions. Emission and energy consumption data uploaded by 10 heavy-duty vehicles were monitored continuously for 1 month, and the calculation errors of mileage, fuel consumption per 100 kilometers and emission factors were computed for 14 collection frequencies and 7 compute cycles. The results show that decreasing the data collection frequency increases the error and decreases the correlation, while lengthening the compute cycle reduces the error. For calculation errors of 1%, 5% and 10%, the collection frequency shall be at least 0.5 Hz, 0.2 Hz and 0.1 Hz, and the compute cycle shall be greater than 1800 seconds, 3600 seconds and 7200 seconds, respectively. The study provides a theoretical foundation for the application and storage of remote monitoring data of heavy-duty vehicles, and offers a way to reduce the waste of data storage space caused by the large remote monitoring data set.


Introduction
The construction of remote monitoring for heavy-duty vehicles is constantly improving. China's environmental protection administrations actively encourage the installation of vehicle terminals and require China VI vehicles to have networking capability. The data collected by remote monitoring come from the vehicle CAN bus, vehicle sensors and the GPS module, forming a big data set that is multi-source, heterogeneous, dynamically growing and widely distributed. Data of high quality and high collection frequency truly reflect the emissions of heavy-duty vehicles in actual operation, and can be used to analyze compliance with vehicle emission standards and in-service conformity. At present, in the remote monitoring of heavy-duty vehicles the Data Sampling Interval (DSI) corresponds to 1 Hz and the Data Transmission Interval (DTI) to 0.1 Hz. However, high DSI and DTI produce high communication costs and raise several data issues.
*Corresponding author: ligang@vecc.org.cn
Data volume is huge. With the data format and data transmission frequency of vehicle terminals calculated according to Appendix Q [1] of GB 17691-2018 <Limits and Measurement Methods for Emissions from Diesel Fuelled Heavy-Duty Vehicles> (hereinafter referred to as Appendix Q), the theoretical size of the message transmitted by each heavy-duty vehicle is 431 bytes (0.42 KB/s). Taking the 62,512 networked heavy-duty vehicles in Beijing (as of the end of 2020) as an example, the data uploaded by Beijing's monitored heavy-duty vehicles consume up to 200 TB of storage resources in a year, and there are about 220,000 heavy-duty vehicles with network access in Beijing. China has a huge inventory of commercial vehicles: according to the statistics of the China Association of Automobile Manufacturers [2], the number of commercial vehicles meeting the China VI standard was 5.133 million in 2020. Therefore, with the full implementation of phase b of the China VI standard in 2021, online monitoring of heavy-duty vehicles will face data volumes at the PB or even EB level.
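The order of magnitude of these storage figures can be checked with simple arithmetic. The sketch below assumes one 431-byte message per vehicle for every second the vehicle is online; the daily online time (`hours_per_day`) is a hypothetical parameter that the text does not give, and the function name is illustrative only.

```python
MESSAGE_BYTES = 431        # theoretical message size per vehicle per second (from the text)
SECONDS_PER_HOUR = 3600
BYTES_PER_TB = 1024 ** 4   # binary terabyte, consistent with 431 B being about 0.42 KB


def yearly_storage_tb(vehicles, hours_per_day, days=365, message_bytes=MESSAGE_BYTES):
    """Estimate raw storage in TB for a fleet uploading one message per online second.

    `hours_per_day` is an assumed average online time per vehicle, not a measured value.
    """
    seconds_online = hours_per_day * SECONDS_PER_HOUR * days
    total_bytes = vehicles * message_bytes * seconds_online
    return total_bytes / BYTES_PER_TB


if __name__ == "__main__":
    # Compare a few assumed daily online times for the 62,512 networked vehicles.
    for hours in (4, 6, 8, 24):
        print(hours, "h/day:", round(yearly_storage_tb(62_512, hours), 1), "TB/year")
```

Under the assumption of about six online hours per vehicle per day, the estimate is roughly 190 TB per year, the same order of magnitude as the 200 TB quoted above.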
Data value density is low. Data collection for heavy-duty vehicles differs from that in other industries: a large share of the data is collected by vehicle sensors, which are subject to numerous interference factors in actual use. High DTI leads to a large number of invalid data collections, and because the data quality of heavy-duty vehicles is already worrying, the waste of data storage resources is further intensified. Zhang et al. [3] compiled statistics on the proportions of lost and invalid data for 132 heavy-duty vehicles with an upload frequency of 1 Hz, and found that the key monitoring parameters sent by modified in-service vehicles were invalid about 64% of the time. The main reasons for the large number of invalid data collections include defects in data collection methods or principles and errors in, or improper use of, sensors [4]. Therefore, a higher data collection frequency does not mean higher data value.
These two points are the problems that remote monitoring currently has to face. To achieve better online monitoring, fully exploit the big data value of heavy-duty vehicle monitoring, and balance data collection against the reasonable use of resources, this paper studies the applicability of DSI for the physical quantities commonly used in remote monitoring of heavy-duty vehicles, and provides monitoring suggestions from the perspective of data application and data storage.

Construction of optimal DSI model scheme
Liu et al. studied the accuracy of map matching for DSI between 5 and 60 seconds with a step size of 5 seconds and for different DCI [5]. Jiang Guiyan et al. analyzed the deviation in data collection for buses, and considered 50 seconds, 80 seconds or 100 seconds appropriate for DTI with a DSI of 1 Hz [6]. Liu Yuhuan et al. analyzed how matching accuracy varies with DSI in an urban road network and, taking 10 seconds as the DSI for the actual road network in Beijing, compared map matching between 10 and 120 seconds [7].
Previous studies generally used a fixed step size and statistical analysis to determine the optimal DSI for data collection from floating vehicles. However, unlike map matching, the monitoring of heavy-duty vehicles has no strict real-time requirement; instead, monitoring emissions and energy consumption over long time spans requires a longer Data Calculation Interval (DCI). Therefore, this paper adopts the following methods to simulate the transmission of vehicle data at different DSI and DCI:
(1) Data collection. The actual road driving data of the test vehicles running continuously for one month were collected as the basic data set with a DSI of 1 Hz. The data collection items are shown in Annex Table 1.
(3) Simulation of different DCI.
1) The eigenvalues of the various parameters were aggregated and calculated by vehicle identification code and natural day. This simulation corresponds to a remote monitoring scenario in which the calculation period is a natural day, DCI is the random variable and DSI is the control variable.
2) All the vehicle data were divided into equal time intervals, and the errors of the eigenvalues of the various parameters were calculated at different DSI for DCI of 600, 1800, 3600, 7200, 10800, 14400 and 18000 seconds. This simulation corresponds to remote monitoring under a fixed period, in which DCI and DSI are both control variables.
(4) Error of data eigenvalue. The Mean Absolute Relative Error (MRE) was calculated between the eigenvalues obtained at each DSI and those obtained with a DSI of 1 Hz, and 10% was selected as the observation threshold:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \hat{T}_i - T_i \right|}{T_i} \times 100\%$$
In the formula, $\mu$ is the Mean Absolute Relative Error of the eigenvalue; $T_i$ is the value calculated with a DSI of 1 Hz; $\hat{T}_i$ is the value calculated at a different DSI; $n$ is the data size used for calculation.
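The error measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and skipping reference values equal to zero (where the relative error is undefined) is an assumption made here.

```python
def mean_relative_error(reference, estimates):
    """Mean absolute relative error in percent.

    reference: eigenvalues computed from the 1 Hz data.
    estimates: the same eigenvalues computed at a coarser DSI.
    Pairs whose reference value is zero are skipped (assumption of this sketch).
    """
    if len(reference) != len(estimates):
        raise ValueError("reference and estimates must have the same length")
    pairs = [(r, e) for r, e in zip(reference, estimates) if r != 0]
    if not pairs:
        raise ValueError("no non-zero reference values to compare against")
    return 100.0 * sum(abs(e - r) / abs(r) for r, e in pairs) / len(pairs)


if __name__ == "__main__":
    # Two partitions, each off by 10 % relative to the 1 Hz value.
    error = mean_relative_error([100.0, 200.0], [90.0, 220.0])
    print(f"{error:.1f} %", "above threshold" if error > 10.0 else "within threshold")
```

Comparing the result with the 10% observation threshold then indicates whether a given DSI is acceptable for the eigenvalue in question.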

Test samples
Ten China VI heavy-duty vehicles were selected as the test objects. The vehicles were in good condition, no failures occurred during the one-month data collection, and the data were transmitted steadily. See Annex Table 2 for details of the test samples.

Preparation of data set
The data were partitioned by collection time and vehicle identification number, and sorted in ascending order of time within each partition. The eigenvalues of the data under each partition were calculated to obtain a new data set, in which $D_j$ denotes partition $j$ and each entry is the eigenvalue of parameter $m$ under that partition.
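The partitioning step can be illustrated as follows. This is a sketch under stated assumptions: records are taken to be `(vin, timestamp, value)` tuples, a partition is one vehicle on one natural day, and the eigenvalues shown (mean, maximum, nearest-rank 95th percentile) are examples; all names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def partition_eigenvalues(records):
    """Group (vin, timestamp, value) records by vehicle and natural day.

    Returns {(vin, date): {"start", "mean", "max", "p95"}} for each partition.
    """
    groups = defaultdict(list)
    for vin, timestamp, value in records:
        groups[(vin, timestamp.date())].append((timestamp, value))

    result = {}
    for key, rows in groups.items():
        rows.sort(key=lambda row: row[0])  # ascending time order within the partition
        values = [value for _, value in rows]
        result[key] = {
            "start": rows[0][0],           # earliest timestamp in the partition
            "mean": mean(values),
            "max": max(values),
            "p95": percentile(values, 95),
        }
    return result


if __name__ == "__main__":
    sample = [
        ("VIN_A", datetime(2021, 5, 1, 8, 0, 1), 20.0),
        ("VIN_A", datetime(2021, 5, 1, 8, 0, 0), 10.0),
        ("VIN_A", datetime(2021, 5, 2, 8, 0, 0), 5.0),
    ]
    for key, eigen in partition_eigenvalues(sample).items():
        print(key, eigen)
```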

Calculation of eigenvalue
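A minimal sketch of how the emission and fuel eigenvalues used in this paper could be computed from per-period totals. It assumes that the mileage-based factor (g/km) is total NOx mass divided by total distance, that the power-based factor (g/kWh) is total NOx mass divided by total engine work, and that fuel consumption (L/100 km) is total fuel divided by total distance times 100. These ratios are inferred from the units stated in the paper; the function and key names are hypothetical.

```python
def emission_and_fuel_factors(nox_g, distance_km, work_kwh, fuel_l):
    """Aggregate per-period totals into ER (g/km), EW (g/kWh) and ED (L/100 km).

    Each argument is a list with one value per period i.
    """
    total_nox = sum(nox_g)
    total_km = sum(distance_km)
    total_kwh = sum(work_kwh)
    if total_km <= 0 or total_kwh <= 0:
        raise ValueError("total distance and total work must be positive")
    return {
        "ER_g_per_km": total_nox / total_km,
        "EW_g_per_kwh": total_nox / total_kwh,
        "ED_l_per_100km": 100.0 * sum(fuel_l) / total_km,
    }


if __name__ == "__main__":
    factors = emission_and_fuel_factors(
        nox_g=[1.0, 3.0],
        distance_km=[2.0, 6.0],
        work_kwh=[4.0, 12.0],
        fuel_l=[0.5, 1.5],
    )
    print(factors)
```

Dividing totals, rather than averaging per-period ratios, keeps periods with near-zero distance (idling) from dominating the result; this is a design choice of the sketch.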
Figure 1a shows the correlation of the vehicle speed eigenvalues at different DSI; the correlation decreases as the collection frequency is reduced. The maximum speed is most affected by DSI, while the average speed is least affected. Figure 1c shows the MRE of the mileage calculated by speed integration relative to the value at a DSI of 1 Hz. When the DSI reached 60 seconds, the MRE was already greater than 10%. In the MRE probability distribution at different DSI (Figure 1b), the error distribution gradually widened as the DSI increased, with the maximum error exceeding 200%; the overall error at 1 second and 5 seconds could be kept below 0.2% and 2%, and at a DSI of 10 seconds it was below 10%.
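The mileage comparison can be sketched as follows. The example assumes that a coarser DSI is simulated by keeping every `step`-th sample of the 1 Hz speed trace and that mileage is integrated with a simple rectangle rule; both are assumptions of this illustration, and the function names are hypothetical.

```python
def mileage_km(speed_kmh, interval_s):
    """Integrate speed samples (km/h) taken every `interval_s` seconds into km."""
    return sum(speed_kmh) * interval_s / 3600.0


def mileage_error_percent(speed_1hz, step):
    """Relative mileage error (%) when only every `step`-th 1 Hz sample is kept."""
    reference = mileage_km(speed_1hz, 1)
    if reference == 0:
        raise ValueError("reference mileage is zero; relative error is undefined")
    estimate = mileage_km(speed_1hz[::step], step)
    return 100.0 * abs(estimate - reference) / reference


if __name__ == "__main__":
    steady = [36.0] * 3600  # one hour at constant speed
    stop_go = [72.0 if i % 10 == 0 else 0.0 for i in range(3600)]  # short bursts
    for name, trace in (("steady", steady), ("stop_go", stop_go)):
        print(name, [round(mileage_error_percent(trace, s), 1) for s in (1, 5, 10)])
```

A steady trace is insensitive to downsampling, whereas a trace with short speed bursts can be badly over- or under-estimated when the retained samples happen to coincide with, or miss, the bursts.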
Acceleration, as a derived feature of the vehicle, reflects the intensity of vehicle speed change. Many scholars also regard acceleration as an important parameter for judging driving behavior, and vehicle emission and energy consumption characteristics differ significantly between driving behaviors [8][9]. As shown in Figure 2a, the correlations of the maximum and minimum acceleration values both decreased gradually as the collection frequency was reduced, following the same trend. Unlike vehicle speed, acceleration is calculated inaccurately mainly because of the change in collection frequency. For the maximum acceleration, when the collection interval changed from 1 second to 2 seconds the correlation changed by 26% and fell below 0.8. A higher collection frequency therefore benefits the calculation of acceleration, while a lower collection frequency leaves the calculated acceleration essentially uncorrelated with the 1 Hz value. As shown in Figure 2b, the mean error of acceleration was about 19.2% beyond a collection interval of 2 seconds, far above 10%. The error distribution of acceleration at different frequencies (Figure 2c) shows that, as the collection frequency decreased, the average error gradually increased and the center of the error distribution shifted from low to high error.
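Why acceleration is so sensitive to the collection interval can be seen from a small sketch. It assumes acceleration is derived as the finite difference of consecutive speed samples and that a coarser DSI is simulated by keeping every `step`-th 1 Hz sample; the function names are hypothetical.

```python
def accelerations(speed_kmh, interval_s):
    """Finite-difference acceleration (m/s^2) from speed samples `interval_s` apart."""
    speed_ms = [v / 3.6 for v in speed_kmh]
    return [(b - a) / interval_s for a, b in zip(speed_ms, speed_ms[1:])]


def max_acceleration_at(speed_1hz, step):
    """Maximum acceleration seen when only every `step`-th 1 Hz sample is kept."""
    return max(accelerations(speed_1hz[::step], step))


if __name__ == "__main__":
    # The speed rises by 7.2 km/h (2 m/s) within a single second, then holds.
    trace = [0.0, 0.0, 7.2, 7.2, 7.2]
    print("1 s interval:", max_acceleration_at(trace, 1))
    print("2 s interval:", max_acceleration_at(trace, 2))
```

The same speed change spread over a 2-second interval yields about half the peak acceleration, which is consistent with the sharp loss of correlation reported for the maximum acceleration.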

Eigenvalue analysis of nitrogen oxide output value
The output value of nitrogen oxide, the most important parameter for remote monitoring of heavy-duty vehicles, is usually used to calculate the total emission and the emission intensity per unit time. The correlation analysis between the instantaneous nitrogen oxide concentration (hereinafter referred to as "EF") and the collection frequency shows that the correlations of the mean value and the 95th percentile of EF do not change significantly with the collection frequency. Over a long time dimension there is therefore no significant difference in the average EF level of the vehicle or in the overall distribution of EF. The correlation of the maximum EF value, however, decreases markedly. As shown in Figure 3, the correlation changed by more than 25% when the collection interval went from 1 second to 60 seconds, and once the collection interval exceeded 5 seconds the maximum emission measured by remote monitoring differed significantly from the maximum nitrogen oxide concentration collected at 1 second and 2 seconds.
As shown in Figure 4, this paper carried out a statistical analysis of the maximum EF value of the vehicles. Since the test samples were not the same type of vehicle, box plots of the observed maximum NOx emission were calculated for the different collection frequencies and partitions; the observed maximum decreased as the collection frequency was reduced. Both GB 17691-2018 and DB 11/1475-2017 use the pass rate of valid points and the maximum NOx value per unit time as the basis for penalties. Therefore, if a low collection frequency is adopted for remote monitoring of heavy-duty vehicles, a large number of exceedance events will go unmonitored.
After the China VI standard, the accounting of total NOx emission from heavy-duty vehicles has gradually shifted from emission intensity accounting based on annual surveys and type examination to whole-life-cycle monitoring, which makes accurate accounting of total NOx emissions possible. However, the working principle of NOx sensors [14] leads to a large number of invalid values in NOx emission data, which makes accurate accounting difficult. The invalid data uploaded by sensors were therefore excluded when calculating the total NOx emission in this paper. As shown in Figure 5a, the total NOx emission accounting error of each partition is represented by its mean value, and the error gradually increased as the collection interval lengthened. When the collection interval was greater than 60 seconds, the error already exceeded 10%. Meanwhile, as shown in Figure 5b, when the collection interval was greater than 10 seconds, the error distribution presented a tendency of ...
In addition to total emission accounting, vehicle emission intensity is also a focus of this paper. The emission levels of heavy-duty vehicles at different frequencies were observed from a macroscopic point of view (without considering driving conditions). As an important measure of vehicle emission performance, the emission factor is generally expressed in g/km or g/kWh, denoting the pollutant emission intensity per unit mileage (hereinafter referred to as "ER") and per unit engine work (hereinafter referred to as "EW"), respectively. GB 17691-2018 uses EW to judge whether the vehicle emission level meets the standard; compared with instantaneous concentration, ER is more indicative of vehicle emission levels. As shown in Figure 6, the correlation of both forms of emission factor decreased as the collection frequency was reduced. As the collection interval lengthened, the correlation of the mileage-based emission factor (ER) changed noticeably more than that of the specific power factor (EW), and its decline accelerated once the collection interval exceeded 10 seconds, whereas EW was less affected, with correlation coefficients all greater than 0.9. The calculation of vehicle emission intensity by power is therefore more robust to the collection frequency than the calculation by mileage.
As shown in Figure 7c, the mean relative error of ER and EW gradually increased as the collection frequency decreased. When the DSI was greater than 60 seconds, the calculation error of ER exceeded 10%. At a collection interval of 120 seconds the MRE could no longer be calculated, as the error approached infinity. The EW error exceeded 10% beyond a DSI of 120 seconds.

Eigenvalue analysis of vehicle fuel consumption
As an important parameter for evaluating the actual fuel consumption of vehicles, the fuel flow is of great significance for evaluating the vehicle energy consumption index and estimating vehicle carbon emission. Changing the data collection frequency has little influence on the mean of the fuel flow, its 95th percentile, or the calculated fuel consumption per 100 km. Judging from the rate of correlation variation, the change of DSI had little influence on the calculation of fuel consumption, on monitoring changes in average fuel consumption, or on the distribution of vehicle fuel consumption over the long-term time dimension. As shown in Figure 8b, the error of fuel consumption per 100 km exceeded 10% when the DSI was greater than 600 seconds. Figure 8b also shows that the error distribution differed clearly across collection intervals: the overall deviation of the samples at 2 seconds and 5 seconds was less than 1%, while the error of fuel consumption per 100 km started to show a bell shape beyond a DSI of 60 seconds. Therefore, a low collection frequency will affect the accuracy of fuel consumption per 100 km, while a collection interval from 2 seconds to 60 seconds has little influence on it.
As shown in Figure 9, when the DSI was 1, 2 and 5 seconds, the calculation errors of each eigenvalue in the different DCI were all less than 10%. However, under short-time observation the calculation errors of each eigenvalue were much higher than when a natural day was taken as the DCI. Under the same DSI, the longer the DCI, the smaller the error. Meanwhile, under short-time observation the error of EW was much larger than that of ER and of fuel consumption per 100 km. To obtain EW with an error less than 10%, the DSI should not exceed 60 seconds and the DCI should not be less than 18,000 seconds.
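The joint effect of DSI and DCI can be explored with a small grid computation. The sketch below assumes a 1 Hz series of a per-second quantity (for example NOx mass or fuel), fixed non-overlapping windows of `dci_s` seconds, and downsampling by keeping every `dsi_s`-th sample with each kept sample standing for `dsi_s` seconds. Names are hypothetical and windows with a zero reference total are skipped.

```python
def window_errors(series_1hz, dsi_s, dci_s):
    """Relative error (%) of the windowed total for each complete DCI window."""
    errors = []
    for start in range(0, len(series_1hz) - dci_s + 1, dci_s):
        window = series_1hz[start:start + dci_s]
        reference = sum(window)                 # 1 Hz total, 1 s per sample
        estimate = sum(window[::dsi_s]) * dsi_s  # each kept sample covers dsi_s seconds
        if reference > 0:
            errors.append(100.0 * abs(estimate - reference) / reference)
    return errors


def mean_error_grid(series_1hz, dsi_values, dci_values):
    """Mean windowed error for every (DSI, DCI) pair that has a valid window."""
    grid = {}
    for dsi_s in dsi_values:
        for dci_s in dci_values:
            errors = window_errors(series_1hz, dsi_s, dci_s)
            if errors:
                grid[(dsi_s, dci_s)] = sum(errors) / len(errors)
    return grid


if __name__ == "__main__":
    # Synthetic trace: a short emission burst every 45 seconds over two hours.
    trace = [5.0 if i % 45 < 3 else 0.5 for i in range(7200)]
    grid = mean_error_grid(trace, dsi_values=(1, 10, 60), dci_values=(600, 1800, 3600))
    for (dsi_s, dci_s), error in sorted(grid.items()):
        print(f"DSI {dsi_s:>3} s, DCI {dci_s:>5} s: {error:.2f} %")
```

The synthetic trace is for illustration only; with real monitoring data the same grid would show which DSI and DCI combinations keep each eigenvalue within the chosen error threshold.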

Conclusion
This paper studied the applicability of the data collection frequency for remote monitoring of heavy-duty vehicles. The error and correlation of EW, ER and fuel consumption per 100 km were calculated for different DSI and DCI, leading to the following conclusions:
1) The monitoring frequency of cold and hot data should be treated differently in the current remote monitoring methods for heavy-duty vehicles.
2) The energy consumption and emissions of heavy-duty vehicles can also be represented by sampling analysis.
3) The errors at different DSI and DCI are helpful for error correction in the remote monitoring of emissions and energy consumption accounting.
In these formulas, the NOx emission load of the vehicle in period $i$ is given in g; $S_{v,i}$ is the sum of speed integrals in period $i$, in km; $ER_{nox}$ is the vehicle emission factor, in g/km; $W_i$ is the sum of vehicle power in period $i$, in kWh; $EW_{nox}$ is the specific power factor of the vehicle, in g/kWh; $EF_{diesel,i}$ is the diesel consumption in period $i$, in L; $ED_{diesel}$ is the vehicle fuel consumption, in L/100 km.

Correlation and error analysis of data eigenvalue in different DSI

Analysis of speed eigenvalue
Speed, as the most important basic feature of the vehicle, is widely used in current remote monitoring applications for heavy-duty vehicles to calculate the driving distance of the vehicle per unit time and to analyze the characteristics of the vehicle working condition.

E3S Web of Conferences 360, 01005 (2022), https://doi.org/10.1051/e3sconf/202236001005, VESEP2022
Fig. 5a. The mean MRE of total NOx emission calculation in different DSI.

Fig. 5b. The MRE distribution of total NOx emission calculation in different DSI.
or sampling calculation to reduce the usage of storage resources or calculation resources.