Real time estimation of emissions in a diesel vehicle with neural networks

Several studies in the literature have shown that real-world emissions strongly depend on driving conditions, driving style, ambient temperature and humidity, etc., so that they differ significantly from the values measured on test benches over standard driving cycles. This concern, together with the so-called Diesel-gate, has led to the introduction in Europe of an innovative procedure for the registration of vehicles based on real driving emissions (RDE) measured with a portable emission measurement system (PEMS). PEMS devices are bulky and very expensive, therefore they cannot be used extensively for an actual real-time monitoring of emissions. To solve this problem, the present work proposes a Neural Network model based on the interpolation of the time-histories of driving conditions (speed, altitude, ambient temperature, humidity and pressure) and emissions measured on a diesel start-and-stop vehicle while performing a series of RDE tests. Two different approaches are proposed. The first one calculates the emissions on the basis of the vehicle motion (speed and altitude profile, ambient conditions). The second one models the engine block using as inputs the ambient conditions, the load and the rpm of the engine as derived from the OBD-II scanner. The outputs of both models are the flow rates and cumulated values of CO2 and NOx. Note that the inputs of the two models are signals that can easily be obtained on board without additional sensors.


Introduction
Increasingly stringent emissions regulations are threatening the future of diesel. On the one hand, the diesel engine enables significant savings of fuel and, therefore, of greenhouse gas emissions on long trips. On the other hand, it is no longer viewed favourably by public opinion and by European legislators, who consider it to be particularly polluting.
European standards define acceptable exhaust emission limits for vehicles sold in the European Union [1] - [3]. They are defined by a series of directives that stage the gradual introduction of increasingly stringent limits. At present, the standards to be met are Euro 6 and US EPA 2010 for heavy vehicles (HDV) and Euro 6 and Tier 2 for light vehicles (LDV). With the new European RDE (Real Driving Emissions) provisions, a further tightening has been introduced which aims at a substantial block for diesel cars from 2020.
In order to obtain pollution data that are as realistic as possible, emissions are to be measured under real road driving conditions by means of special instruments installed on board called PEMS (Portable Emissions Measurement System) [4] - [10]. PEMS devices are bulky and very expensive, therefore they cannot be used extensively for an actual real-time monitoring of emissions. The goal of this investigation is to verify whether it is possible to predict, with reasonable accuracy, the overall emissions of NOx and CO2 during an RDE cycle with a numerical model that uses as inputs signals already acquired in a vehicle.
In the literature, it is possible to find numerous studies carried out in the laboratory with the ultimate aim of generating models to predict emissions. One of the greatest challenges lies in the difficulty of characterizing the system behaviour in both stationary and transient conditions [11], [12].
The strong dynamic component of the problem has led to the use of various types of neural networks, from the simplest Feed Forward to the more complex Recurrent ones [13] - [15]. The results of what appears to be the state of the art are around 80-90% accurate [16] - [18].
In this investigation, we use real CO2 and NOx emissions measured in RDE tests to develop Neural Network models. Two different approaches are proposed. The first one calculates the emissions from the vehicle motion (speed and altitude profile, ambient conditions) and is aimed at giving the driver an estimate of the emissions produced in an RDE test. The second one models the engine block using as inputs the ambient conditions, the load and the rpm of the engine derived from the OBD-II scanner, and is meant to be used in the framework of hybrid electric vehicles.
Even if the tools and the topics addressed in this study are not particularly innovative, their application to driving cycles complying with the RDE legislation is new. Moreover, this investigation is only the first step towards the development of advanced energy management strategies for hybrid electric vehicles aimed at reducing CO2 and NOx.

The vehicle and the RDE cycles
The experimental data used in this investigation were acquired on a Class 3b vehicle (maximum speed > 120 km/h, power-to-unladen-mass ratio > 34 W/kg) whose specifications cannot be reported because of a confidentiality agreement. The vehicle was type approved in 2013 and had a mileage of 70000 km when the tests began. It was equipped with an AVL PEMS 493 as shown in Fig. 1. This unit includes multiple gas analysers, a GPS receiver (to record vehicle speed, latitude, longitude and altitude), an exhaust flow meter and an interface for connection to the vehicle's On-Board Diagnostics (OBD). Ambient temperature, humidity and pressure are measured using appropriate sensors.
Several RDE cycles (each with a duration between 93 and 108 minutes) were performed in Lecce with this equipment. Data were sampled with a frequency of 10 Hz resulting in a total of more than 570000 samples (> 15h). For the details about the tests and their outcomes, please refer to [4].

Pre-processing of the raw data
A careful selection of data and the subsequent pre-processing stage are crucial for a proper training phase. In this work, we selected the following data vectors, retaining as predictors those with a low mutual Pearson correlation coefficient: air temperature (°C); air pressure (mbar); altitude (m); vehicle speed (km/h); calculated engine load (%); engine speed (rpm); exhaust gas flow (l/h); CO2 (g/s); NOx (g/s).
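As an illustration of this selection criterion, the following sketch computes the Pearson coefficient between two candidate signals and flags the pair as redundant when the absolute value exceeds a threshold. It assumes that the criterion is a low mutual correlation between candidate predictors; the threshold of 0.9, the function names and the sample values are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def is_redundant(x, y, threshold=0.9):
    """Assumed rule: two signals carry the same information if |r| > threshold."""
    return abs(pearson(x, y)) > threshold

# Hypothetical samples (not measured data)
speed_kmh = [0.0, 10.0, 20.0, 30.0, 40.0]
speed_ms = [v / 3.6 for v in speed_kmh]     # same information, different unit
air_temp = [21.0, 20.5, 21.2, 20.8, 21.1]

print(is_redundant(speed_kmh, speed_ms))    # strongly correlated pair
print(is_redundant(speed_kmh, air_temp))    # weakly correlated pair
```

In such a scheme only one signal of each redundant pair would be kept as predictor.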

Outlier and smoothing filters
Since the data distribution does not fit a Gaussian model, we used the moving median method as outlier detection strategy. Anomalous values are replaced by interpolating between the previous and the next non-outlier samples.
A smoothing filter was then applied to highlight significant patterns and to attenuate the noise generated by several phenomena (environmental, electrical, computer artifacts or high speed). The filter employs a moving average calculated over a specified time window. Moreover, as a side effect, this routine reduces the training time, as shown by experimental evidence.
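The outlier treatment can be sketched as follows. This is a minimal illustration, not the routine used in the study: the window length, the threshold of three scaled median absolute deviations (MAD) and the sample values are assumptions.

```python
from statistics import median

def moving_median_outliers(x, window=5, n_mads=3.0):
    """Flag samples farther than n_mads scaled MADs from the local median."""
    half = window // 2
    flags = []
    for i, value in enumerate(x):
        local = x[max(0, i - half): i + half + 1]
        med = median(local)
        # 1.4826 scales the MAD to the standard deviation of a normal distribution
        mad = 1.4826 * median([abs(v - med) for v in local])
        flags.append(abs(value - med) > n_mads * mad)
    return flags

def fill_outliers(x, flags):
    """Replace flagged samples by linear interpolation between the previous
    and the next non-flagged samples (end samples copy the nearest valid one)."""
    good = [i for i, flagged in enumerate(flags) if not flagged]
    out = list(x)
    if not good:
        return out
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        prev = max((g for g in good if g < i), default=None)
        nxt = min((g for g in good if g > i), default=None)
        if prev is None:
            out[i] = x[nxt]
        elif nxt is None:
            out[i] = x[prev]
        else:
            weight = (i - prev) / (nxt - prev)
            out[i] = x[prev] + weight * (x[nxt] - x[prev])
    return out

# Hypothetical NOx flow rate (g/s) with one spurious spike
nox = [0.010, 0.011, 0.012, 0.500, 0.014, 0.015, 0.016]
flags = moving_median_outliers(nox)
cleaned = fill_outliers(nox, flags)
print(flags)
print(cleaned)
```

Only the spike at the fourth sample is flagged and replaced by the midpoint of its two neighbours.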
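A minimal sketch of a moving average filter is given below. The centred window, its length and the way the window shrinks at the ends of the signal are assumptions made for the example.

```python
def moving_average(x, window=5):
    """Centred moving average; the window shrinks near the ends of the signal."""
    half = window // 2
    out = []
    for i in range(len(x)):
        local = x[max(0, i - half): i + half + 1]
        out.append(sum(local) / len(local))
    return out

# Hypothetical noisy signal
noisy = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0]
smooth = moving_average(noisy, window=3)
print(smooth)
```

A longer window removes more noise but also flattens the short emission peaks, so the window length is a trade-off to be tuned on the data.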

Traction force and traction power
Considering the goal of this investigation, we decided to generate traction force and traction power data vectors. In this way, we obtained a more general model linked to variables such as vehicle acceleration, mass and size specifications. The approach adopted, taken from Guzzella et al. (2007) [19], uses the elementary equation describing the longitudinal dynamics of a road vehicle:

$$ m_v \frac{d}{dt} v(t) = F_t(t) - \left( F_a(t) + F_r(t) + F_g(t) + F_d(t) \right) $$

where $m_v$ and $v(t)$ are the mass and speed of the vehicle; $F_t$ is the traction force; $F_a$ is the aerodynamic friction; $F_r$ is the rolling friction; $F_g$ is the grade force; $F_d$ is the disturbance force that summarizes all other not yet specified effects. $F_t$ represents the force generated by the main engine minus the force used to accelerate the rotating parts inside the vehicle and minus all the friction losses of the powertrain. Assuming the system works in ideal conditions, this force is completely transferred to the wheels as power:

$$ P_t = F_t \cdot v $$

Both generated signals are functions of mass, acceleration, frontal area $A_f$, aerodynamic drag coefficient $c_d$ and rolling friction coefficient $c_r$.
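The generation of the two signals can be sketched as follows, assuming the usual expressions for the resistance terms (aerodynamic friction proportional to the square of the speed, rolling friction proportional to the normal load, grade force given by the road slope) and neglecting the disturbance force. All vehicle parameters are hypothetical, since the actual specifications are confidential.

```python
import math

G = 9.81          # gravitational acceleration, m/s^2
RHO_AIR = 1.2     # assumed air density, kg/m^3

def traction(v, a, alpha, mass, area, c_d, c_r):
    """Traction force (N) and power (W) from the longitudinal force balance,
    neglecting the disturbance force.

    v: speed (m/s), a: acceleration (m/s^2), alpha: road slope (rad),
    mass: vehicle mass (kg), area: frontal area (m^2),
    c_d: aerodynamic drag coefficient, c_r: rolling friction coefficient.
    """
    f_aero = 0.5 * RHO_AIR * area * c_d * v ** 2
    f_roll = c_r * mass * G * math.cos(alpha)
    f_grade = mass * G * math.sin(alpha)
    f_t = mass * a + f_aero + f_roll + f_grade
    return f_t, f_t * v

# Hypothetical operating point and vehicle parameters
f_t, p_t = traction(v=20.0, a=0.5, alpha=0.0,
                    mass=1500.0, area=2.2, c_d=0.30, c_r=0.010)
print(f"traction force: {f_t:.1f} N, traction power: {p_t / 1000:.1f} kW")
```

In practice the acceleration would be obtained by differentiating the speed signal and the slope from the altitude profile, both of which amplify noise; this is one reason why the filtering stage precedes this step.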

Data Scaling
After filtering, the data were scaled to fit the range required by the input neurons of the neural network. This was achieved by linearly normalizing each signal between 0.1 and 0.9:

$$ x_{norm} = 0.1 + 0.8 \, \frac{x - x_{min}}{x_{max} - x_{min}} $$

where $x_{min}$ and $x_{max}$ are the minimum and maximum values of the signal. Table 1 shows all the data used for the training of the neural network.
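The normalization to the 0.1-0.9 range can be sketched as follows, assuming a linear min-max mapping. The function names and the sample values are hypothetical.

```python
def scale(x, x_min, x_max, lo=0.1, hi=0.9):
    """Map values linearly from [x_min, x_max] to [lo, hi]."""
    span = x_max - x_min
    return [lo + (hi - lo) * (v - x_min) / span for v in x]

def unscale(z, x_min, x_max, lo=0.1, hi=0.9):
    """Inverse mapping, used to bring network outputs back to physical units."""
    return [x_min + (v - lo) * (x_max - x_min) / (hi - lo) for v in z]

# Hypothetical vehicle speed samples (km/h)
speed = [0.0, 50.0, 100.0, 130.0]
scaled = scale(speed, min(speed), max(speed))
restored = unscale(scaled, min(speed), max(speed))
print(scaled)
```

The extremes should be computed on the training data and stored, so that the same mapping is applied to the evaluation and test data and inverted on the network output.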

Development of the neural networks
We developed two different models with the same target variables, as shown in Fig. 2. The "Vehicle" model was designed to act as an auxiliary cockpit during RDE tests, while the "Engine" model will be implemented in a hybrid powertrain model.

Multi-layer Perceptron
An Artificial Neural Network (ANN) is a dynamic system whose topology is an oriented graph (Fig. 3). Each node is called an artificial neuron, while the arcs are defined as synaptic weights. The term network comes from the layered topology connecting the neurons. It can be seen as a black box that associates an input to an output. From this point of view, a neural network becomes a mathematical function f(input; weights) whose behaviour varies according to the weights and, of course, the input, without specifying the shape of the function f. The present study aims at generating a light and responsive model for a real-time forecasting of emissions. For this reason, we chose a multilayer perceptron feedforward ANN, which is characterized by a simple structure. The chosen optimization algorithm, Adam, is an alternative to the classic SGD that maintains a different learning rate for each parameter, combining the advantages of RMSProp and AdaGrad. Moreover, it operates in a computationally efficient way [20].
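To make the two ingredients of this section concrete, the sketch below shows the function f(input; weights) of a perceptron with one sigmoid hidden layer and a linear output, together with a single Adam update. The network size, the weights and the gradients are hypothetical; the Adam constants are the default values commonly reported for the algorithm, not necessarily those used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """f(input; weights) for one sigmoid hidden layer and a linear output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are the running first and second moments,
    t is the iteration counter starting from 1."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g
        vi = beta2 * vi + (1.0 - beta2) * g * g
        m_hat = mi / (1.0 - beta1 ** t)      # bias correction of the moments
        v_hat = vi / (1.0 - beta2 ** t)
        # per-parameter step: the gradient is rescaled by its own history
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v

# Two scaled inputs, two hidden neurons with zero weights (hypothetical values)
y = mlp_forward([0.5, 0.2], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], 0.0)

# First Adam step on two parameters with gradients of different magnitude
params, m, v = adam_step([0.0, 0.0], [1.0, -2.0], [0.0, 0.0], [0.0, 0.0], t=1)
print(y, params)
```

In the first step both parameters move by approximately the learning rate although the gradients differ by a factor of two, which illustrates the per-parameter scaling mentioned above.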

Training options
The whole dataset was divided into three main parts: a training set, an evaluation set and a test set. However, the distribution of the tests among the training, evaluation and test sets is a critical issue that will be analysed in a further investigation. As for the structure, two configurations were considered, with a single and with a double hidden layer. This choice was driven by the desire to achieve the highest possible degree of generalization. In fact, an overly complex structure could "learn" too much from the training data and give poor results in the test phase. A dropout logic was also inserted to minimize possible overfitting problems. Another discriminating factor is the activation function chosen for each layer. We considered a sigmoid and a ReLU activation function. The error functions taken into consideration were:

$$ MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - h(x_i) \right| $$

$$ RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - h(x_i) \right)^2 } $$

$$ L_\infty = \max_{i} \left| y_i - h(x_i) \right| $$

where $y_i$ are the target values from the RDE tests and $h(x_i)$ are the values predicted by the ANN. In particular, the last error function is called infinity norm or Chebyshev distance. It appears to be more robust in controlling residual errors but, on the other hand, it can be strongly sensitive to noise on the signal [21].
To sum up, the hyperparameters taken into consideration are shown in Table 2. The goodness-of-fit of the trained ANN was evaluated by using three statistical estimators: RMSE, coefficient of determination ($R^2$) and cumulative relative error.
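The different behaviour of the three error functions named in this work (MAE, RMSE and infinity norm) can be illustrated with the sketch below, written from their standard definitions. The sample values are hypothetical.

```python
import math

def mae(y, h):
    """Mean absolute error between targets y and predictions h."""
    return sum(abs(a - b) for a, b in zip(y, h)) / len(y)

def rmse(y, h):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, h)) / len(y))

def infinity_norm(y, h):
    """Chebyshev distance: the largest absolute residual."""
    return max(abs(a - b) for a, b in zip(y, h))

# Hypothetical targets and predictions with a single large residual
target = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]
print(mae(target, predicted), rmse(target, predicted), infinity_norm(target, predicted))
```

A single large residual is averaged out in the MAE, weighs more in the RMSE and fully determines the infinity norm, which is consistent with the sensitivity to noise of the latter.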
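The coefficient of determination and the cumulative relative error can be sketched as follows. The definition of the cumulative error used here (relative difference between the cumulated predicted and measured emissions, with a constant sampling interval) is an assumption, and the sample values are hypothetical.

```python
def r_squared(y, h):
    """Coefficient of determination of predictions h against targets y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, h))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def cumulative_relative_error(y, h):
    """Assumed definition: |sum(h) - sum(y)| / sum(y). With a constant
    sampling interval, the time step cancels out of the ratio."""
    return abs(sum(h) - sum(y)) / sum(y)

# Hypothetical flow rates: local errors that compensate over the whole test
target = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 1.0, 4.0, 3.0]
print(r_squared(target, predicted))
print(cumulative_relative_error(target, predicted))
```

The example shows why both metrics are needed: the cumulated value is reproduced exactly while the local trend is poorly captured.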

Manual optimization
The first step was to identify the best error function among the available ones. A reduced number of combinations was used, with a particular focus on the loss functions. The details are given in Table 3. Fig. 4 shows the trend of the target function as the number of neurons and the learning rate change for one of the models examined in this study.

Bayesian optimization
In the previous paragraphs, experiments on the models were carried out using manually chosen values for the hyperparameters. These values are the result of numerous tests whose objective was to improve the result as much as possible, that is, to minimize the loss. What makes such an approach difficult is that each model requires its own optimal configuration, not necessarily the same as the others, even when using the same data. The time required to reach an optimal configuration could be considerable, and the search might not even lead to concrete results; specific optimizers make the operation easier and faster.
The Bayesian optimization approach uses mathematical models and does not require human intervention at any stage of the search. The tool used here combines the assisted approach with the algorithmic one, delegating to the programmer only the choice of the value range for the hyperparameters and automating the search through an algorithm called Bayesian Optimizer. Bayesian optimization maintains internally a Gaussian process model of the target function and uses the evaluations of the target function to train this model. A distinctive feature of Bayesian optimization is the use of an acquisition function, which the algorithm uses to determine the next point to evaluate. The acquisition function balances the exploration of areas that have not yet been well modelled against sampling at points where the modelled target function is low [22] [23]. Unlike random or grid search, therefore, the results of past evaluations are tracked to form a probabilistic model that maps the hyperparameters to the probability of a score on the target function [24]. In the literature, this model is called "surrogate" for the objective function and is represented as $P(y \mid x)$, the probability of the score $y$ given the hyperparameters $x$ [25]. The surrogate is much easier to optimize than the objective function. Bayesian optimization looks for the next set of hyperparameters to evaluate on the real target function by selecting those that perform best on the surrogate function. The training was carried out on a computer with the following technical specifications: CPU AMD Ryzen 3 2200G, integrated GPU AMD Vega 8, RAM 16 GB (2x8) 3200 MHz.
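The loop described above can be sketched in a few lines for a single hyperparameter. This is an illustrative implementation and not the optimizer used in the study: the Gaussian process has a fixed RBF kernel with unit variance, the acquisition function is the expected improvement, and `validation_loss` is a hypothetical stand-in for training a network and returning its loss as a function of the base-10 logarithm of the learning rate.

```python
import math

def rbf(a, b, length=0.5):
    """RBF kernel with unit variance and assumed length scale."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    beta = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = 1.0 - sum(k * b for k, b in zip(k_star, beta))
    return mu, math.sqrt(max(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best loss found so far (minimization)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def validation_loss(log_lr):
    """Hypothetical objective with its minimum at log_lr = -2.5."""
    return (log_lr + 2.5) ** 2 + 0.1

xs = [-4.0, -3.0, -1.0]                              # initial evaluations
ys = [validation_loss(x) for x in xs]
candidates = [-4.0 + 0.05 * i for i in range(61)]    # search range -4 .. -1
for _ in range(10):
    best = min(ys)
    scores = [expected_improvement(*gp_posterior(xs, ys, c), best)
              for c in candidates]
    x_next = candidates[scores.index(max(scores))]   # most promising point
    xs.append(x_next)
    ys.append(validation_loss(x_next))

best_x = xs[ys.index(min(ys))]
print(f"best log10(learning rate): {best_x:.2f}, loss: {min(ys):.3f}")
```

Each iteration costs one evaluation of the real objective plus many cheap evaluations of the surrogate, which is the reason why the method pays off when a single training run is expensive.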

Discussion of the results
The networks were trained through the Bayesian algorithmic logic, differentiating by model (Engine and Vehicle), target (CO2 and NOx) and number of layers (single and double). The results of the tests are summarized in Fig. 5 - Fig. 7. The optimal structure of the neural network varies significantly among the different cases, as can be seen from the plots. The ReLU activation function tends to give the worst results in almost all metrics, the only exceptions being the NOx and CO2 cumulative errors in the single-layer Engine model (fig. 3.19). The sigmoid function offers better performance in all other case studies. The single-layer configuration gave better results in predicting NOx, whereas the CO2 target prefers a double-layer structure. In general, the CO2 target is, as expected, easier to predict, showing a better $R^2$ and smaller cumulative errors (fig. 3.19). Finally, we can say that the Engine model is better predicted by neural networks in terms of $R^2$ (fig. 3.18), but smaller cumulative errors are obtained with the Vehicle model (fig. 3.19). The last step of this process was the selection of the best combination of hyperparameters with respect to the chosen metrics, giving priority to the cumulative error. Table 5 shows the combinations of hyperparameters of the selected neural networks with the respective results.
Table 5. Summary of the best models with their parameters.
Note that we managed to reduce the cumulative error to a value below 1% and to obtain an $R^2$ around 85%. This gives us both precision in the measurement of pollutants, which is the objective of this work, and a good description of the local signal trend. The latter is very important whenever a strategy to reduce emissions is to be designed. In fact, capturing the positive gradients of CO2 and NOx in real time allows a greater reactivity of the control algorithms.
In general, the number of neurons of the proposed Neural Networks is low, which indicates that simpler models are better able to predict the targets. The result is not surprising, as simpler models tend to generalize better and to give better results in the test phase. Finally, it should be noted that, in all combinations, the dropout factor remained around zero, which means that the early stopping parameters were set correctly and overfitting problems were rarely found during training. The instantaneous and cumulated emissions of NOx along a test are shown in Fig. 8, together with the speed profile, for the Vehicle model. Similar results are obtained for NOx and for CO2 with the Engine model. Note that, although the final cumulated value is practically identical to the measured one, the instantaneous values of NOx are not accurately predicted, above all in the rural and motorway sections of the cycles. This is probably due to an uneven distribution of the amount of data among the operating sections. The RDE test covers the same distance in each zone, but at different speeds. Higher speeds (rural, motorway) therefore result in less run time and, consequently, fewer samples. The local accuracy issues are therefore the result of a particularly pronounced underfitting.
To improve the results, the procedure is being applied separately to each section as a further investigation. A preliminary result of this study is also shown in Fig. 8, where the output of the Neural Network trained and tested only on the urban sections of the RDE tests is reported as a black line. Note the significant improvement that can be seen qualitatively in the capability to follow the peaks and the valleys of the NOx experimental curve and in the cumulated trace. Quantitatively, by training the model on the urban section only, the RMSE decreases from 0.0028 g/s to 0.0022 g/s, the $R^2$ increases from 0.56 to 0.67 and the cumulative error on the urban section decreases from 16% to 0.85%. This shows that training the neural network on consistent data improves its prediction capabilities.
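The imbalance in the number of samples can be quantified with a short calculation. The sampling frequency is the one reported for the tests, whereas the distance per section and the mean speeds are hypothetical values chosen only to illustrate the effect.

```python
SAMPLING_HZ = 10    # sampling frequency reported for the tests

def samples_per_section(distance_km, mean_speed_kmh):
    """Number of samples acquired while covering a distance at a mean speed."""
    duration_s = distance_km / mean_speed_kmh * 3600.0
    return round(duration_s * SAMPLING_HZ)

# Hypothetical sections of equal length with different mean speeds
urban = samples_per_section(25.0, 30.0)
rural = samples_per_section(25.0, 75.0)
motorway = samples_per_section(25.0, 110.0)
print(urban, rural, motorway)
```

With these assumed values the urban section contributes more samples than the rural and motorway sections combined, so a network trained on the whole cycle is dominated by urban operating conditions.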

Conclusions
This study describes the modelling with Neural Networks of real-world emissions of NOx and CO2 in a diesel vehicle, using experimental data obtained with a Portable Emissions Measurement System on a diesel start-and-stop vehicle of class 3b during RDE tests. The first phase of the investigation was the pre-treatment of the dataset in the Matlab environment.
The following data of interest were selected: temperature and atmospheric pressure, vehicle speed, altitude, engine rpm, engine load, NOx, CO2. In addition, the vectors of traction force and power to the wheels were generated to address the dynamic effects. An outlier filter and a smoothing filter then eliminated part of the noise present on the signals. The last phase of pre-processing was the normalization of all signals in a range between 0.1 and 0.9. Two different models (Engine and Vehicle) were created, which had different input data but the same targets for the prediction. The main objective of the investigation was to minimize the cumulative error on emissions; in addition, other metrics such as RMSE and $R^2$ were considered. The basic model chosen for the neural network was the Perceptron with a multi-layer structure. The related hyperparameters were treated in detail and their possible combinations were analysed. In particular, the performance of three different error functions was evaluated: MAE, RMSE and infinity norm. The MAE showed significantly better results in the test phase and was therefore selected for the training. To deal with the high number of combinations, an algorithm-based optimization logic was used. The Bayesian optimizer, implemented in the Matlab environment, creates an internal model called surrogate that, iteration by iteration, simulates the behaviour of the loss function. This allowed a more careful choice of hyperparameters, minimizing the number of combinations examined and significantly reducing the computational time. The results showed that simpler models, with a single layer and a low number of neurons, guarantee better performance. Moreover, the sigmoid activation function is able to describe the physical phenomena with greater accuracy.
In conclusion, the trained networks achieved cumulative errors of less than 1% and $R^2$ around 85%, both for the Vehicle and Engine models and for CO2 and NOx emissions. Note that the Vehicle model is designed to be a sort of auxiliary "cockpit" with the aim of providing the driver with an estimate of emissions on a certain route, or as an aid to the pilot during the RDE tests. The Engine model, instead, is intended as the basis for the future development of advanced energy management strategies for hybrid electric powertrains. In this way, it will be possible to evaluate the effects of hybridisation in terms of both CO2 and NOx.