Remote NOx emission prediction model based on LSTM neural network

Test results from many researchers show that NOx emissions from many on-road heavy-duty diesel vehicles are higher than the registered values. The CN_VI emission regulations therefore explicitly require heavy-duty diesel vehicles to be supervised through a T-BOX, which transmits CAN messages from the vehicle OBD interface to a remote monitoring platform. Based on the formation mechanism of NOx emissions and the variety of the OBD data flow, the inputs of the LSTM (Long Short-Term Memory) neural network model, namely engine speed, torque, atmospheric pressure, coolant temperature, fuel consumption rate and intake air mass flow, are selected using the partial least squares (PLS) method. A total of 19877 groups of engine test data were used for model training and verification. The root mean square errors of the training and test sets are R_TR = 29.7 × 10^-6 and R_TE = 19.9 × 10^-6, a prediction accuracy that fully meets the requirements of the SCR system DeNOx performance diagnosis module in the OBD remote monitoring system.


Introduction
With the rapid development of the national economy, the main contributors to urban pollution have begun to shift from fixed sources to mobile sources. Mobile sources are the primary emission sources in Beijing, Shanghai, Hangzhou, Jinan, Guangzhou and Shenzhen, accounting for 45.0%, 29.2%, 28.0%, 32.6%, 21.7% and 52.1% respectively [1]. By the end of 2018, there were 18.18 million diesel trucks in China, accounting for 7.9% of vehicle ownership. These are mainly commercial vehicles with high intensity of use, harsh operating environments and relatively high pollutant emissions: their emissions of nitrogen oxides (NOx) and particulate matter (PM) accounted for 60.0% and 84.6% of the national vehicle emissions respectively.
Since the implementation of the CN_IV and CN_V emission regulations, a number of research and testing institutions have found that the real-road emissions of diesel vehicles, …… …… ……

Fig. 1. OBD remote monitoring system.

The OBD remote monitoring system is mainly composed of the OBD remote monitoring terminal (also known as the vehicle terminal or T-BOX) and the remote monitoring platform. As shown in Figure 1, the OBD remote monitoring terminal is usually installed on the vehicle OBD diagnostic interface, where it collects the main operating parameters of the vehicle through the CAN bus (e.g. engine speed, intake mass flow, fuel consumption rate, engine net output torque, SCR inlet temperature/SCR outlet temperature, urea injection volume, urea level sensor signal, downstream NOx sensor measurement etc.). Through the wireless communication module of the terminal, the data are sent over the 2G/4G network to the remote monitoring platform in the format specified by the platform. The platform accepts the data uploaded by the terminals installed on different vehicles and carries out a series of data pre-processing, storage and calculation operations to achieve continuous, real-time, online monitoring of vehicle operating conditions, I/M status information and pollutant emissions.
After processing a large amount of long-term running data from in-use vehicles, many common cheating behaviors are identified by the temperature sensor signal diagnosis module, the NOx sensor signal diagnosis module, the urea injection system working status diagnosis module etc. Based on feedback about the real state of the actual vehicles, a self-learning neural network model that determines the diagnosis result is trained, which fixes the thresholds of the core criteria and the weight factors of the sub-model diagnosis results. The learned model can accurately identify abnormal usage such as complete or partial pull-out of the temperature sensor, partial dilution of the NOx sensor, pull-out of the NOx sensor, failure of the NOx sensor, the use of a temperature/NOx sensor signal generator, reflux of the injected urea, the use of poor-quality urea and poor performance of SCR catalysts. This paper focuses on the NOx emission prediction method used for diagnosing the DeNOx performance of the vehicle SCR system, to ensure that the prediction accuracy of the model meets the needs of remote diagnostics.

Method of SCR system DeNOx performance diagnosis
After the platform accepts the OBD data uploaded by the vehicle, the filtered data are used as the input to the DeNOx performance diagnostics module. Because diesel vehicles are configured with only a single downstream NOx sensor, the DeNOx performance of the SCR system has to be diagnosed through the following steps:
1) Predict the upstream NOx emission level through the NOx emission prediction model of the platform, as shown in Figure 2.
2) Estimate the maximum conversion capacity of the catalyst from the inlet and outlet temperatures of the SCR system, estimate the theoretical maximum reaction capacity of the SCR system under the actual injection condition from the real-time reactant injection volume, and obtain the DeNOx quantity, as shown in Figure 3.
3) Predict the downstream NOx emission mass from the cumulative mass of the upstream NOx primary emission and the theoretical DeNOx mass.
Finally, compare the predicted downstream NOx emission mass with the actual NOx emission mass calculated from the NOx sensor (as shown in Figure 4) to evaluate the real-time conversion capability (performance) of the SCR system.
The evaluation method of the real-time conversion capacity of the SCR system is as follows. Since the conversion efficiency of a normal vanadium-based SCR catalyst can generally reach more than 60% above the light-off temperature, Condition 1 gives the basic judgment of the catalyst conversion efficiency. Condition 2 diagnoses the DeNOx efficiency of the SCR system by checking the logical correspondence among the accumulated mass of the upstream NOx emission, the accumulated mass of urea injection and the accumulated mass of the downstream NOx emission. When Condition 2 is fulfilled, the conversion performance of the SCR system is considered normal.
In formula (1): A is the accumulated mass of the engine-out NOx emission during a certain time; B is the theoretical accumulated DeNOx mass corresponding to the urea consumed during that time; C is the accumulated mass of tailpipe NOx emissions during that time.
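The comparison of A, B and C can be illustrated with a minimal sketch. The exact inequalities of Condition 1 and Condition 2 are not reproduced in the text, so the forms below are assumptions: Condition 1 is taken as a conversion efficiency (A - C)/A of at least 60%, and Condition 2 as the measured tailpipe mass C not exceeding the predicted mass A - B by more than a tolerance. The function name `diagnose_scr` and the parameter `rel_tolerance` are hypothetical.

```python
def diagnose_scr(a_upstream, b_denox, c_tailpipe,
                 min_efficiency=0.60, rel_tolerance=0.20):
    """Check assumed forms of Condition 1 and Condition 2.

    a_upstream : accumulated engine-out NOx mass (A)
    b_denox    : theoretical accumulated DeNOx mass from urea use (B)
    c_tailpipe : accumulated tailpipe NOx mass from the sensor (C)
    All three masses must share one unit and one time window.
    rel_tolerance is a hypothetical margin, expressed as a fraction of A.
    """
    if a_upstream <= 0:
        raise ValueError("accumulated upstream NOx mass must be positive")

    # Assumed Condition 1: overall conversion efficiency of at least 60 %.
    efficiency = (a_upstream - c_tailpipe) / a_upstream
    condition_1 = efficiency >= min_efficiency

    # Step 3 of the method: predicted downstream mass is upstream minus DeNOx.
    predicted_tailpipe = max(a_upstream - b_denox, 0.0)

    # Assumed Condition 2: measured mass agrees with the prediction
    # within the tolerance band.
    condition_2 = c_tailpipe <= predicted_tailpipe + rel_tolerance * a_upstream

    return {
        "efficiency": efficiency,
        "predicted_tailpipe": predicted_tailpipe,
        "condition_1": condition_1,
        "condition_2": condition_2,
    }
```

With A = 100, B = 70 and C = 32 (arbitrary illustrative masses), both assumed conditions hold; with C = 60 both fail, which would flag the SCR system for closer inspection.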
When the downstream NOx sensor works normally and its signal is reliable, the accuracy of this diagnosis method mainly depends on the accuracy of the upstream NOx emission prediction model. This paper uses the LSTM neural network method to predict the engine-out NOx emission of the target vehicles, because the OBD remote monitoring platform can easily obtain a large quantity of real-time data that can be quickly classified into different models and brands.

Influence factors of diesel NOx generation and preliminary confirmation of input parameters
Three conditions are required for the formation of NOx in diesel exhaust: excess oxygen in the combustion chamber, a high combustion temperature, and a sufficient duration of the high temperature [17].
As the fuel injection advance angle is retarded, the amount of fuel involved in premixed combustion decreases, and both the initial heat release and the peak heat release are reduced. Thus, the maximum combustion temperature is reduced. Meanwhile, the duration of high temperature in the combustion chamber is shortened, and the NOx emission is significantly reduced.
The diesel engine speed affects both the temperature level and the duration of the combustion process. When the injection angle is fixed, the higher the speed, the shorter the actual injection time. At the same time, the charge coefficient decreases with increasing speed, resulting in a reduction in NOx emission concentration.
The load determines the maximum combustion temperature and the duration of the high temperature. Torque, cycle fuel injection quantity (mg/stk), excess air coefficient etc. represent the load factor. The torque can be obtained as the product of the rated torque in the vehicle static information and the percentage of net output torque in the OBD data flow. The cycle fuel injection quantity can be calculated from the number of engine cylinders in the vehicle static information together with the speed and the fuel consumption rate in the OBD data flow. The excess air coefficient can be calculated from the intake quantity and the fuel consumption rate in the OBD data flow.
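The three load quantities described above can be sketched as follows. The sketch assumes a four-stroke engine (one injection per cylinder every two revolutions), mass flows in kg/h, and a stoichiometric air-fuel ratio of 14.5 for diesel; none of these values is stated in the text, and the function name `load_parameters` is hypothetical.

```python
def load_parameters(rated_torque_nm, net_torque_pct, fuel_rate_kg_h,
                    air_flow_kg_h, speed_rpm, n_cylinders,
                    stoich_afr=14.5):
    """Derive torque, cycle fuel quantity and excess air coefficient.

    Assumptions (not from the source): four-stroke engine, fuel and air
    mass flows in kg/h, stoichiometric air-fuel ratio of 14.5.
    """
    if speed_rpm <= 0 or fuel_rate_kg_h <= 0 or n_cylinders <= 0:
        raise ValueError("speed, fuel rate and cylinder count must be positive")

    # Torque = rated torque x percentage of net output torque.
    torque_nm = rated_torque_nm * net_torque_pct / 100.0

    # Four-stroke: each cylinder injects once every two revolutions.
    injections_per_min = speed_rpm / 2.0 * n_cylinders
    fuel_mg_per_min = fuel_rate_kg_h * 1.0e6 / 60.0
    fuel_mg_per_stroke = fuel_mg_per_min / injections_per_min

    # Excess air coefficient = actual air mass / stoichiometric air mass.
    excess_air = air_flow_kg_h / (fuel_rate_kg_h * stoich_afr)

    return torque_nm, fuel_mg_per_stroke, excess_air
```

For example, a six-cylinder engine at 1200 r/min burning 12 kg/h of fuel injects about 55.6 mg per stroke under these assumptions.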
For diesel engines with EGR, the EGR rate is also an important factor affecting the engine-out NOx emission. However, most heavy-duty vehicles certified to the CN_IV and CN_V emission regulations adopt the SCR-only DeNOx technology route and the proportion of vehicles using EGR is small, so the EGR rate is not considered in this paper.
Intake resistance, exhaust resistance, temperature after intercooling etc. also have a great impact on emissions, but these parameters mainly depend on the load factors, which are already reflected in the fuel consumption rate, so they are not considered.
Environmental factors of the engine and aftertreatment system, such as atmospheric pressure, ambient temperature and ambient humidity, also have an impact on the generation of NOx. The atmospheric pressure can be obtained from the on-board sensor signal, while the ambient temperature, ambient humidity etc. can later be obtained from the atmospheric environment module of the platform using the GPS signal. The NOx emissions are then corrected accordingly.
According to the requirements of GB17691 for remote monitoring of the OBD data flow, the data types shown in Table 1 must be read from the OBD interface.

Table 1. Requirements for remote monitoring of data flow in GB17691.

Number  Data item
1       Vehicle speed
2       Atmospheric pressure (direct measurement or estimation)
3       Maximum benchmark torque of the engine
4       Engine net output torque
5       Friction torque
6       Engine speed
7       Engine fuel flow
8       NOx sensor output
9       SCR inlet temperature (if applicable)
10      SCR outlet temperature (if applicable)
11      DPF differential pressure
12      Intake air mass flow read by the air mass flow sensor
13      Urea tank level
14      Fuel tank level
15      Engine coolant temperature
16      Latitude and longitude

Based on the standard output parameters of the OBD interface in Table 1, combined with the parameters related to the formation of NOx emissions, seven parameters are preliminarily selected as the initial input of the model: engine speed, engine net output torque, atmospheric pressure, cooling water temperature, fuel consumption, SCR inlet temperature and intake mass flow. The PLS method is then used to filter them and confirm relatively independent characteristic parameters as the final input of the neural network.

PLS method
The PLS method borrows the idea of principal component analysis and reduces the multicollinearity of the independent variables by extracting components. However, it differs from principal component analysis in that it extracts components from the independent and dependent variables at the same time. The extracted components satisfy the following two conditions [18]: (1) the extracted component group should contain as much of the variation information in the independent variable and the dependent variable as possible; (2) the correlation between the independent and dependent variable components in the same group should be as large as possible, that is, their correlation coefficient is as large as possible.
If $t$ represents the component extracted from the independent variable $X$ and $r$ represents the component extracted from the dependent variable $Y$, the above two conditions mean choosing the largest variances $\mathrm{Var}(t)$ and $\mathrm{Var}(r)$ and the largest correlation coefficient $\mathrm{Corr}(t, r)$. Assuming that there are weight vectors $w$ and $v$ such that $t = Xw$ and $r = Yv$, the above conditions can be expressed as the extremum problem in formula (3):

$$\max \sqrt{\mathrm{Var}(t)\,\mathrm{Var}(r)}\;\mathrm{Corr}(t, r) = \max \mathrm{Cov}(Xw, Yv) = \max\left(w^{T} X^{T} Y v\right) \tag{3}$$

Solve $w$ and $v$ using singular value decomposition. A set of eigenvectors $t = Xw$ and $r = Yv$ can be obtained through the decomposition $X^{T}Y = F \Sigma G^{T}$, where $w = f_1$ and $v = g_1$ are the first columns of $F$ and $G$. It has been proved in the literature [19] that the covariance $\mathrm{Cov}(t, r)$ is then the largest. The load vectors of the feature vectors are defined as $m = X^{T}t / (t^{T}t)$ and $q = Y^{T}r / (r^{T}r)$; the regression factor of the regression model $\hat{r} = ct$ is $c = t^{T}r / (t^{T}t)$, and the components of the first group extracted are $t$ and $r$. The independent variables and dependent variables are decomposed as

$$X = X_1 + t\,m^{T}, \qquad Y = Y_1 + c\,t\,q^{T},$$

and the same procedure is repeated for $X_1$ and $Y_1$. After $A$ decompositions, the independent variables and dependent variables can be expressed as

$$X = \sum_{a=1}^{A} t_a m_a^{T} + X_E, \qquad Y = \sum_{a=1}^{A} c_a t_a q_a^{T} + Y_E.$$

The relationship between the variables $X$ and $Y$ can be expressed as

$$Y = \sum_{i=1}^{B} B_i X_i + Y_E,$$

where $B$ is the number of independent variables, $B_i$ is the regression factor of the $i$th independent variable, $X_i$ is the $i$th independent variable, and $X_E$ and $Y_E$ are the residual matrices.
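The extraction of the first component pair can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the column centering of X and Y and the function name `first_pls_component` are assumptions, while the symbols w, v, t, r, m, q and c follow the definitions above.

```python
import numpy as np

def first_pls_component(X, Y):
    """Extract the first PLS component pair via SVD of X^T Y.

    X : (n_samples, n_inputs) array, Y : (n_samples,) or (n_samples, k).
    Columns are mean-centered first (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Leading singular vectors of X^T Y maximize w^T X^T Y v
    # over unit vectors w and v.
    F, s, Gt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w = F[:, 0]
    v = Gt[0, :]

    t = Xc @ w                      # component of the independent variables
    r = Yc @ v                      # component of the dependent variables

    m = Xc.T @ t / (t @ t)          # load vector of X
    q = Yc.T @ r / (r @ r)          # load vector of Y
    c = (t @ r) / (t @ t)           # regression factor of r_hat = c * t

    X1 = Xc - np.outer(t, m)        # residual, input to the next extraction
    Y1 = Yc - c * np.outer(t, q)
    return {"w": w, "v": v, "t": t, "r": r, "m": m, "q": q,
            "c": c, "X1": X1, "Y1": Y1}
```

Calling the function again on `X1` and `Y1` yields the second component pair, and repeating this A times gives the decomposition described above.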

Screening of independent variables
Although the partial least squares method extracts the most relevant components from the independent and dependent variables, insignificant independent variables still have an adverse effect on the model [20], so the independent variables must be filtered. They are tested by a significance coefficient as follows. The regression factor $b_i$ of the $i$th independent variable is calculated using all samples, the regression factor $b_i(-j)$ is calculated using all samples except the $j$th sample point, and the parameters of the significance test are defined from them. The significance of the independent variable is judged by whether the significance coefficient $|b_i(j) / (t_{0.05,\,n-1}\,\sigma_{b_i})|$ is greater than 1. The greater the significance coefficient, the higher the significance of the independent variable. When the significance coefficient is less than 1, the independent variable is not significant.
However, simply filtering the independent variables by whether the significance coefficient $|b_i(j) / (t_{0.05,\,n-1}\,\sigma_{b_i})|$ is greater than 1 does not give the best results. Therefore, the two parameters in formula (7) and formula (9) are used to assess the accuracy of the model and assist in the selection of independent variables. The root mean square error shows the average prediction performance of the model.
In the modeling process, the samples are divided into two parts: training samples and test samples. Training samples are used to build the model, and test samples are used to verify its prediction performance.
The root mean square of the training sample prediction error is

$$R_{TR} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{7}$$

where $y_i$ represents the actual value of the $i$th training sample in the dependent variable, $\hat{y}_i$ represents the predicted value of the $i$th training sample and $n$ is the number of training samples.
The root mean square of the test sample prediction error is

$$R_{TE} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(y_j - \hat{y}_j\right)^2} \tag{9}$$

where $y_j$ represents the actual value of the $j$th test sample in the dependent variable, $\hat{y}_j$ represents the predicted value of the $j$th test sample and $m$ is the number of test samples.
For variables whose significance coefficient is less than 1, a one-by-one screening method is adopted to remove the insignificant ones. First calculate the R_TR and R_TE values of the model before removing an independent variable, and then calculate them again after removing it. If both R_TR and R_TE become smaller, it is reasonable to remove the independent variable; otherwise it should be kept. If the final R_TR and R_TE values are large, relevant information has been missed or the nonlinearity is serious, and the information should be re-extracted.
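The two error measures and the removal rule can be written down in a few lines. This is a minimal sketch; the function names `rmse` and `removal_is_justified` are hypothetical, and the model that produces the predictions is left out on purpose.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square prediction error over one sample set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def removal_is_justified(r_tr_before, r_te_before, r_tr_after, r_te_after):
    """Drop a candidate variable only if BOTH errors become smaller."""
    return r_tr_after < r_tr_before and r_te_after < r_te_before
```

In use, the model is refitted once with and once without the candidate variable, `rmse` is evaluated on the training and test samples in both cases, and the variable is removed only when `removal_is_justified` returns `True`.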

Removal of specific sample points
Because of experimental errors or other reasons, a small number of sample points deviate too much from the other sample points, and when they are used to build or validate the model, they cause a large deviation in the model. Therefore, these data need to be eliminated. A statistical test method is proposed in the literature [18].
The contribution rate of the $j$th sample to the $h$th component $t_h$ is defined from $t_{hj}^2$ and $s_h^2$, where $t_{hj}^2$ is the square of the $h$th component value calculated with the $j$th sample and $s_h^2$ is the variance of component $t_h$.
The cumulative contribution rate of the $j$th sample is obtained by summing its contribution rates over the components, where $m$ is the number of extracted components. When the cumulative contribution is too large, the probability that the sample will distort the result is high. A sample point is retained if the determination conditions are met; otherwise it should be eliminated.
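Since the formulas for the contribution rates did not survive in the text, the sketch below uses a form that is common in the PLS literature and should be read as an assumption: the contribution of sample j to component h is t_hj^2 / ((n - 1) s_h^2), the cumulative contribution is the sum over the m components, and a sample is flagged when this sum reaches m (n^2 - 1) / (n^2 (n - m)) times an F critical value. The function names and the parameter `f_crit` are hypothetical.

```python
import numpy as np

def cumulative_contribution(T):
    """Cumulative contribution rate of every sample.

    T : (n_samples, m_components) matrix of component values t_hj.
    Assumed form: sum_h t_hj^2 / ((n - 1) * s_h^2).
    """
    T = np.asarray(T, dtype=float)
    T = T - T.mean(axis=0)              # component values are taken as centered
    n = T.shape[0]
    s2 = T.var(axis=0, ddof=1)          # variance s_h^2 of each component
    return (T ** 2 / s2).sum(axis=1) / (n - 1)

def flag_outliers(T, f_crit):
    """Return a boolean mask of samples to eliminate.

    f_crit is the critical value F_0.05(m, n - m), taken from an F table
    (or, if SciPy is available, scipy.stats.f.ppf(0.95, m, n - m)).
    """
    T = np.asarray(T, dtype=float)
    n, m = T.shape
    limit = m * (n ** 2 - 1) / (n ** 2 * (n - m)) * f_crit
    return cumulative_contribution(T) >= limit
```

Under this assumed form the cumulative contributions of all samples add up to m, so a single sample holding a large share of that total stands out immediately.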

Determination of the structure of the PLS model
The number of extracted components directly affects the fitting quality of the neural network part. The optimal number is determined by increasing the number of components one by one, starting from 1. For each number of components, the method in Figure 5 is used to determine the optimal structure of the neural network part, which gives PLS models with different numbers of components. As the number of components increases, the prediction error of the model first decreases. The test sample error reaches its minimum at 6 components and then starts to increase, while the training sample error changes little between 4 and 6 components. Therefore, the optimal number of components of this model is 6. This paper finally confirms the following parameters as inputs of the neural network model: engine speed, torque, atmospheric pressure, cooling water temperature, fuel consumption and intake mass flow. The engine test bench data are used for model training and validation. When estimating emissions for a vehicle, the T-BOX sensor signals and all environmental factors are used as inputs of the environmental correction factor MAP.

Possible error in the PLS model
Based on previous experience in establishing the NOx emission estimation model for the injection control strategy, the deviation of the PLS model is mainly due to the following aspects [14]:
(1) Error of the test data. The data obtained from the experiment contain some errors, which lead to errors in the model prediction results. These errors are randomly distributed, and their influence on the model decreases gradually as the number of training samples increases.
(2) Deviation caused by fitting. The PLS method is only applicable to linear fitting. Although the accuracy of the model is improved to a certain extent by adding some quadratic terms, the quadratic curve still deviates considerably from the actual curve.
(3) Error caused by the residual terms. During modeling with the PLS method, there is always a residual $Y_E$ of the dependent variable that cannot be eliminated, and it also has an impact on the model.
(4) Others. If the selected independent variables and training samples are not representative, the model is poorly optimized.
In order to reduce the error of the PLS model, a neural-network-based PLS method is used. The components extracted by the PLS method are fitted directly to the dependent variable, which eliminates the influence of the residual terms. At the same time, the error caused by fitting can be greatly reduced by using the neural network method.

Determination of neural network model
Although the BP (back propagation) neural network is popular and technically mature in industry, a recurrent neural network is more appropriate for sequence prediction and large amounts of data [15]. Moreover, the LSTM neural network overcomes the vanishing and exploding gradient problems caused by long-range dependencies, and is a more suitable choice for training the network model [21].

Selection of activation functions and learning algorithm
The main role of the activation function is to strengthen the learning ability of the network and provide it with nonlinear modeling ability [22]. Different activation functions have different characteristics and application scenarios. In the LSTM neural network, the Sigmoid function can be selected as the gate activation function. The Sigmoid function produces values between 0 and 1, where 1 means remember and 0 means forget. In addition, functions such as Tanh, ReLU and Sigmoid can be selected as state activation functions. Compared with the ReLU function, the Tanh function is more robust, and its convergence speed is faster than that of the Sigmoid function, so we choose the Tanh function as the state and output activation function.
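The roles of the two activation functions can be seen in a single forward step of a standard LSTM cell, sketched below with NumPy. The weight layout (four gate blocks stacked in `W`, `U` and `b`) and the function names are assumptions made for illustration; the paper does not give its implementation.

```python
import numpy as np

def sigmoid(z):
    """Gate activation: maps to (0, 1); 1 keeps information, 0 discards it."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    x      : (D,) input vector
    h_prev : (H,) previous hidden state
    c_prev : (H,) previous cell state
    W      : (4H, D), U : (4H, H), b : (4H,)
    Assumed block order: input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate state, Tanh activation
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state, Tanh output
    return h, c
```

Because the output gate lies in (0, 1) and Tanh lies in (-1, 1), every element of the hidden state stays strictly between -1 and 1, which is one reason the inputs are normalized before training.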
The common learning algorithm of LSTM neural networks is BPTT (back-propagation through time). BPTT is a common method for training RNNs; in essence it is still the BP algorithm. It calculates the hidden state, the output, the error and the partial derivatives of the weights of each layer, and then uses gradient descent to optimize the parameters of the vertical propagation between layers and the lateral propagation through time [23].

Determination of the number of nodes in hidden layer
The number of nodes in the hidden layer has a great impact on the performance of the LSTM neural network. If the number of hidden layer nodes is too small, the information processing ability and generalization ability of the network decrease. Conversely, if there are too many hidden layer nodes, the network becomes complicated, the training speed is affected, and the risk of overfitting increases. Based on the Kolmogorov theorem, the hidden layer is given 2n+1 neurons, where n is the number of neurons in the input layer. Because the number of input layer nodes of the LSTM model created in this paper is 6, the number of hidden layer nodes is 2 × 6 + 1 = 13.

Adam optimization algorithm
The optimization algorithm is essentially a mathematical method. Common optimization algorithms include SGD (stochastic gradient descent), Momentum, Nesterov Momentum, Adagrad (adaptive subgradient), RMSprop, Adam etc. [21]. Adam is in effect an optimized combination of Momentum and RMSprop. It stores an exponentially decaying average of past squared gradients like RMSprop, and also an exponentially decaying average of past gradients like Momentum. In short, Adam automatically adjusts the learning rate of each parameter, increases training speed, and improves network stability. The main parameters of the recurrent neural network based on LSTM are listed in Table 2.
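A single Adam update, combining the two decaying averages described above, can be sketched as follows. The default hyperparameters are the commonly published ones for Adam and are an assumption here; they are not the values of Table 2, which is not reproduced in the text. The function name `adam_step` is hypothetical.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta at step t (t starts at 1).

    m : decaying average of past gradients (Momentum part)
    v : decaying average of past squared gradients (RMSprop part)
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Per-parameter step size: large past gradients shrink the step.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient magnitude.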

Experimental acquisition of model data
The bench test results of a commercial heavy-duty vehicle diesel engine are used as the training and validation data for the model. The specific parameters of the engine are shown in Table 3. The engine data flow and gas emissions are measured using the instruments shown in Table 4, and Figure 6 shows the bench test site. The recorded data cover fuel delivery per cycle, engine speed, engine net output torque, atmospheric pressure, coolant temperature, SCR inlet temperature and intake mass flow; 14877 groups are used as training samples and 5000 groups are used as test samples. Because the input variables differ in unit and order of magnitude, the data must be normalized first.
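The normalization step can be sketched as follows. The paper does not state which scaling is used, so column-wise min-max scaling to [0, 1] is an assumption, as are the function names. The scaling parameters are fitted on the training samples only and reused for the test samples and for converting network outputs back to physical units.

```python
import numpy as np

def fit_minmax(X_train):
    """Return per-column minimum and range of the training samples."""
    X_train = np.asarray(X_train, dtype=float)
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return lo, span

def normalize(X, lo, span):
    """Scale each column with the training minimum and range."""
    return (np.asarray(X, dtype=float) - lo) / span

def denormalize(Z, lo, span):
    """Map normalized values back to physical units."""
    return np.asarray(Z, dtype=float) * span + lo
```

For instance, training speeds of 800, 1600 and 2400 r/min map to 0, 0.5 and 1, and `denormalize` restores the original values.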

Results analysis
The LSTM neural network and the PLS model were combined into the final neural network model, with engine speed, torque, atmospheric pressure, cooling water temperature, fuel consumption and intake mass flow as inputs. The predicted results for the training and test sets (the network output after denormalization) and the errors of the training and test sets are shown in Figure 7 to Figure 10 respectively.
As can be seen from Figure 7 and Figure 8, the predictive model has good learning and generalization ability, and the mean square error becomes smaller and smaller as the number of training iterations increases. The training sample error of the model (R_TR) is 29.7×10^-6, which is 2.27% of the maximum NOx emission of the training data, 1310×10^-6. The test sample error of the model (R_TE) is 19.9×10^-6, which is 2.64% of the maximum NOx emission of the test data, 753×10^-6. The largest error in the test set occurs at sample 1001, with an error of -267.2. The test set sample error percentage is shown in Figure 11. It can be seen from Figure 11 that the maximum error percentage of the test set is only 1.18%, which is a very high estimation accuracy. The largest errors mainly appear in the first few data groups of each layer of the training set, which is caused by the basic characteristics of the recurrent neural network; the model performs well in other cases.
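The two percentages follow directly from dividing each root mean square error by the corresponding maximum NOx emission, as the short check below shows. The function name is hypothetical; all values are in units of 10^-6.

```python
def relative_rmse_percent(rmse_value, max_value):
    """Express an RMSE as a percentage of the maximum observed value."""
    return 100.0 * rmse_value / max_value

train_pct = relative_rmse_percent(29.7, 1310.0)   # R_TR against 1310
test_pct = relative_rmse_percent(19.9, 753.0)     # R_TE against 753
print(f"{train_pct:.2f}% {test_pct:.2f}%")        # prints "2.27% 2.64%"
```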

Conclusion
(1) A PLS model combined with an LSTM neural network is established. First, the parameters are selected according to their availability, their correlation with NOx emissions and their significance. Then the structure of the model is determined by increasing the number of components one by one. Finally, the diesel engine NOx emission prediction model is established, with engine speed, torque, atmospheric pressure, cooling water temperature, fuel consumption and intake mass flow confirmed as its inputs.
(2) The training sample error of the model (R_TR) is 29.7×10^-6, which is 2.27% of the maximum NOx emission of the training data, 1310×10^-6. The test sample error of the model (R_TE) is 19.9×10^-6, which is 2.64% of the maximum NOx emission of the test data, 753×10^-6.
(3) For this diesel engine, the maximum error percentage of the test set is only 1.18%, which fully meets the requirements for remote diagnosis of the DeNOx performance of the SCR system and can be used for reasonable diagnosis of the downstream NOx sensor signal.
Open Fund of the National Engineering Laboratory for Mobile Pollution Source Emission Control Technology (NELMS2017A08)