Improvement of Bayesian Dynamic Linear Model for Predicting Missing Data of Bridges

Abstract. Missing data in bridge monitoring reduce the reliability of data analysis results. In this paper, the Bayesian dynamic linear model is improved by changing the parameter matrix of the hidden state variables, so that the model is optimized while the predefined variables remain unchanged. The frequency at a strain measuring point of the bridge is taken as the observed value, and one month of frequency values collected at 30-minute intervals is used as the training set to predict the data of the following week. Comparing the predicted results with the observed values shows that the absolute error does not exceed 14.05 Hz and the relative error does not exceed 1.82% when the training frequency varies from 756 Hz to 773.4 Hz.


Introduction
The state of bridge engineering reflects the level of infrastructure development of a country. With economic development, the number of highway bridges, railway bridges, pedestrian bridges and other special bridges (such as those carrying pipelines and cables) keeps growing, so inspecting and maintaining this infrastructure is imperative. Sensors are therefore deployed to monitor the condition of the structure in real time and to mitigate the problems caused by the aging of bridge structures. However, the operation of a monitoring system is affected by many factors. For example, many bridges are far from urban areas and the equipment in the field can only be powered by solar energy, so in continuous rainy weather the batteries cannot be recharged from the solar panels once they are exhausted, which causes equipment power failure and missing data. Missing data hinder the analysis of the health status of bridges and constitute a serious potential safety hazard.
The mainstream forecasting methods in China are the polynomial regression analysis method [1], the autoregressive moving average model [2], the grey model [3] and so on. However, their performance in practical bridge applications is not satisfactory. For example, the amount of calculation of polynomial regression analysis increases sharply with the number of variables; the autoregressive moving average model does not exceed two dimensions in practical applications and is suited to problems with little sample data; the grey model gives good results when the data have a strong trend, but it only works at the level of the observations. The factors influencing the bridge environment are complex and changeable, and the above methods cannot handle the multi-dimensional variable case.
The Bayesian dynamic linear model (BDLM) was first proposed by the British statisticians M. West and J. Harrison [4] in 1976. In view of the particular background of bridge structures, Solhjell first combined the Bayesian dynamic linear model with structural health monitoring (SHM). K. Y. Koo et al. [5] explored the influence of bridge temperature, traffic load and wind speed on bridge strain frequency, and determined by principal component analysis that the influence of wind speed on frequency could be ignored. Goulet [6] defined the hidden state variables that affect the bridge strain frequency, and predicted the bridge frequency by separating the observation into generic components according to the specific bridge model. However, the data selected by Goulet are all stable values without drastic fluctuations and therefore lack representativeness. Moreover, to improve the accuracy of the prediction, Goulet used the expectation maximization (EM) algorithm to train the model parameters, which changes the predefinition of the state variables, so that multi-step prediction loses its significance.
In this paper, a time period with large and frequent temperature changes is selected to ensure the representativeness of the training data. On the premise that the bridge data meet the conditions of the model, the parameter matrix is changed: the parameters of all components are set to the constant 1, except for the parameters of the periodic components. This keeps the predefinition of the state variables unchanged and also avoids the long calculation time and the risk of falling into a local optimum associated with the expectation maximization algorithm. Comparison of the multi-step prediction results with the observed values shows that the error is within a reasonable range.
E3S Web of Conferences 185, 02027 (2020) ICEEB 2020 http://doi.org/10.1051/e3sconf/202018502027

BDLM
In BDLM, the observation equation and state equation are defined as:
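For reference, a linear Gaussian state-space model is commonly written in the form below. The symbols are assumed notation chosen for illustration and are not taken from this paper: $y_t$ is the observation vector, $x_t$ the hidden state vector, $C_t$ the observation matrix, $A_t$ the state transition matrix, $R_t$ the measurement error variance matrix and $Q_t$ the model error covariance matrix.

```latex
\begin{align*}
y_t &= C_t x_t + v_t, & v_t &\sim \mathcal{N}(0, R_t) \\
x_t &= A_t x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q_t)
\end{align*}
```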

Global matrix
The observation variables can be obtained directly from the acquisition equipment. There are four observation variables in this paper: frequency (B), temperature (T), traffic load (L) and traffic mode (P). The global matrices defined for the model are the state transition matrix, the observation matrix and the measurement error variance matrix.
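As a minimal sketch of how global matrices of this kind can be assembled, the Python example below builds a block-diagonal state transition matrix and an observation row for a single variable (temperature). It assumes one baseline block, one autoregressive block, a daily block and a seasonal block. The names `block_diag`, `periodic_block`, `A` and `C`, the rotation form of the periodic blocks and the 365-day seasonal period are illustrative assumptions. Only the 30-minute sampling interval and the choice of fixing the non-periodic parameters to 1 come from the text.

```python
import numpy as np

def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def periodic_block(period_steps):
    """2x2 rotation block for a harmonic with the given period in time steps."""
    w = 2.0 * np.pi / period_steps
    return np.array([[np.cos(w), np.sin(w)],
                     [-np.sin(w), np.cos(w)]])

STEPS_PER_DAY = 48  # 30-minute sampling interval

baseline = np.array([[1.0]])        # baseline component, parameter fixed to 1
autoregressive = np.array([[1.0]])  # autoregressive component, parameter fixed to 1
daily = periodic_block(STEPS_PER_DAY)
seasonal = periodic_block(365 * STEPS_PER_DAY)  # assumed 365-day season

# Global state transition matrix for the temperature variable (6 hidden states)
A = block_diag(baseline, autoregressive, daily, seasonal)

# Observation row: each periodic block contributes through its first state only
C = np.array([[1.0, 1.0, 1.0, 0.0, 1.0, 0.0]])
```

Because every non-periodic entry is a fixed constant, nothing in `A` needs to be learned from the data in this sketch, which is the property the approach relies on.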

Probability recurrence
The specific steps for prediction and inference of observation variables and state variables are as follows:
Posterior distribution of the state variables at time t-1.
Prior distribution of the state variables at time t, equation (4).
One-step prediction distribution of the observation at time t, equation (5).
Posterior distribution of the state variables at time t, equations (6) and (7).
K-step observation prediction distribution.
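The recurrence can be sketched with the standard Kalman filter equations for a linear Gaussian model. This is a minimal illustration under assumed notation (`m`, `P` for the state mean and covariance, `A`, `C`, `Q`, `R` for the model matrices), not a reproduction of the equations of this paper. The scalar numbers in the demo are hypothetical. A missing observation is represented by NaN, in which case the update is skipped and only the prior is propagated.

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One recursion: prior -> one-step prediction -> posterior.

    If y contains NaN (missing observation), the update is skipped and
    the prior is returned in place of the posterior.
    """
    # Prior distribution of the state at time t
    m_prior = A @ m
    P_prior = A @ P @ A.T + Q
    # One-step prediction distribution of the observation at time t
    y_pred = C @ m_prior
    S = C @ P_prior @ C.T + R
    if np.any(np.isnan(y)):
        return m_prior, P_prior, y_pred, S
    # Posterior distribution of the state at time t
    K = P_prior @ C.T @ np.linalg.inv(S)
    m_post = m_prior + K @ (y - y_pred)
    P_post = P_prior - K @ S @ K.T
    return m_post, P_post, y_pred, S

def forecast(m, P, A, C, Q, R, k):
    """K-step observation prediction starting from the posterior at time t."""
    for _ in range(k):
        m = A @ m
        P = A @ P @ A.T + Q
    return C @ m, C @ P @ C.T + R

# Scalar demo with hypothetical numbers
m0, P0 = np.array([0.0]), np.array([[1.0]])
A = np.array([[1.0]])
C = np.array([[1.0]])
Q = np.array([[0.0]])
R = np.array([[1.0]])

m1, P1, _, _ = kalman_step(m0, P0, np.array([2.0]), A, C, Q, R)     # observed
m2, P2, _, _ = kalman_step(m1, P1, np.array([np.nan]), A, C, Q, R)  # missing
y3, S3 = forecast(m2, P2, A, C, Q, R, k=3)
```

With an observation available, the posterior mean moves toward the measurement. Without one, the state estimate is carried forward unchanged apart from the transition and the added model error.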

Application environment

Analysis of bridge under real situation
In this paper, the Shenzhen Bay Highway Bridge is taken as an engineering example, and the data of the M10 static strain measuring point on the main beam section at the Hong Kong side entrance of the bridge are analysed. The data collected since the sensors were installed in January 2020 are shown in figure 1. Figure 2 shows that the daily variation of the traffic flow on the bridge is relatively stable, so the traffic load and the traffic mode defined in the model can be considered to change in the same way every day. According to the temperature data in figure 3, the temperature range over half a year is 23.1℃, which can be approximately regarded as the temperature range over one year. April lies in a period in which the temperature varies greatly and frequently: the temperature range reaches 12℃ within this single month, and on four occasions the temperature range within two days exceeds 10℃. Therefore, taking the April data as the training set ensures the representativeness of the selected data.

Normality verification
The April data of the M10 measuring point are verified to follow a normal distribution. It can be seen from figure 4

Correlation verification
Plotting the April frequency values of the M10 measuring point on the ordinate against the temperature values on the abscissa, figure 5 shows that the relationship between frequency and temperature is approximately linear.
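One simple way to check such a relationship numerically is to compute the Pearson correlation coefficient and a first-order least-squares fit. The sketch below uses synthetic data only: the temperature range, slope, intercept and noise level are hypothetical values chosen for illustration and do not come from the M10 measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the measured series (hypothetical values)
temperature = rng.uniform(18.0, 30.0, size=200)  # degrees Celsius
frequency = 800.0 - 1.2 * temperature + rng.normal(0.0, 0.5, size=200)  # Hz

# Pearson correlation coefficient between the two series
r = np.corrcoef(temperature, frequency)[0, 1]

# Least-squares straight line: frequency ~ slope * temperature + intercept
slope, intercept = np.polyfit(temperature, frequency, deg=1)
```

A value of `r` close to 1 or -1 indicates that a linear dependence is a reasonable modelling assumption for the pair of variables.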

Separated hidden state variables
After the April data of the M10 measuring point are fed into the model, the hidden state variables can be separated from the observation variables. Figure 6 shows that the strain frequency is separated into only two components, so it is regarded as the superposition of a baseline component and an autoregressive component. Figure 7 shows that the temperature is affected by daily and seasonal factors, so it has a daily and a seasonal component, giving four components in total. Figure 8 shows that the traffic load has the same component types as the strain frequency, with no periodic component.

Treatment of missing data
This section describes how missing data in structural health monitoring are predicted by BDLM. In figure 9 and figure 10, one-step prediction values are compared with observation values over the part without missing data, and multi-step prediction values are given over the part with missing data, where no observation values are available as feedback.
To quantify the difference between prediction values and observation values, the relative error and the absolute error are used as evaluation indexes. In figure 11 and figure 12, the maximum relative error is 1.82% and the maximum absolute error is 14.05 Hz within 336 time steps (one week at the 30-minute sampling interval).
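The two indexes can be computed as in the following sketch. It assumes that the relative error is taken with respect to the observed value, which the text does not state explicitly. The function name `prediction_errors` and the sample values are hypothetical.

```python
import numpy as np

def prediction_errors(predicted, observed):
    """Return element-wise absolute and relative errors.

    The relative error is defined here relative to the observed value
    (an assumption made for this sketch).
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    absolute = np.abs(predicted - observed)
    relative = absolute / np.abs(observed)
    return absolute, relative

# Hypothetical frequency values in Hz
observed = np.array([760.0, 765.0, 770.0])
predicted = np.array([762.0, 761.0, 777.7])

abs_err, rel_err = prediction_errors(predicted, observed)
max_abs = abs_err.max()                  # largest absolute error in Hz
max_rel_percent = 100.0 * rel_err.max()  # largest relative error in percent
```

Taking the maximum over the prediction window gives worst-case figures comparable to the ones reported for the 336 predicted steps.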

Conclusion
By applying the Bayesian dynamic linear model to bridge health monitoring, missing data can be predicted. At the level of the observation variables, observation values can be used as feedback to optimize the one-step prediction values. At the level of the state variables, the parameters of the model need not be trained; only the changes of the components themselves are explored, and multi-step prediction is realized while the predefined variables remain unchanged. In addition, the prediction results stay within a reasonable error range, which supports the rationality of the selected hidden state variables and the accuracy of the prediction.