Water quality prediction analysis of Qingyi River based on time series

Given the current state of water quality in the drainage basin, selecting an appropriate water quality prediction model is the key to improving prediction accuracy. The time series method emphasizes historical data and reflects the continuity between past and future data well. Moreover, it offers relatively high short-term prediction accuracy and a simple modeling process. Therefore, the time series method was used to establish Auto-Regressive and Moving Average (ARMA) models for the concentration time series of dissolved oxygen (DO), biochemical oxygen demand (BOD5), chemical oxygen demand (CODCr), ammonia nitrogen (NH3-N) and total nitrogen (TN) at the Guidu fu section of the Qingyi River from January 2011 to December 2015. The concentrations of the five water quality indicators from January to June 2016 were then predicted and verified against the measured values. The results show that the model fits well and has relatively high prediction accuracy, and that it can accurately reflect the current and future trends of the water quality.


Introduction
Water environment problems in drainage basins have been among the most important issues facing environmental management in China in recent years. Water quality prediction can detect water quality changes as early as possible, which is of great significance for water environment protection. According to their theoretical basis and solution method, water quality prediction models are divided into mechanism models and non-mechanism models. Because a mechanism model uses governing equations to describe the trend of water quality, parameters with practical physical significance must be identified during modeling, which makes the modeling process more complex [1]. Therefore, non-mechanism models are widely used in water quality prediction at present. Commonly used non-mechanism models include the artificial neural network method [2], the grey system theory method [3][4][5][6], the time series method [7][8] and so on. In practical application, however, the mathematical theory of the artificial neural network method is incomplete, its training is slow, and sometimes it fails to produce results [9]. The grey system theory method has weak anti-interference ability and suffers from grey deviation, so its prediction accuracy may not meet the requirements [10]. The time series method has a relatively complete mathematical foundation, can make full use of historical data to make quantitative predictions of future water quality, and has good short-term prediction accuracy [11].
The time series method is a mature data processing method. It builds a mathematical model from dynamic water quality parameter data, analyzes that model, and extracts the periodic information in the water quality changes in order to predict the trend of the data accurately. The method has been widely used in economics, life science, physics, computer science and other fields in China and abroad [12][13][14][15][16][17]. In the field of water quality prediction, the time series method is mainly applied to the prediction of water quality indicators, and most prediction horizons exceed 12 steps [18]. However, practice shows that the time series method is more accurate in short-term prediction, and that the error grows as the prediction horizon lengthens [19]. Because the external factors affecting the water quality of the Qingyi River are uncertain and the relevant data are insufficient, the accuracy of long-term prediction cannot meet the requirements. Therefore, this study used a six-step prediction horizon to ensure higher accuracy. The Qingyi River is a tributary of the Minjiang River. The rapid socio-economic development along the river in recent years has inevitably had a negative impact on its water quality. The Hongya section of the Qingyi River, as the drinking water source of Hongya County in Meishan City, safeguards the drinking water of nearly 400,000 people. This work established an Auto-Regressive and Moving Average (ARMA) model to fit the water quality data of the Hongya section of the Qingyi River from January 2011 to December 2015 and predicted its trend, in order to provide a scientific basis for local water environment management and protection.

Water quality prediction method based on time series
A time series is a series of measured data arranged in chronological order. A time series water quality prediction model uses the historical measured data of a water body to predict the value of a water quality indicator at a certain time in the future.
Let $\{X_t\}$ be the time series, where $X_t$ represents the value of the random phenomenon at time $t$, and let $\{\varepsilon_t\}$ be white noise, recorded as WN(0, $\sigma^2$). The interference enters the model as a linear combination of the past and current interference values. $a_0, a_1, a_2, \ldots, a_p$ ($a_p \neq 0$) are the auto-regressive coefficients and $b_0, b_1, b_2, \ldots, b_q$ ($b_q \neq 0$) are the moving average coefficients. Both sets of coefficients are real numbers and are determined by least squares parameter estimation. The Auto-Regressive and Moving Average model is then as follows:
$$X_t = a_0 + a_1 X_{t-1} + \cdots + a_p X_{t-p} + b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + \cdots + b_q \varepsilon_{t-q} \qquad (1)$$
This is recorded as the ARMA(p, q) model, and a time series that satisfies the model is called an ARMA(p, q) series.
In equation (1), $X_{t-p}$ represents the measured value of the water quality indicator at time $t-p$, and $\varepsilon_t$ represents the interference value at the current time, that is, the information in the series that cannot be extracted when establishing the model. Therefore, $X_t$ is a linear combination of the measured values of the water quality indicator at the previous $p$ times, the past interference values and the current interference value.
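To make the recursion in equation (1) concrete, the following minimal Python sketch evaluates it one step at a time and simulates a short series. It assumes the standard ARMA(p, q) form $X_t = a_0 + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{j=0}^{q} b_j \varepsilon_{t-j}$. The function names and the coefficient values are illustrative assumptions and are not taken from this study.

```python
import random


def arma_step(history, shocks, eps_t, a, b):
    """Evaluate the assumed form of equation (1) once.

    history: past values, most recent last (..., X_{t-2}, X_{t-1})
    shocks:  past interference values, most recent last (..., eps_{t-1})
    eps_t:   interference value at the current time
    a:       [a_0, a_1, ..., a_p]  (a_0 is the constant term)
    b:       [b_0, b_1, ..., b_q]  (b_0 multiplies the current interference)
    """
    p, q = len(a) - 1, len(b) - 1
    ar_part = sum(a[i] * history[-i] for i in range(1, p + 1))
    ma_part = b[0] * eps_t + sum(b[j] * shocks[-j] for j in range(1, q + 1))
    return a[0] + ar_part + ma_part


def simulate_arma(n, a, b, sigma=1.0, seed=0):
    """Simulate n values, starting from zero initial values and shocks."""
    rng = random.Random(seed)
    p, q = len(a) - 1, len(b) - 1
    x = [0.0] * p          # start-up values (an assumption of this sketch)
    e = [0.0] * q
    for _ in range(n):
        eps_t = rng.gauss(0.0, sigma)   # white noise WN(0, sigma^2)
        x.append(arma_step(x, e, eps_t, a, b))
        e.append(eps_t)
    return x[p:]


# Hypothetical ARMA(2, 1) coefficients, for illustration only.
series = simulate_arma(60, a=[4.0, 0.5, 0.1], b=[1.0, 0.4])
print(len(series))
```

Because the start-up values are set to zero, the first few simulated values are a transient and would normally be discarded.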
Generally, the modeling process of the ARMA(p, q) model is divided into four steps: the stationarity test and the white noise test of the measured water quality values, coefficient estimation, and testing of the water quality prediction model.

Establishment of ARMA model for Qingyi River
In this work, data for five water quality indicators (DO, BOD5, CODCr, NH3-N and TN) were selected from the Guidu fu section of the Qingyi River from January 2011 to June 2016. The ARMA(p, q) model was established with the measured concentrations of the five indicators from January 2011 to December 2015 as sample data, and the significance test of the model was carried out. The concentrations of the five indicators from January to June 2016 were then predicted and verified. The results show that the model is effective and fits well. This section takes the CODCr concentration as an example to describe the modeling process in detail.

Stationarity test for time series of water quality measured data
A stationary series is the basis of time series analysis. Many methods in time series analysis assume that the data samples come from a stationary and ergodic stochastic process, that is, that the expectation, variance and auto-covariance function do not change with time and that the time average can replace the population average. In this paper, the stationarity of the time series $\{X_t\}$ of CODCr concentration in the Qingyi River was tested by the Augmented Dickey-Fuller (ADF) test [20] and the Phillips-Perron (PP) test [21]. As shown in Table 1, the ADF and PP statistics of the CODCr concentration series are -3.636710 and -3.735216, respectively, which are less than the critical values at the 1%, 5% and 10% levels. Therefore, the null hypothesis of a unit root is rejected at the 1% significance level, that is, the series is stationary.
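The idea behind the test statistic can be illustrated with a simplified sketch. The code below computes the basic Dickey-Fuller statistic from the regression $\Delta X_t = \alpha + \rho X_{t-1} + u_t$; it omits the lagged difference terms that the ADF test adds and the variance correction used by the PP test, so it is not the procedure used in this study. The function name and the simulated data are assumptions for illustration. The resulting statistic has to be compared with Dickey-Fuller critical values (such as those reported in Table 1), not with ordinary t-table values.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of rho in  dX_t = alpha + rho * X_{t-1} + u_t  (no lag terms)."""
    x = series[:-1]                                        # X_{t-1}
    y = [series[t] - series[t - 1] for t in range(1, len(series))]  # dX_t
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rho = sxy / sxx                                        # OLS slope
    alpha = mean_y - rho * mean_x                          # OLS intercept
    ssr = sum((yi - alpha - rho * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2) / sxx)               # standard error of rho
    return rho / std_err


# Simulated data only: a mean-reverting series and a random walk.
rng = random.Random(1)
stationary, walk = [0.0], [0.0]
for _ in range(199):
    shock = rng.gauss(0.0, 1.0)
    stationary.append(0.3 * stationary[-1] + shock)
    walk.append(walk[-1] + shock)

print(dickey_fuller_stat(stationary), dickey_fuller_stat(walk))
```

A strongly negative statistic, as for the mean-reverting series, is evidence against a unit root; a random walk typically gives a statistic much closer to zero.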

White noise test for time series of water quality measured data
The white noise test of the measured water quality time series is a necessary step in building the water quality prediction model for the Qingyi River. A white noise series has no correlation between its values, so a series of historical water quality measurements that is white noise carries no useful information for predicting future water quality. That is to say, a white noise series cannot be used to establish a time series model; only a series that is not white noise can be used for modeling. This work used the Q statistic to test whether the time series $\{X_t\}$ is a white noise series. Given the significance level α, when the P value of the test statistic is less than α, the time series of the water quality indicator is considered to be a non-white noise series. The test results are shown in Fig. 1. As can be seen from Fig. 1, the P values of the CODCr white noise test are less than the significance level (α = 0.05), so the null hypothesis that the series $\{X_t\}$ is white noise is rejected, that is, the series $\{X_t\}$ is not a white noise series.
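As a sketch of how such a test works, the code below computes the Q statistic in its Ljung-Box form, $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation. That the Q statistic used here is the Ljung-Box form is an assumption, as are the function names and the synthetic data. Instead of a P value, the sketch compares Q with the 95% chi-square critical value for $m$ degrees of freedom, which is equivalent to testing at α = 0.05.

```python
import math

# Standard 95% chi-square critical values for 1..6 degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def autocorr(series, k):
    """Lag-k sample autocorrelation r_k."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorr(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def rejects_white_noise(series, max_lag=6):
    """True when Q exceeds the 5% critical value, i.e. the series is not white noise."""
    return ljung_box_q(series, max_lag) > CHI2_95[max_lag]


# Synthetic monthly series with a 12-month cycle (not measured data).
seasonal = [10.0 + 3.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(60)]
print(rejects_white_noise(seasonal))
```

When the same statistic is applied to the residuals of a fitted ARMA(p, q) model, the degrees of freedom are usually reduced by p + q.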

Coefficient estimation of water quality prediction model
Once the measured water quality time series $\{X_t\}$ has passed the stationarity test and has been confirmed to be a non-white noise series, the orders p and q of the ARMA(p, q) model can be determined by observing its autocorrelation and partial autocorrelation coefficients.
As can be seen from Fig. 1, both the autocorrelation and the partial autocorrelation coefficients of the series tail off. Therefore, an ARMA(p, q) model should be selected. Several candidate pairs of p and q were tried, and the optimal order was determined through the validity test of the model, the significance test of the parameters, and the AIC or BIC criterion [22]. Repeated calculation in EViews 8.0 gave ARMA(2, 1) as the best model; the estimated coefficients are shown in Table 2. Substituting the fitted coefficients into equation (1) gives the final expression of the model.
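The comparison of candidate orders by AIC or BIC mentioned above can be sketched as follows. The criteria are $\mathrm{AIC} = -2\ln L + 2k$ and $\mathrm{BIC} = -2\ln L + k\ln n$, where $\ln L$ is the maximized log-likelihood, $k$ the number of estimated parameters and $n$ the number of observations. The log-likelihood values below are hypothetical placeholders, not results from this study, and counting $k = p + q + 1$ (coefficients plus a constant) is an assumption of the sketch.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_order(log_likelihoods, n_obs, criterion="aic"):
    """Return the (p, q) pair with the smallest criterion value, plus all scores."""
    scores = {}
    for (p, q), log_lik in log_likelihoods.items():
        k = p + q + 1  # assumed parameter count: AR + MA coefficients + constant
        if criterion == "aic":
            scores[(p, q)] = aic(log_lik, k)
        else:
            scores[(p, q)] = bic(log_lik, k, n_obs)
    return min(scores, key=scores.get), scores


# HYPOTHETICAL log-likelihoods for four candidate models fitted to 60 points.
candidates = {(1, 0): -152.4, (1, 1): -148.9, (2, 1): -144.1, (2, 2): -143.8}
order_aic, _ = best_order(candidates, n_obs=60, criterion="aic")
order_bic, _ = best_order(candidates, n_obs=60, criterion="bic")
print(order_aic, order_bic)
```

Some packages report these criteria divided by the number of observations; this rescaling does not change the ranking of the candidates.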

Testing of water quality prediction model
Before the model can be used for prediction, it must be tested, including a white noise test of its residuals. Only if the residual series is white noise has the model extracted all the information in the original time series $\{X_t\}$. Whether the residuals are white noise can be checked with the white noise test method described above. Fig. 2 shows the autocorrelation and partial autocorrelation of the residuals.

Fig.2. The autocorrelation and partial correlation of residual
As can be seen from Fig. 2, the autocorrelation and partial autocorrelation coefficients of the residuals fall within the range of two standard deviations, so the residuals can be considered a white noise series. The Q statistics and the corresponding P values confirm this: all P values are greater than 0.05, so the residuals are white noise and the model is effective. In the coefficient estimates of Table 2, the P values of all t-tests are less than 0.05, so the parameters are significant. In conclusion, the ARMA(2, 1) model is effective.
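The two-standard-deviation check on the residual autocorrelations can be sketched as below. Under the white noise hypothesis, each sample autocorrelation has a standard deviation of approximately $1/\sqrt{n}$, so the band is $\pm 2/\sqrt{n}$. The sketch covers only the autocorrelation coefficients, not the partial autocorrelations, and the function names and test series are illustrative assumptions.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def lags_outside_band(residuals, max_lag=12):
    """Lags whose autocorrelation falls outside the +/- 2/sqrt(n) band."""
    bound = 2.0 / math.sqrt(len(residuals))
    acf = sample_acf(residuals, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# A deliberately non-random series: every lag should be flagged.
alternating = [1.0, -1.0] * 30
print(lags_outside_band(alternating))
```

An empty list indicates that no lag up to `max_lag` contradicts the white noise hypothesis; since each lag is checked at roughly the 5% level, an occasional isolated exceedance is expected even for true white noise.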
The measured CODCr concentrations, the fitted values of the model and the residuals for the Guidu fu section from January 2011 to December 2015 are shown in Fig. 3, in which the horizontal axis is time (2011-2015) and the vertical axis is CODCr concentration. The fitted and measured values fluctuate consistently and the residuals are small, which shows that the model fits well.

Verification and analysis of water quality prediction results
Based on the above methods, 300 data points for the five water quality indicators (DO, BOD5, CODCr, NH3-N and TN) of the Qingyi River from January 2011 to December 2015 were used to establish the corresponding water quality prediction models, which were then used to predict the concentrations and trends of the five indicators from January to June 2016. The prediction results are shown in Figs. 4-8.
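A six-step forecast from a fitted ARMA model can be sketched as a recursion in which each future interference value is replaced by its expectation of zero and each forecast is fed back as an input to the next step. The sketch assumes the form $X_t = a_0 + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{j=0}^{q} b_j \varepsilon_{t-j}$; the coefficients, history and residuals below are hypothetical and are not the fitted values from Table 2.

```python
def forecast_arma(history, residuals, a, b, steps=6):
    """Recursive multi-step forecast.

    history:   measured values, most recent last (at least p values)
    residuals: in-sample residuals, most recent last (at least q values)
    a:         [a_0, a_1, ..., a_p];  b: [b_0, b_1, ..., b_q]
    """
    p, q = len(a) - 1, len(b) - 1
    x = list(history)
    e = list(residuals)
    forecasts = []
    for _ in range(steps):
        e.append(0.0)  # unknown future interference replaced by its mean, 0
        ar_part = sum(a[i] * x[-i] for i in range(1, p + 1))
        ma_part = sum(b[j] * e[-1 - j] for j in range(0, q + 1))
        value = a[0] + ar_part + ma_part
        x.append(value)            # forecast becomes input for the next step
        forecasts.append(value)
    return forecasts


# HYPOTHETICAL ARMA(2, 1) coefficients and data, for illustration only.
predicted = forecast_arma(
    history=[10.0, 12.0], residuals=[0.5], a=[4.0, 0.5, 0.1], b=[1.0, 0.4]
)
print(predicted)
```

After q steps the moving average terms vanish and the forecast is driven only by the auto-regressive part, so it decays towards the series mean. This is one reason why the accuracy of such models falls as the horizon lengthens.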

DO prediction results
Dissolved oxygen (DO) is the amount of molecular oxygen dissolved in water, expressed in milligrams of oxygen per liter of water. It is an important indicator of water pollution and an important condition for the self-purification of a water body. The more heavily polluted the water, the lower the DO concentration. The predicted results for DO are shown in Fig. 4.
As shown in Fig. 4, the DO concentration of the Qingyi River was relatively stable from 2011 to 2015; it stayed above 6 mg/L and met the Class II or better requirement of the environmental quality standard for surface water. In 2016, however, the DO concentration began to decrease and fell below 6 mg/L from April, so the water quality category dropped from Class II to Class III. According to the water quality survey data, the overall water quality of the Guidu fu section of the Qingyi River declined in the first half of 2016 compared with the same period in 2015. From January to June, the water temperature gradually increased, which also contributed to the decrease in DO concentration. The predicted values for 2016 follow the same overall trend as the measured values, showing the accuracy of the ARMA prediction model.
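The mapping from a DO concentration to a water quality class can be written as a simple threshold lookup. Only the 6 mg/L boundary between Class II and Class III is stated in the text above; the remaining limits in this sketch are assumptions recalled from the Chinese surface water standard GB 3838-2002 and should be checked against the standard before use. The function name is hypothetical.

```python
# Lower DO limits in mg/L per class. Only the Class II limit (6 mg/L) comes
# from the text; the others are ASSUMED from GB 3838-2002 and need checking.
DO_CLASS_LIMITS = [("I", 7.5), ("II", 6.0), ("III", 5.0), ("IV", 3.0), ("V", 2.0)]


def classify_do(do_mg_per_l):
    """Return the best class whose lower DO limit is met."""
    for class_name, lower_limit in DO_CLASS_LIMITS:
        if do_mg_per_l >= lower_limit:
            return class_name
    return "worse than V"


for value in (6.5, 5.8):
    print(value, classify_do(value))
```

For DO a higher concentration means better water quality, so the limits are lower bounds; for indicators such as CODCr, NH3-N and TN the limits are upper bounds and the comparison would be reversed.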

BOD5 prediction results
Biochemical oxygen demand (BOD5) refers to the amount of oxygen consumed by microorganisms when they decompose organic compounds in water. It is a comprehensive indicator of the concentration of oxygen-consuming pollutants, such as organic compounds that can be decomposed by microorganisms in water. The prediction results are shown in Fig. 5.
As can be seen from Fig. 5, the BOD5 concentration in the Qingyi River met the Class I standard from 2011 to 2016. However, it has been rising slowly since December 2015, which indicates that the water quality is tending to deteriorate.

CODCr prediction results
In general, chemical oxygen demand (CODCr) is used to represent the total amount of organic matter in water. The predicted results are shown in Fig. 6.
As shown in Fig. 6, the CODCr concentration in the water body conformed to the Class II standard from 2011 to 2015, and it has shown an obvious upward trend since December 2015. In June 2016, the CODCr concentration exceeded 15 mg/L, which corresponds to Class III. CODCr is an important indicator of the organic pollutant content of water, so its increase indicates that the influence of organic pollutants on the Qingyi River has grown. Fig. 6 also shows that the predicted values agree well with the measured values and reflect the actual trend of chemical oxygen demand.

NH3-N prediction results
Ammonia nitrogen (NH3-N) is a product of the degradation of organic matter by microorganisms in water, and it is an inorganic nutrient. If the NH3-N concentration in a water body is too high, it can lead to eutrophication, which is harmful to aquatic organisms. Therefore, the prediction of NH3-N concentration in water is particularly important. The prediction results are shown in Fig. 7.
As shown in Fig. 7, the NH3-N concentration at the Guidu fu section of the Qingyi River fluctuated greatly from 2011 to 2014. From 2011 to 2015, the concentration stayed within the Class III standard and decreased year by year. The NH3-N concentration from January to June 2016 also showed a downward trend; it fell below 0.15 mg/L in June, reaching the Class I standard. Except for some errors in April, the simulation results are in good overall agreement with the measured values and reflect the short-term trend of water quality.

TN prediction results
Total nitrogen (TN) refers to the total amount of all forms of nitrogen in water and is one of the important indicators used to measure water quality. The prediction of TN concentration helps to understand the extent to which water is contaminated by nutrients. The predicted results are shown in Fig. 8.
It can be seen from Fig. 8 that the TN concentration in the Qingyi River remained basically stable from 2011 to 2014 and corresponded to the Class III water quality standard. From June 2015, however, the concentration generally increased and corresponded to Class IV. Since the start of 2016, the TN concentration has continued to rise and may increase further, indicating that nutrient pollution of the water will increase and should be further controlled.

Result analysis
From Figs. 4-8, it can be seen that the ARMA model simulates the change of water quality in the Qingyi River from 2011 to 2015 well and is consistent with the actual trend. It can predict the concentration changes of the DO, BOD5, CODCr, NH3-N and TN indicators. In comparison with the measured values in 2016, all the indicators are reasonably simulated and predicted, and the short-term prediction is especially good. In the first half of 2016, the sewage pipe network and centralized treatment facilities in some industrial parks along the Qingyi River operated abnormally and sewage was discharged directly into the river, which led to some deviations in the prediction results of the indicators. Therefore, the influence of external factors should also be taken into account in practical application in order to improve the accuracy of prediction. Generally, the prediction results are reasonable and can provide a scientific reference for water quality control in the drainage basin.
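The agreement between predicted and measured values can also be summarized numerically. The sketch below computes the relative error of each monthly prediction and the mean absolute percentage error (MAPE). The choice of MAPE is an assumption of this sketch, since the comparison above is made graphically, and the six pairs of values are hypothetical placeholders rather than data from this study.

```python
def relative_errors(measured, predicted):
    """Signed relative error of each prediction, in percent of the measured value."""
    return [100.0 * (p - m) / m for m, p in zip(measured, predicted)]


def mape(measured, predicted):
    """Mean absolute percentage error over all prediction steps."""
    errors = relative_errors(measured, predicted)
    return sum(abs(err) for err in errors) / len(errors)


# HYPOTHETICAL six-month example (mg/L); not measured or predicted data.
measured_values = [12.0, 12.5, 13.0, 13.8, 14.5, 15.2]
predicted_values = [12.3, 12.4, 13.4, 13.5, 14.9, 14.8]

print(relative_errors(measured_values, predicted_values))
print(mape(measured_values, predicted_values))
```

Reporting the error per step makes it visible whether the deviation grows with the prediction horizon, which is the behaviour expected of short-term time series models.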

Conclusions
The ARMA model based on the time series of water quality indicators can be used to simulate and predict water quality indicators accurately. The predictions of the DO, BOD5, CODCr, NH3-N and TN indicators for 2016 agreed well with the measurements, showing the accuracy of the model. The model can capture the short-term trend of pollutants and provide a reliable basis for water environment planning and management in the drainage basin. The prediction and analysis of the five indicators at the Guidu fu section of the Qingyi River show that the concentrations of BOD5, CODCr and TN will increase in the coming months. This will reduce the DO concentration, which indicates that controlling the concentrations of BOD5, CODCr and TN remains an important task of water quality management and control in the Qingyi River.