Modelling of ammonia nitrogen in river using soft computing techniques

Ammonia nitrogen is one of the most hazardous water pollution parameters. Monitoring its concentration is crucial to minimizing ammonia nitrogen pollution in river water. This study aims to develop a reliable model to accurately predict ammonia nitrogen concentration. Langat River was selected as the study area. Two soft computing techniques, namely Backpropagation Neural Network (BPNN) and Adaptive Neuro-Fuzzy Inference System (ANFIS), were employed for model development. Different model architectures were developed and evaluated. ANFIS model VI emerged as an effective tool for this objective, with a considerably high coefficient of determination, low mean absolute and root mean squared errors, and a small average percentage error. The model has an average percentage error of 23%, corresponding to an average estimation accuracy of about 77%.


Introduction
Rapid urbanization and development have become an unavoidable trend worldwide. However, as most nations develop rapidly, the degradation of river water quality has become a serious concern [1]. Deteriorating river water quality harms the health of aquatic life and can cause a partial or complete change of species composition in the polluted river watershed [2]. Nowadays, a great number of rivers can no longer sustain aquatic life and are unsuitable for human activities due to worsening water quality. Therefore, assessment of river water quality is of utmost importance in developing countries, as rivers with acceptable water quality will become scarce in the future [3].
Ammonia nitrogen comprises un-ionized ammonia (NH₃) and ammonium ions (NH₄⁺). Ammonium dominates when the pH is lower than 8.75, whereas ammonia is the predominant form at a pH above 9.75 [4]. Under natural river conditions, ammonia nitrogen occurs mainly as ammonium. However, the proportion of un-ionized ammonia increases as temperature and pH rise [5]. Although the fraction of ammonia is much lower than that of ammonium in natural water, ammonia is highly toxic, and dissolved un-ionized ammonia is the main contributor to the toxicity of ammonia nitrogen.
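The pH and temperature dependence described above can be illustrated with the acid-base equilibrium of the ammonium ion. The following is a minimal sketch, not part of this study: it assumes an empirical temperature relation for the pKa of ammonium that is commonly used in the water-quality literature (giving a pKa of about 9.25 at 25 °C), and the function names are hypothetical.

```python
def pka_ammonium(temp_c: float) -> float:
    """pKa of NH4+ at a given water temperature in degrees Celsius.

    Assumption: empirical relation pKa = 0.09018 + 2729.92 / T(K),
    widely used for freshwater; it is not taken from this study.
    """
    return 0.09018 + 2729.92 / (temp_c + 273.15)


def unionized_fraction(ph: float, temp_c: float = 25.0) -> float:
    """Fraction of total ammonia nitrogen present as un-ionized NH3."""
    return 1.0 / (1.0 + 10.0 ** (pka_ammonium(temp_c) - ph))


# Fraction of NH3 at 25 degrees Celsius for a few pH values
for ph in (7.0, 8.75, 9.25, 9.75):
    print(f"pH {ph:5.2f}: NH3 fraction = {unionized_fraction(ph):.3f}")
```

Under these assumptions, the un-ionized fraction is roughly one quarter at pH 8.75 and roughly three quarters at pH 9.75, which is consistent with the two thresholds quoted above, and it increases with temperature at a fixed pH.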
Hence, the concentration of ammonia nitrogen should be monitored continuously to maintain the health of rivers. The high toxicity of un-ionized ammonia causes irreversible negative impacts on the growth of aquatic organisms, the weight of their organs and the condition of their gills. As a consequence, the early life stages and survival of fish and other aquatic organisms are severely diminished, which can ultimately lead to the extinction of certain aquatic populations [6]. In turn, economic development would be affected, and human health becomes a further concern. Besides, an excessive amount of ammonium favours the production of phytoplankton, which eventually leads to algal blooms and eutrophication in bodies of water. Following the mass death of algae, the resulting large quantity of dissolved organic matter significantly reduces the concentration of dissolved oxygen through microbial respiration [4]. The survival of aquatic organisms is then threatened by hypoxia, and further environmental issues arise, such as increased emission of nitrous oxide [5].
As one of the most hazardous water pollution parameters, ammonia nitrogen has had a tremendous impact on ecological communities when discharged through anthropogenic runoff into aquatic ecosystems [7]. Therefore, it is crucial to monitor the concentration of ammonia nitrogen to minimize ammonia nitrogen pollution in river water. However, relevant studies are still limited. It is therefore necessary to develop a reliable model to accurately predict ammonia nitrogen concentration.

Methodology
Langat River was selected as the study area. It is located in Selangor, Malaysia. The main reason for choosing it is that it flows through a highly urbanised area with serious erosion issues. Moreover, it is significantly affected by dredging and sand mining activities, which cause severe deposition and sedimentation problems [8]. The water quality data were obtained from the Department of Irrigation and Drainage (DID) Malaysia. The dataset spans the period from 17 July 2012 to 22 February 2018.
In this study, Backpropagation Neural Network (BPNN) and Adaptive Neuro-Fuzzy Inference System (ANFIS) techniques were proposed for the prediction of ammonia nitrogen concentration in rivers. Dissolved solids (DS), turbidity (T), total solids (TS), phosphate (PO₄³⁻) and nitrate (NO₃⁻) were selected as the input parameters for the BPNN and ANFIS models because they were found to have a higher correlation with ammonia nitrogen. There is no fixed rule for setting the data splitting ratio. However, based on previous practice, the most popular training-to-testing ratios used by past researchers range from 60%:40% to 80%:20% [9][10][11]. Hence, to train the model with as many input-output patterns as possible, the upper limit was chosen in this study, and the dataset was divided in a ratio of 80:20 for training and testing respectively.
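The 80:20 division can be sketched as follows. This is an illustrative example only: the records are synthetic placeholders rather than the DID data, the helper name is hypothetical, and a seeded random shuffle is assumed because the text does not state whether the split was random or chronological.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle the records reproducibly and split them into two lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Synthetic placeholder records: (DS, T, TS, PO4, NO3, ammonia nitrogen)
rng = random.Random(0)
records = [tuple(rng.uniform(0.0, 10.0) for _ in range(6)) for _ in range(50)]

train, test = train_test_split(records)
print(len(train), len(test))
```

For a time-ordered monitoring record, a chronological split (first 80% for training, last 20% for testing) is an equally plausible reading and only requires dropping the shuffle.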
A three-layer BPNN model was implemented to predict ammonia nitrogen concentration, and the model architecture is presented in Fig. 1. The input water quality parameters were dissolved solids (DS), turbidity (T), total solids (TS), phosphate (PO₄³⁻) and nitrate (NO₃⁻). The output variable was the concentration of ammonia nitrogen.
In this study, the architecture of the BPNN model contained only one hidden layer. The transfer function implemented between the input layer and the hidden layer was the sigmoid function, selected for its capability to provide a prediction model with greater performance. The training algorithm employed in the BPNN model was Levenberg-Marquardt (trainlm), because it can exhibit higher performance when dealing with function fitting (non-linear regression) problems [11][12][13]. In developing a BPNN model, determining the number of neurons in the hidden layer is always a critical task. Neural network models are sensitive to the number of hidden neurons: under-fitting may occur when there are too few neurons, while over-fitting may occur when there are too many. Both issues may affect the prediction accuracy and hence reduce the reliability of the model [14][15][16][17][18]. In this study, a trial-and-error method was implemented to avoid such issues. The number of hidden neurons was initially set to 2 and gradually increased to 10.
Table 1.
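The trial-and-error search over the hidden layer size can be sketched as below. This is a simplified illustration, not the implementation used in the study: it trains a one-hidden-layer network (sigmoid hidden units, linear output) with plain stochastic gradient descent instead of Levenberg-Marquardt, uses synthetic data with five inputs scaled to [0, 1], and all function names and hyperparameters are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bpnn(X, y, n_hidden, epochs=200, lr=0.05, seed=0):
    """Train a one-hidden-layer network by backpropagation (plain SGD)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = [0.0]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(n_hidden)]
        return h, sum(W2[j] * h[j] for j in range(n_hidden)) + b2[0]

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # gradient of 0.5 * err**2 w.r.t. the output
            for j in range(n_hidden):
                # Error backpropagated through the sigmoid of hidden unit j
                grad_h = err * W2[j] * h[j] * (1.0 - h[j])
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
            b2[0] -= lr * err

    return lambda x: forward(x)[1]


def rmse(model, X, y):
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))


# Synthetic data standing in for the five scaled inputs and the target
rng = random.Random(1)
X = [[rng.random() for _ in range(5)] for _ in range(50)]
y = [0.5 * x[0] + 0.3 * math.sin(3.0 * x[1]) + 0.2 * x[4] for x in X]
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

# Trial and error: 2 to 10 hidden neurons, keep the lowest held-out RMSE
results = {}
for n_hidden in range(2, 11):
    model = train_bpnn(X_train, y_train, n_hidden)
    results[n_hidden] = rmse(model, X_test, y_test)

best = min(results, key=results.get)
print(f"best hidden size: {best}, RMSE: {results[best]:.4f}")
```

In practice the candidate architectures would be compared on the same statistical measures used in this study rather than on RMSE alone.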
The developed BPNN and ANFIS models were then evaluated using a series of statistical measures, comprising the coefficient of determination (R²), mean absolute error (MAE), root mean squared error (RMSE) and average percentage error (% error).
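The four measures can be computed as in the sketch below. The exact formulas are not given in the text, so the definitions here are assumptions: R² is taken as one minus the ratio of residual to total sum of squares, and the average percentage error as the mean absolute error relative to each observed value. The sample values are made up for illustration.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def average_percentage_error(obs, pred):
    """Mean of |obs - pred| / |obs| in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


# Made-up observed and predicted concentrations
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"MAE = {mae(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"% error = {average_percentage_error(observed, predicted):.2f}")
```

Because the percentage error divides by the observed value, small observed concentrations can inflate it sharply, which is one way average percentage errors above 100% can arise.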

Results and discussion
In this section, the outcomes of the statistical analyses on the proposed BPNN models are presented to select the most suitable model for ammonia nitrogen prediction. Fig. 3 shows a bar chart of the average percentage error of each BPNN model. The highest and lowest average percentage errors are 277% and 56%, respectively. To further explore the possibility of improving model performance for ammonia nitrogen prediction, several ANFIS models were developed using different combinations of input and output membership functions. Their performances were evaluated and are tabulated in Table 3.
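To make the role of the input and output membership functions concrete, the following sketch shows the forward pass of a first-order Sugeno fuzzy system, which is the structure that ANFIS tunes. It is illustrative only: the membership function types, rule base and parameters of the models in Table 3 are not stated in the text, so Gaussian input functions, linear output functions, two inputs and all numerical values are assumptions, and the training step of ANFIS is not shown.

```python
import math


def gaussian_mf(x, centre, sigma):
    """Gaussian input membership function."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def sugeno_predict(x1, x2, mfs1, mfs2, consequents):
    """First-order Sugeno inference with one rule per pair of input sets.

    mfs1, mfs2: lists of (centre, sigma) for each input.
    consequents: one (p, q, r) per rule, giving the output p*x1 + q*x2 + r.
    """
    weights, outputs = [], []
    for i, (c1, s1) in enumerate(mfs1):
        for j, (c2, s2) in enumerate(mfs2):
            # Rule firing strength using the product T-norm
            w = gaussian_mf(x1, c1, s1) * gaussian_mf(x2, c2, s2)
            p, q, r = consequents[i * len(mfs2) + j]
            weights.append(w)
            outputs.append(p * x1 + q * x2 + r)
    # Weighted average of rule outputs (normalised firing strengths)
    return sum(w * f for w, f in zip(weights, outputs)) / sum(weights)


# Hypothetical parameters: two "low"/"high" sets per input, four rules
mfs_x1 = [(0.0, 0.4), (1.0, 0.4)]
mfs_x2 = [(0.0, 0.4), (1.0, 0.4)]
rules = [(0.2, 0.1, 0.0), (0.5, 0.3, 0.1), (0.8, 0.2, 0.3), (1.0, 0.9, 0.5)]

print(sugeno_predict(0.3, 0.7, mfs_x1, mfs_x2, rules))
```

Changing the shape of the input functions or replacing the linear consequents with constants yields different model variants of the kind compared in this section.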
In general, an R² value close to 1 is preferable. Models VI, II, VII, I and IV show a stronger relationship between the observed and predicted values than the other examined models, as they are the five models with an R² value above 0.70.
In terms of mean absolute error (MAE) and root mean squared error (RMSE), a smaller error value is preferable as it indicates better model performance. As tabulated in Table 3, model VI achieved the highest R² value and recorded the lowest MAE and RMSE values.
Percentage error analysis was employed to further verify the appropriateness of the models. Fig. 4 shows the average percentage error of the corresponding ANFIS models. Overall, four models have an average percentage error lower than 30%. Model VI shows the lowest value at 23.4%, followed by Models II (25.7%), VIII (27.7%) and X (28.3%). The best models obtained from BPNN and ANFIS were then compared in terms of accuracy to identify which is more appropriate for the main objective of this study. The comparison is tabulated in Table 4. In terms of the coefficient of determination, the ANFIS model outperforms the BPNN model. The ANFIS model also recorded smaller MAE and RMSE values, which indicates higher prediction accuracy. In addition, the ANFIS model has an average percentage error of 23%, which is lower than the 56% of the BPNN model. In other words, the ANFIS model provides an average prediction accuracy of about 77%. Based on all the indicators, it can be concluded that the ANFIS model is the better option for ammonia nitrogen prediction in the river.

Conclusions and recommendations
Two soft computing techniques, namely BPNN and ANFIS, were used to develop computational models for ammonia nitrogen prediction in the river. Model performance was evaluated by statistical analyses. ANFIS model VI is considered the best-performing model, as it shows the highest R² value of 0.849, low MAE and RMSE values of 2.74 and 4.16 respectively, and the lowest average percentage error of 23%. In other words, the model provides an average accuracy of about 77%. The model accuracy could be further enhanced by using a wider range of datasets or by implementing more advanced or hybrid machine learning techniques.