An Efficient River Water Quality Prediction and Classification Model using Metaheuristics based Kernel Extreme Learning Machine

In recent years, water quality has become increasingly susceptible to different pollutants. In addition, environmental conditions such as vegetation, climate, and basin lithology naturally affect the quality of river water. The prediction of water quality (WQ) therefore becomes a major process in controlling pollution. The rise of artificial intelligence (AI) techniques can be exploited to design predictive methods for water quality index (WQI) estimation and classification. This study focuses on the design of a metaheuristics-based kernel extreme learning machine (MBKELM) for river water quality prediction and classification. The proposed MBKELM model aims to predict and classify the quality of river water into different classes. A prediction and classification model using KELM is derived to determine the water quality, and the parameters of the KELM model are tuned by the pigeon optimization algorithm (POA). A wide range of experimental analyses was performed on benchmark datasets, and the outcomes showed that the MBKELM technique outperforms recent techniques, achieving the lowest MSE and RMSE values. On the training set, the MBKELM model achieved an MSE of 0.00257, whereas the existing model obtained a higher MSE of 0.00336. Likewise, the MBKELM model achieved a training RMSE of 0.05070 (the square root of the training MSE), whereas the existing model obtained a higher RMSE of 0.05800.


Introduction
Surface water in rivers plays an important part in economic development, social health, and the environment [1]. Several elements affect Water Quality (WQ) in rivers, including human factors such as manufacturing, agricultural, and urban practices, and natural factors such as erosion and rainfall. Since surface water is the main source of freshwater around the world, its degradation might have substantial consequences for drinking water availability and, more generally, for future strategies and economic development [2]. The interaction of rivers with their surrounding environments, together with the interrelated exchange of urban, industrial, and agricultural contaminants, results in water pollution [3]. Thus, it is highly important to propose a novel approach to analyze and, possibly, forecast WQ. It is suggested to consider the temporal dimension when predicting WQ patterns so that seasonal changes in WQ are captured [4]. Moreover, combining several models to forecast WQ can yield better performance than employing a single method [5].

Various methods have been presented for the modeling and prediction of WQ, including visual modelling, statistical approaches, predictive algorithms, and analytical algorithms. Multivariate statistical techniques have been applied to determine the relationships and correlations among distinct WQ variables. Geostatistical approaches have been employed in regression analysis, transitional probability, and multivariate interpolation. With advances in computing and Artificial Intelligence (AI) methods, AI-based modeling has been introduced to resolve WQ problems. Artificial Neural Network (ANN) models assist in monitoring WQ systems by forecasting WQ changes [6], which could considerably enhance the efficiency of aquaculture. Simulating WQ conditions with water quality and hydrodynamic models, a comparatively new class of computational model, involves challenges and difficulties. ANN methods have been widely studied in various disciplines and offer another technique to monitor and understand WQ in reservoirs, and they have been effectively employed for simulating, computing, and forecasting WQ in water bodies. Various ANN models, such as feedforward neural networks (FFNN), have been employed in many applications [7]. Fuzzy logic systems have been proposed for solving complicated nonlinear systems [8]. An ANN model requires its parameter values to be set in order to produce predictions. ANN models offer several benefits, including parallel processing and the ability to learn and manage very complex nonlinear systems. The authors of [9] employed K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Neural Network (NN), and Deep Neural Network (DNN) models for classifying WQ using drinking water data from the Pakistan Council of Research in Water Resources (PCRWR).

This study focuses on the design of a Metaheuristics Based Kernel Extreme Learning Machine (MBKELM) for river water quality prediction and classification. The proposed MBKELM model aims to predict and classify the quality of river water into different classes. A prediction and classification model using KELM is derived to determine the water quality, and the parameters of the KELM model are tuned by the Pigeon Optimization Algorithm (POA). A wide range of experimental analyses was performed on benchmark datasets, and the outcomes showed that the MBKELM technique outperforms recent techniques.

Related works
Hu et al. [10] designed a hybrid Convolutional Neural Network (CNN)-Gated Recurrent Unit (GRU) model for predicting the concentration of river pollutants. River flow, river flow velocity, and pollutant concentration are taken as the input data of the CNN model, which extracts feature vectors and creates higher-dimensional time-sequence vectors. These are then fed to the GRU model for training, and an attention mechanism is employed to optimize the method. Finally, the pollutant concentration is predicted. For comparison, Auto Regressive Integrated Moving Average (ARIMA), GRU, and Back Propagation Neural Network (BPNN) methods were trained and used for forecasting on the same training sets. Othman et al. [11] built an ANN-based model that computes the Water Quality Index (WQI) directly from the input parameters rather than from parameter indices, so that the WQI can still be obtained when one of the parameters is not present. The data were gathered from 9 WQ monitoring stations in the Klang River basin, Malaysia. Additionally, a complete sensitivity analysis was performed to identify the effective input parameters.

Haghiabi et al. [12] investigated the performance of AI techniques, including ANN, Group Method of Data Handling (GMDH), and SVM, for predicting the WQ components of the Tireh River in the southwest of Iran. To improve the SVM and ANN models, different kinds of kernel and transfer functions, respectively, were tried. Hmoud Al-Adhaileh and Waselallah Alsaade [13] developed an effective process for monitoring drinking water to support a friendly and sustainable green environment. In their study, the Adaptive Network-based Fuzzy Inference System (ANFIS) method was proposed for predicting the WQI, and the FFNN and KNN models were used for classifying WQ. The dataset has 8 important variables, of which 7 were taken into account as showing significant values, and the presented method was developed according to these statistical parameters. Abobakr Yahya et al. [14] developed an effective SVM-based model for predicting the WQ of the Langat River Basin by analyzing data on 6 variables from the dual reservoirs located in the catchment. The presented method can be regarded as an efficient tool for identifying the WQ status of river catchment areas.

Noori et al. [15] proposed a hybrid method that integrates ANN-based and watershed models. Integrating these 2 methods helped to enhance the calibration and validation processes when accounting for WQ and complex hydrological processes. The proposed method was applied to a watershed in the Atlanta metropolitan region, USA, for predicting regular phosphate, nitrate, and ammonium loads. Lu and Ma [16] introduced 2 new hybrid decision tree (DT) based machine learning (ML) methods for obtaining precise short-term WQ predictions. The base models of the 2 hybrid methods are XGBoost and random forest (RF), each combined with an advanced data denoising method, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). Tryasha et al. [17] proposed a model to predict the WQI for small catchments; the model works well but has yet to be improved for large catchments. It uses Deep Learning and Random Forest to predict the WQI, which depends on six water quality variables. Ali El Bilal et al. [18] proposed eight Machine Learning based models for predicting irrigation water quality, which supports efficient agriculture. This work is helpful for farmers, since they can measure the input metrics using sensors and manage water quality at lower cost and in less time.

The Proposed Model
In this study, an effective MBKELM model is derived to predict and classify the WQ. The MBKELM model involves three major phases, namely preprocessing, classification, and parameter optimization. The processes involved in these phases are discussed in the following sections.

Preprocessing
The preprocessing stage is essential in data analysis for improving data quality. During this stage, the WQI is estimated from the most essential parameters of the dataset. Afterward, the water instances are categorized on the basis of their WQI values. The Z-score approach is used as the data normalization technique to attain higher accuracy. It normalizes the data using the mean ($\mu$) and standard deviation ($\sigma$), and has been implemented to scale parameter values to [0, 2]. It is computed using the following equation:

$$ z = \frac{x - \mu}{\sigma} $$

where $x$ denotes the instance from the dataset being evaluated.
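The normalization step can be illustrated with a minimal sketch. The NumPy implementation below applies the Z-score per column (per parameter), which is an assumption; the sample values and the two column meanings are invented for illustration and do not come from the dataset used in the study.

```python
import numpy as np

def zscore_normalize(X):
    """Column-wise Z-score: subtract the mean and divide by the standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Guard against constant columns, whose standard deviation is zero.
    sigma = np.where(sigma == 0.0, 1.0, sigma)
    return (X - mu) / sigma

# Hypothetical readings: each row is a water sample, each column a parameter.
samples = np.array([
    [7.0, 200.0],
    [7.5, 220.0],
    [8.0, 240.0],
])
Z = zscore_normalize(samples)
print(Z)
```

After this transformation every column has zero mean and unit standard deviation, so parameters measured in different units contribute on a comparable scale.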

KELM Model
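The preprocessed data are fed into the Kernel Extreme Learning Machine (KELM) model for prediction and classification. The underlying extreme learning machine (ELM) is a single hidden layer feedforward network (SLFN) with input, hidden, and output layers. The hidden layer is nonlinear owing to its nonlinear activation function, whereas the output layer is linear and has no activation function. The SLFN with $k$ hidden nodes can be written as follows:

$$ f(x) = \sum_{j=1}^{k} \beta_j \, G(w_j, b_j, x) $$

where $x$ represents a training instance and $f(x)$ indicates the output of the network. Here $G(w, b, x)$ is the hidden layer activation function, $w$ represents the input weight matrix linking the input layer with the hidden layer, $b$ is the bias of the hidden layer, and $\beta = [\beta_1 \; \beta_2 \; \dots \; \beta_k]$ contains the weights between the hidden and output layers. For an ELM with $n$ training samples, $d$ input neurons (number of bands), $k$ hidden neurons, and $m$ output neurons ($m$ classes),

$$ t_i = \sum_{j=1}^{k} \beta_j \, g\bigl(\langle w_j, x_i \rangle + b_j\bigr), \qquad i = 1, \dots, n $$

in which $t_i$ represents the $m$-dimensional desired output vector for the $i$th training instance $x_i$, the $d$-dimensional $w_j$ represents the $j$th weight vector from the input layer to the $j$th hidden neuron, and $b_j$ signifies the bias of the $j$th hidden neuron [17]. Here $\langle w_j, x_i \rangle$ indicates the inner product of $w_j$ and $x_i$. The sigmoid function $g$ is employed as the activation function, so the output of the $j$th hidden neuron is given as follows:

$$ g\bigl(\langle w_j, x_i \rangle + b_j\bigr) = \frac{1}{1 + \exp\!\left(-\dfrac{\langle w_j, x_i \rangle + b_j}{\epsilon^{2}}\right)} $$

where $\exp(\cdot)$ represents the exponential function and $\epsilon^{2}$ denotes the steepness parameter. In matrix form the model reads $H\beta = T$, where $T = [t_1, \dots, t_n]^{T}$ and $H$ indicates the hidden layer output matrix of the ELM with size $(n, k)$:

$$ H = \begin{bmatrix} g(\langle w_1, x_1 \rangle + b_1) & \cdots & g(\langle w_k, x_1 \rangle + b_k) \\ \vdots & \ddots & \vdots \\ g(\langle w_1, x_n \rangle + b_1) & \cdots & g(\langle w_k, x_n \rangle + b_k) \end{bmatrix} $$

Next, $\beta$ is evaluated by a minimum norm least squares solution:

$$ \beta = H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T $$

where $C$ indicates a regularization parameter. With $h(x)$ denoting the hidden layer output (a row vector) for an input $x$, the ELM model can be given as

$$ f(x) = h(x)\,\beta = h(x)\, H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T $$

The ELM can be extended to the KELM model through the kernel trick. Let

$$ \Omega = H H^{T}, \qquad \Omega_{ij} = h(x_i) \cdot h(x_j) = K(x_i, x_j) $$

in which $x_i$ and $x_j$ represent the $i$th and $j$th training samples, respectively. Replacing $H H^{T}$ with $\Omega$, the KELM is written as follows:

$$ f(x) = \bigl[ K(x, x_1), \dots, K(x, x_n) \bigr] \left( \frac{I}{C} + \Omega \right)^{-1} T $$

Let $f_j(x)$ be the $j$th output of the KELM method; the predicted class is then

$$ \operatorname{label}(x) = \arg\max_{j = 1, \dots, m} f_j(x) $$

Unlike the ELM, the primary feature of the KELM is that the number of hidden nodes does not need to be fixed and there is no random feature mapping. Further, the computational time is lower than that of the ELM model because of the kernel trick employed.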
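The KELM formulation can be sketched in a few lines of NumPy. The example below is a minimal illustration, not the authors' implementation: the RBF kernel, the hyperparameter names `C` and `gamma`, the class name `KELM`, and the synthetic two-cluster data are all assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel K(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel ELM classifier: solves (I/C + Omega) alpha = T, predicts via K(x, X) alpha."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization parameter
        self.gamma = gamma  # RBF kernel width (hypothetical hyperparameter)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T with one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        omega = rbf_kernel(X, X, self.gamma)
        n = X.shape[0]
        # Solve the linear system instead of forming the explicit inverse.
        self.alpha_ = np.linalg.solve(np.eye(n) / self.C + omega, T)
        self.X_ = X
        return self

    def decision_function(self, X):
        return rbf_kernel(np.asarray(X, dtype=float), self.X_, self.gamma) @ self.alpha_

    def predict(self, X):
        # The class with the largest output is the predicted label.
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]

# Synthetic, well-separated clusters used purely for demonstration.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)

model = KELM(C=100.0, gamma=0.5).fit(X_train, y_train)
pred = model.predict(np.array([[0.1, -0.1], [2.9, 3.2]]))
print(pred)
```

Note that no hidden layer size appears anywhere: only the kernel matrix over the training samples is needed, which is the property highlighted at the end of this section.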

Parameter tuning using POA
To tune the weight and bias values of the KELM model, the POA is applied. The POA mainly comprises 2 operators: the map and compass operator and the landmark operator. In the map and compass operator, the velocity and position of each pigeon are updated as

$$ V_i(t) = V_i(t-1)\, e^{-Rt} + \mathrm{rand} \cdot \bigl( X_g - X_i(t-1) \bigr) $$

$$ X_i(t) = X_i(t-1) + V_i(t) $$

where $R$ refers to the map and compass factor, $\mathrm{rand}$ represents a uniform random number in the range 0 to 1, $X_g$ signifies the global optimum solution [18], $X_i(t)$ refers to the present position of pigeon $i$ at iteration $t$, and $V_i(t)$ denotes the present velocity of pigeon $i$ at iteration $t$.

In the landmark operator, every pigeon is ranked based on its fitness value. In each generation, the number of pigeons is updated by Eq. (14), in which only half of the pigeons are retained for calculating the position of the center pigeon, while every other pigeon adjusts its destination by following the desired position:

$$ N_p(t) = \frac{N_p(t-1)}{2} \tag{14} $$

where $N_p(t)$ refers to the number of pigeons in the present iteration $t$. The position of the desired destination, denoted $X_c(t)$, is computed by Eq. (15), and every other pigeon updates its position toward it by Eq. (16):

$$ X_c(t) = \frac{\sum X_i(t)\, \mathrm{fitness}\bigl(X_i(t)\bigr)}{N_p(t) \sum \mathrm{fitness}\bigl(X_i(t)\bigr)} \tag{15} $$

$$ X_i(t) = X_i(t-1) + \mathrm{rand} \cdot \bigl( X_c(t) - X_i(t-1) \bigr) \tag{16} $$
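The two POA operators can be sketched as a small NumPy routine. This is a hedged illustration under several assumptions: the function name `pigeon_optimize`, its parameters, the sphere cost function, and the bounds are hypothetical; fitness is taken as `1 / (cost + eps)` for a non-negative cost to be minimized; and the center position is computed as a plain fitness-weighted mean of the retained pigeons.

```python
import numpy as np

def pigeon_optimize(cost, dim, bounds, n_pigeons=30, n_map=60, n_landmark=20,
                    R=0.2, seed=0):
    """Minimize a non-negative `cost` with a two-phase pigeon-inspired search."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    X = rng.uniform(low, high, (n_pigeons, dim))   # positions
    V = np.zeros_like(X)                           # velocities
    costs = np.array([cost(x) for x in X])
    best = X[np.argmin(costs)].copy()
    best_cost = float(costs.min())

    # Phase 1: map and compass operator.
    for t in range(1, n_map + 1):
        V = V * np.exp(-R * t) + rng.random((len(X), 1)) * (best - X)
        X = np.clip(X + V, low, high)
        costs = np.array([cost(x) for x in X])
        i = int(np.argmin(costs))
        if costs[i] < best_cost:
            best, best_cost = X[i].copy(), float(costs[i])

    # Phase 2: landmark operator.
    for _ in range(n_landmark):
        order = np.argsort(costs)
        keep = max(1, len(X) // 2)                 # halve the flock each generation
        X, costs = X[order[:keep]], costs[order[:keep]]
        fitness = 1.0 / (costs + 1e-12)            # assumed fitness for minimization
        center = (X * fitness[:, None]).sum(axis=0) / fitness.sum()
        X = np.clip(X + rng.random((len(X), 1)) * (center - X), low, high)
        costs = np.array([cost(x) for x in X])
        i = int(np.argmin(costs))
        if costs[i] < best_cost:
            best, best_cost = X[i].copy(), float(costs[i])

    return best, best_cost

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

best, best_cost = pigeon_optimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_cost)
```

In a tuning setting, the toy `sphere` objective would be replaced by a validation error of the KELM model evaluated at the candidate parameter vector, so that the returned `best` holds the selected parameter values.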

Results and discussion
The study uses the Python programming language, machine learning libraries such as Scikit-learn, and optimization algorithms to implement the Metaheuristics Based Kernel Extreme Learning Machine (MBKELM) and Kernel Extreme Learning Machine (KELM) models for water quality prediction. It employs data preprocessing tools such as Pandas, statistical procedures such as k-fold cross-validation, and visualization tools such as Matplotlib or Seaborn, with documentation in Jupyter Notebook. The performance of the proposed model is assessed on a Kaggle dataset. The different classes of WQI are shown in Fig. 2. The results are examined in terms of various evaluation parameters.

Table 1 shows the MSE and RMSE results, which demonstrate that the MBKELM model achieved the lowest MSE and RMSE values. On the training set, the MBKELM model achieved an MSE of 0.00257, whereas the ANFIS technique obtained a higher MSE of 0.00336. The corresponding training RMSE is 0.05070 for the MBKELM model and 0.05800 for the ANFIS technique. On the testing set, the MBKELM model achieved an MSE of 0.00215, whereas the ANFIS technique obtained a higher MSE of 0.00290. Similarly, the MBKELM model achieved a testing RMSE of 0.04637, whereas the ANFIS approach obtained a higher RMSE of 0.05400.

Fig. 3 depicts the ROC curve analysis of the MBKELM technique on the applied dataset. The results show that the MBKELM technique achieved maximum performance with an ROC value of 99.9842.

Table 2 and Fig. 4 provide the WQ classification results of the MBKELM model in terms of different measures. The outcomes confirm that the MBKELM model achieved the best classification performance. In terms of sensitivity, the MBKELM model obtained 99.860%, whereas the FFNN and KNN techniques attained lower values of 99.610% and 82.500%, respectively. In terms of specificity, the MBKELM model reached 99.890%, whereas the FFNN and KNN algorithms attained 99.610% and 89.500%, respectively. In terms of precision, the MBKELM model attained 99.975%, whereas the FFNN and KNN methodologies attained 99.961% and 82.500%, respectively.

Conclusion
In this study, an effective MBKELM model is derived to predict and classify the WQ. The MBKELM model predicts and classifies the quality of river water into different classes. A prediction and classification model using KELM is derived to determine the water quality, and the parameters of the KELM model are tuned by the POA. A wide range of experimental analyses was performed on benchmark datasets, and the outcomes showed that the MBKELM technique outperforms recent techniques. Therefore, the MBKELM technique can be used as an effective tool to predict and classify WQ. In future, advanced deep learning (DL) based predictive models can be explored to further enhance predictive capability in water quality assessment.

Fig. 1. KELM structure

Fig. 2. Different classes of WQI

Fig. 3. ROC analysis of MBKELM model

Fig. 4. Comparative analysis of MBKELM model with existing approaches

Table 1. MSE and RMSE analysis of MBKELM model on training and testing sets

Table 2. Comparative analysis of MBKELM model with different measures
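The measures reported in Tables 1 and 2 can be illustrated with a short sketch. The helper names `mse_rmse` and `binary_measures` and all label and prediction values below are invented for demonstration; the binary (one positive class) treatment is an assumption, and the sketch assumes both classes occur in the labels and predictions so that no denominator is zero.

```python
import numpy as np

def mse_rmse(y_true, y_pred):
    """Mean squared error and its square root."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = float(np.mean(err ** 2))
    return mse, float(np.sqrt(mse))

def binary_measures(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and precision from confusion-matrix counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }

# Made-up regression targets and predictions.
mse, rmse = mse_rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])

# Made-up class labels and predictions.
scores = binary_measures([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(mse, rmse, scores)
```

Because RMSE is the square root of MSE, the two reported columns can be cross-checked against each other: for example, the square root of 0.00257 is approximately 0.0507.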