Evaluating Clustering Algorithms: An Analysis using the EDAS Method

Data clustering is frequently used in the early stages of analyzing big data. It enables the examination of massive datasets encompassing diverse types of data, with the aim of revealing undiscovered correlations, concealed patterns, and other valuable information. The assessment of algorithms designed for handling large-scale data poses a significant research challenge across various fields. Evaluating the performance of different algorithms in processing massive data can yield diverse or even contradictory results, a phenomenon that remains insufficiently explored. This paper seeks to address this issue by proposing a solution framework for evaluating clustering algorithms, with the objective of reconciling divergent or conflicting evaluation outcomes. "The multicriteria decision making (MCDM) method" is used to assess the clustering algorithms. Using the EDAS rating system, the report examines six alternative clustering algorithms, "the KM algorithm, EM algorithm, filtered clustering (FC), farthest-first (FF) algorithm, make density-based clustering (MD), and hierarchical clustering (HC)", against six external clustering measures. The Expectation Maximization (EM) algorithm has an ASi value of 0.048021 and is ranked 5th among the clustering algorithms. The Farthest-First (FF) Algorithm has an ASi value of 0.753745 and is ranked 2nd. The Filtered Clustering (FC) algorithm has an ASi value of 0.055173


Introduction
Clustering is commonly used in the early stages of big data analysis to partition large datasets into smaller segments. This division allows for easier comprehension and management of the data, facilitating subsequent analytical operations [1]. Choosing the right clustering algorithm is crucial for handling large-scale data, and evaluating the performance of clustering algorithms is an ongoing and important concern in various fields, including fuzzy set theory, genomics, data mining, computer science, machine learning, business intelligence, and financial analysis [2]. Researchers and professionals from diverse fields such as computer science, economics, political science, bioinformatics, sociology, and others often engage in discussions to evaluate the potential advantages and disadvantages of analyzing these data to aid decision-making. However, the decision-making process is highly intricate due to the presence of conflicting interests among multiple stakeholders and the complexity of the systems involved [3,4].

Clustering algorithms are unsupervised machine learning algorithms that operate without prior information. They divide the original data space into smaller segments based on high dissimilarities between groups and high similarities within groups. Clustering is a versatile technique that can be applied to process large-scale data of different types, aiming to discover previously unknown correlations, hidden patterns, and potentially valuable information [5,6]. To uncover the valuable insights concealed within location data, researchers often rely on clustering learning algorithms, which are widely employed in various studies. Cluster analysis, also referred to as group analysis, serves not just as a statistical method for investigating classification problems (such as samples or indices), but also as a vital algorithm in the field of data mining. Cluster analysis operates on a set of patterns, with a pattern typically representing a measurement vector or a point within a multi-dimensional space [7]. Cluster analysis relies on the concept of similarity: patterns within a cluster are more similar to one another than to patterns outside the cluster. Clustering analysis algorithms can be categorized into different types, including partition methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. These algorithms employ various techniques to group similar patterns together and distinguish them from dissimilar patterns [8].

Clustering Algorithms
Clustering algorithms are commonly categorized into four main classes: partitioning methods, hierarchical methods, density-based methods, and model-based methods. Numerous classic clustering algorithms have been introduced and studied, including the K-means algorithm, k-medoid algorithm, expectation maximization (EM), and frequent pattern-based clustering algorithms [9]. This research paper focuses on an empirical study involving six influential clustering algorithms: "KM algorithm, EM algorithm, filtered clustering (FC), farthest-first (FF) algorithm, make density-based clustering (MD), and hierarchical clustering (HC)". These algorithms can be implemented using "the WEKA software" [10].

KM algorithm
K-means (KM) is a well-known unsupervised learning method primarily used to partition n samples into k distinct clusters. The algorithm takes the n samples and the desired number of clusters, k, as input. Initially, k points are randomly selected from the n samples as the initial cluster centers. The algorithm then computes the distances between the n samples and the k cluster centers and assigns each sample to the nearest cluster center. Next, it calculates "the average of all samples within each cluster to obtain new cluster centers". These assignment and update steps are repeated iteratively while a criterion function is evaluated (for example, the within-cluster sum of squared distances). The algorithm stops when the clustering result satisfies the specified criterion, yielding the final clustering outcome [11].
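The assignment and update loop described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the WEKA implementation used in the study: the function names `sq_dist` and `kmeans`, the squared Euclidean distance, the fixed random seed, and the "centers stopped moving" stopping criterion are all assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Partition `points` (a list of numeric tuples) into k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k initial centers drawn from the samples
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # assumption: an empty cluster keeps its center
        if new_centers == centers:  # assumed criterion: centers no longer move
            break
        centers = new_centers
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, k=2)
```

On this toy input the two well-separated groups end up in different clusters; on real data the result can depend on the random initial centers.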

EM algorithm
The EM algorithm is a versatile technique used to compute maximum likelihood estimates in scenarios involving missing values or latent variables. It is particularly useful for mixture models, where an observed data set of a random variable Y is classified into mixture components based on probabilities. The fundamental concept underlying this algorithm is the assumption that the dataset originates from an unobservable discrete random variable U, which signifies the mixture component responsible for generating each observation y_i. The algorithm iteratively fits these probabilities, updating them in each iteration until a convergence criterion is met [12].
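As an illustration of the iteration just described, the sketch below fits a two-component, one-dimensional Gaussian mixture. The Gaussian component model, the initialisation at the data extremes, the fixed number of iterations in place of a convergence test, and the name `em_gmm_1d` are assumptions made for the example; the study itself relies on WEKA's EM clusterer.

```python
import math


def em_gmm_1d(ys, iters=50):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(ys)
    mean_all = sum(ys) / n
    var_all = sum((y - mean_all) ** 2 for y in ys) / n
    # Assumed initialisation: equal weights, means at the extremes, shared variance.
    pi, mu, var = [0.5, 0.5], [min(ys), max(ys)], [var_all, var_all]
    for _ in range(iters):
        # E-step: probability that component u generated observation y.
        resp = []
        for y in ys:
            dens = [
                pi[u] * math.exp(-((y - mu[u]) ** 2) / (2 * var[u]))
                / math.sqrt(2 * math.pi * var[u])
                for u in range(2)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for u in range(2):
            n_u = sum(r[u] for r in resp)
            pi[u] = n_u / n
            mu[u] = sum(r[u] * y for r, y in zip(resp, ys)) / n_u
            # Small floor (an assumption) keeps the variance strictly positive.
            var[u] = max(sum(r[u] * (y - mu[u]) ** 2 for r, y in zip(resp, ys)) / n_u, 1e-6)
    return pi, mu, var


weights, means, variances = em_gmm_1d([0.9, 1.0, 1.1, 4.9, 5.0, 5.1])
```

Each observation can then be assigned to the component with the larger responsibility, which turns the fitted mixture into a clustering.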

Filtered clustering (FC)
Filtered clustering is a data mining and machine learning technique that aims to identify and group similar objects or data points using specific filtering criteria. It is a modified version of traditional clustering algorithms that incorporates additional constraints or filters during the clustering process. Filtered clustering applies the filtering criteria to the data points either before or during the clustering process, effectively excluding certain objects from consideration. These filters can be based on specific attributes, ranges, or patterns observed in the data. The primary objective of filtering is to narrow down the scope of the clustering algorithm to a subset of the data that satisfies predetermined conditions, thereby enhancing the accuracy and relevance of the resulting clusters [13].

Farthest-first (FF) algorithm
FF, "a powerful greedy permutation method in computational geometry", involves traversing a sequence of points in space from a stochastic starting point and selecting each subsequent point so that it is as distant as possible from the previously chosen set of points. In the context of clustering, FF clustering applies the FF traversal technique to optimize K-means. It first selects centroids by maximum distance and then assigns samples to the nearest centroid. Specifically, the first centroid is a stochastically selected data point, the second centroid is greedily chosen as the point farthest from the first, and each further centroid is the point farthest from all centroids chosen so far. This process is repeated until k centroids have been selected. Once all centroids are determined, FF assigns the remaining data to the cluster that has the closest feature distance. Unlike K-means, FF only requires one traversal to cluster the data. The cluster centers are actual data points, and their positions remain fixed throughout the computation. This approach typically accelerates the clustering process as it involves fewer reassignments and adjustments [14].
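A minimal sketch of the farthest-first traversal and the single assignment pass is given below. The function names, the squared Euclidean distance, and the fixed seed are assumptions made for illustration and are not taken from the WEKA implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k, seed=0):
    """Pick k centers by farthest-first traversal, then assign every point once."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]  # stochastic starting point
    # nearest[i] is the distance from points[i] to its closest chosen center.
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Greedy step: the next center is the point farthest from the chosen set.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    # Single pass: each point joins the cluster of its closest center.
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    return centers, labels


pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
centers, labels = farthest_first(pts, k=3)
```

Because the centers are never moved after selection, there is no iterative update step as in K-means.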

Make density-based clustering (MD)
This algorithm effectively discerns regions of both high and low densities and can be applied to detect clusters of various shapes, including circular clusters. It does not require the number of clusters to be predefined, and it is also capable of detecting outliers in the dataset. The density-based clustering algorithm relies on two parameters: the radius of influence, or epsilon value, and the minimum points condition, which together determine the desired density. To form a cluster, the algorithm starts at an arbitrary point within an identified region and encompasses an area of radius epsilon (Eps = 0.02268) that must include the minimum number of points [15].

Hierarchical clustering (HC)
The process of training multiple models for subsets of related clients can be accomplished by clustering the model updates received from the clients. However, many unsupervised clustering algorithms require an a priori estimation of the number of clusters. Since it is not possible to know how many unique data-generating distributions the clients' datasets are drawn from, it becomes necessary to use a clustering algorithm that can autonomously determine the number of clusters. Nonetheless, some clustering methods that automatically determine the number of clusters (such as DBSCAN) fail to assign outlier samples to a cluster and label them as noise. In this scenario, hierarchical clustering emerges as a suitable choice for clustering when the number of clusters is unknown, as it assigns all examples to the most relevant cluster. Another advantage of using hierarchical clustering is its capability to handle large numbers of samples and clusters while also providing reasonable interpretability [16].

External measures for evaluating clustering results are considered more effective compared to internal and relative measures. In this particular study, six external measures for clustering evaluation are chosen. These measures include "entropy, purity, Rand index (RI), adjusted Rand index (ARI), Fowlkes-Mallows index (FMI), and Jaccard coefficient (JC)". Notably, entropy and purity measures are extensively used as external evaluation metrics in the fields of data mining and machine learning [17].

Entropy
Entropy can serve as a metric for "evaluating the effectiveness of a clustering algorithm". When applied to clustering, entropy measures the level of uncertainty or randomness in "the assignment of data points to clusters": for each cluster, it is computed from the distribution of true class labels among the cluster's members. A lower entropy value indicates higher purity and clearer separation among clusters, whereas a higher entropy value indicates more mixed, overlapping, and less distinct clusters. The entropy of a cluster falls within the range of 0 to log2(C), where C is the number of true classes that can appear in a cluster.
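One common way to turn this into a single number is to average the per-cluster entropies, weighted by cluster size. The sketch below follows that convention; the weighting scheme and the function name `clustering_entropy` are assumptions, since the text does not state which variant the study uses.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(clusters, classes):
    """Size-weighted mean of the class-label entropy (base 2) inside each cluster."""
    members = defaultdict(list)
    for cluster_id, true_class in zip(clusters, classes):
        members[cluster_id].append(true_class)
    n = len(classes)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        # Entropy of the class distribution within this cluster.
        h = -sum((m / size) * math.log2(m / size) for m in Counter(labels).values())
        total += (size / n) * h
    return total


pure = clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])   # every cluster holds one class
mixed = clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])  # every cluster is a 50/50 mix
```

With two classes, the fully mixed case reaches the upper bound log2(2) = 1, while the pure case gives 0.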

Purity
Purity serves as another measure for "evaluating the efficacy of a clustering algorithm". It quantifies the degree to which the resulting clusters align with the true class labels or ground truth of the data points. To compute the purity of a clustering solution, each cluster is assigned to the majority class among its "data points", and purity is the fraction of all data points that belong to the majority class of their cluster. Purity is a straightforward and easily interpretable metric, particularly when the ground truth labels are available. However, it does not take into account the structure or density of the clusters and may not be suitable for all clustering scenarios. Therefore, it is often employed in conjunction with other evaluation metrics to obtain a more comprehensive understanding of the clustering algorithm's performance.
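The majority-class computation can be sketched as follows. The function name `purity` and the label encoding are illustrative assumptions.

```python
from collections import Counter, defaultdict


def purity(clusters, classes):
    """Fraction of points that carry the majority class of their cluster."""
    members = defaultdict(list)
    for cluster_id, true_class in zip(clusters, classes):
        members[cluster_id].append(true_class)
    # For each cluster, count the points in its most frequent class.
    majority_total = sum(max(Counter(labels).values()) for labels in members.values())
    return majority_total / len(classes)


# Cluster 0 holds two "a" and one "b"; cluster 1 holds three "b".
score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
```

Here five of the six points match the majority class of their cluster, so the purity is 5/6.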

Rand index (RI)
The Rand index (RI) is a widely used metric for evaluating clustering algorithms, which measures the similarity between the clustering solution and the true class labels or ground truth of the data points. It assesses the level of agreement between pairs of data points in terms of their assigned clusters. To compute the Rand index, each pair of data points in the dataset is considered, and their clustering assignments are compared to the true class labels. A pair counts as an agreement if the two points are placed together in both partitions or apart in both partitions; the RI is the number of agreements divided by the total number of pairs, so it lies between 0 and 1. The Rand index is symmetric: swapping the roles of the clustering and the reference labels leaves its value unchanged, and it depends only on pairwise agreement rather than on the specific cluster labels.
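A direct pair-by-pair sketch of this computation is shown below. It is quadratic in the number of points and is meant only to make the definition concrete; the function name `rand_index` is an assumption.

```python
from itertools import combinations


def rand_index(clusters, classes):
    """Share of point pairs on which the clustering and the true labels agree."""
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(classes)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        # Agreement: together in both partitions, or apart in both.
        if same_cluster == same_class:
            agreements += 1
        pairs += 1
    return agreements / pairs


perfect = rand_index([0, 0, 1, 1], ["a", "a", "b", "b"])
crossed = rand_index([0, 0, 1, 1], ["a", "b", "a", "b"])
```

In the second call only two of the six pairs agree, which gives a Rand index of 1/3.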

Adjusted Rand index (ARI)
The ARI quantifies the level of agreement between the clustering solution and the true class labels or ground truth of the data points, while taking into account chance agreements. The ARI adjusts the Rand index (RI) by comparing the observed agreement between clustering assignments and true labels with the expected agreement under a random clustering model. This adjustment enables a more reliable assessment of clustering performance. "The ARI value ranges from -1 to 1, where a value of 1 indicates a perfect clustering solution" that precisely matches the true class labels. Conversely, a value at or near 0 suggests a random clustering solution, while negative values indicate that the agreement between clustering and true labels is worse than random. Although ARI is widely used, it does have limitations. For instance, ARI can be "influenced by the number of clusters or class labels in the dataset", and it assumes that the true labels are known, which may not always be the case in unsupervised learning scenarios.
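The chance correction can be sketched with the usual contingency-table form, in which the observed number of co-clustered pairs is compared with its expected value under random labelling. The function name and the handling of the degenerate case (returning 1.0 when the denominator vanishes) are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(clusters, classes):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    n = len(classes)
    # Pairs placed together in both the clustering and the true labelling.
    together_both = sum(comb(v, 2) for v in Counter(zip(clusters, classes)).values())
    together_cluster = sum(comb(v, 2) for v in Counter(clusters).values())
    together_class = sum(comb(v, 2) for v in Counter(classes).values())
    expected = together_cluster * together_class / comb(n, 2)
    maximum = (together_cluster + together_class) / 2
    if maximum == expected:
        return 1.0  # assumption: degenerate partitions are treated as identical
    return (together_both - expected) / (maximum - expected)


perfect = adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"])
crossed = adjusted_rand_index([0, 0, 1, 1], ["a", "b", "a", "b"])
```

The first call yields 1.0, while the second yields a negative value, illustrating agreement that is worse than chance.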

Fowlkes-Mallows index (FMI)
This is a widely used metric for evaluating clustering algorithms, especially when "the ground truth class labels are known". It assesses the similarity between "the clustering solution and the true class labels" by considering pairs of data points and whether each pair is grouped together in the clustering and in the true labelling. The FMI is derived from pairwise precision and recall and is the geometric mean of these two quantities. The Fowlkes-Mallows index "takes values between 0 and 1, with a value closer to 1 indicating a higher similarity between the clustering solution and the true class labels", while a value closer to 0 suggests a lower similarity. Because the Fowlkes-Mallows index assumes the availability of ground truth class labels, it may not be suitable for scenarios where the ground truth is unavailable or when the clustering objective differs from the class labels.
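In pair-counting terms, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP counts pairs grouped together in both partitions, TP + FP counts pairs grouped together by the clustering, and TP + FN counts pairs sharing a true class. The sketch below computes the geometric mean of the two; the function name and the zero-denominator convention are assumptions.

```python
import math
from collections import Counter
from math import comb


def fowlkes_mallows(clusters, classes):
    """Geometric mean of pairwise precision and recall."""
    tp = sum(comb(v, 2) for v in Counter(zip(clusters, classes)).values())
    same_cluster_pairs = sum(comb(v, 2) for v in Counter(clusters).values())  # TP + FP
    same_class_pairs = sum(comb(v, 2) for v in Counter(classes).values())     # TP + FN
    if same_cluster_pairs == 0 or same_class_pairs == 0:
        return 0.0  # assumption: no co-grouped pairs means no measurable similarity
    return tp / math.sqrt(same_cluster_pairs * same_class_pairs)


score = fowlkes_mallows([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
```

For this input TP = 4, TP + FP = 6, and TP + FN = 7, so the index equals 4 / sqrt(42).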

Jaccard coefficient (JC)
The Jaccard coefficient (JC) is not commonly employed for evaluating clustering algorithms. Typically utilized in set theory, the Jaccard coefficient primarily measures the similarity between sets or binary data. However, if the objective is to assess the similarity between two clustering solutions, the Jaccard coefficient can be used to compare the overlapping clusters. The Jaccard coefficient "ranges from 0 to 1, with a value of 1 indicating a perfect agreement between the two clustering solutions", while a value of 0 signifies no agreement between the solutions.
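One way to apply the set-based definition to clusterings is to represent each partition by the set of point pairs it groups together and then take the ratio of intersection to union. The sketch below follows that pair-based convention, which is an assumption here since the text does not spell out the exact variant; the function names are illustrative.

```python
from itertools import combinations


def co_membership_pairs(labels):
    """Set of index pairs (i, j), i < j, that share the same label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def jaccard_coefficient(labels_a, labels_b):
    """|A ∩ B| / |A ∪ B| over the co-membership pair sets of two partitions."""
    pairs_a = co_membership_pairs(labels_a)
    pairs_b = co_membership_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # assumption: two all-singleton partitions count as identical
    return len(pairs_a & pairs_b) / len(union)


score = jaccard_coefficient([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
```

For this input the two pair sets share 4 pairs out of 9 in their union, giving 4/9.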

The EDAS Method
"The EDAS (Evaluation based on Distance from Average Solution) method" is a decision-making approach that assesses and ranks alternatives based on their performance across multiple criteria. It involves comparing the performance of each alternative to the average performance across the evaluated criteria [18]. In the EDAS method, performance metrics or indices are utilized to measure the performance of the alternatives. These metrics can encompass various criteria such as accuracy, efficiency, or effectiveness. The performance values of each alternative are then compared to the average performance across the evaluated metrics [19,20]. The EDAS method calculates two main metrics: "Positive Distance from Average (PDA) and Negative Distance from Average (NDA)". The PDA quantifies the positive deviation of an alternative's performance from the average, indicating better performance. On the other hand, the NDA quantifies the negative deviation, signifying worse performance [22]. By considering both "positive and negative deviations from the average", the EDAS method provides a comprehensive evaluation and ranking of alternatives. It assists decision-makers in identifying the alternatives with the best overall performance and supports them in making informed decisions "based on the relative distances of the alternatives from the average performance" [23].
➢ Select the criteria that best characterize the alternatives of the given decision problem. "The decision matrix X" is generated to show the performance of the m alternatives relative to the n criteria, where x_ij is the performance of alternative i on criterion j.

$$X = [x_{ij}]_{m \times n} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \tag{1}$$

➢ Weights for the criteria are expressed in equation 2.

$$w = [w_1, \ldots, w_n], \quad \text{where } \sum_{j=1}^{n} w_j = 1 \tag{2}$$

➢ The average result for each criterion is computed using equation 3, per the specification of the EDAS method.

$$AV_j = \frac{\sum_{i=1}^{m} x_{ij}}{m} \tag{3}$$

➢ The positive distance from average (PDA) is expressed in equation 4. Here B is the set of "Beneficial criteria", and C is the set of "non-beneficial criteria".

$$PDA_{ij} = \begin{cases} \dfrac{\max(0,\; x_{ij} - AV_j)}{AV_j}, & j \in B \\[2ex] \dfrac{\max(0,\; AV_j - x_{ij})}{AV_j}, & j \in C \end{cases} \tag{4}$$

➢ The negative distance from average (NDA) is expressed in equation 5, with B and C defined as above.

$$NDA_{ij} = \begin{cases} \dfrac{\max(0,\; AV_j - x_{ij})}{AV_j}, & j \in B \\[2ex] \dfrac{\max(0,\; x_{ij} - AV_j)}{AV_j}, & j \in C \end{cases} \tag{5}$$

➢ The weights from equation 2 are multiplied by the distances from equations 4 and 5, respectively, to calculate the weighted sums of the positive and negative distances from the average solution for all alternatives. These weighted sums are then normalized to determine the final scores.

$$SP_i = \sum_{j=1}^{n} w_j \times PDA_{ij} \tag{6}$$

$$SN_i = \sum_{j=1}^{n} w_j \times NDA_{ij} \tag{7}$$

➢ Equations 8 and 9 are used to "normalize the weighted sum of the positive and negative distances from the average solution" for all alternatives.

$$NSP_i = \frac{SP_i}{\max_i(SP_i)} \tag{8}$$

$$NSN_i = 1 - \frac{SN_i}{\max_i(SN_i)} \tag{9}$$

➢ "The final appraisal score (ASi) for all alternatives" is determined by taking "the average of the normalized weighted sum of the positive and negative distances from the average solution" for each alternative.

$$AS_i = \frac{NSP_i + NSN_i}{2} \tag{10}$$

The best choice among the candidate alternatives is the one with the highest appraisal score, where "the appraisal score for each alternative is between 0 and 1" [24].
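The EDAS steps listed above (average solution, PDA and NDA, weighted sums, normalization, appraisal score) can be sketched end to end as follows. The function name `edas`, the boolean `beneficial` flags, the toy decision matrix, and the handling of the case where every weighted sum is zero are assumptions made for illustration; the sketch also assumes every criterion average is positive.

```python
def edas(matrix, weights, beneficial):
    """Return the EDAS appraisal score of each alternative (row of `matrix`).

    matrix[i][j]  : performance of alternative i on criterion j
    weights[j]    : weight of criterion j (weights sum to 1)
    beneficial[j] : True if larger values are better for criterion j
    """
    m, n = len(matrix), len(weights)
    # Average solution per criterion.
    av = [sum(row[j] for row in matrix) / m for j in range(n)]
    sp, sn = [], []
    for row in matrix:
        sp_i = sn_i = 0.0
        for j in range(n):
            diff = (row[j] - av[j]) / av[j]
            if not beneficial[j]:
                diff = -diff  # for non-beneficial criteria, below average is good
            sp_i += weights[j] * max(0.0, diff)   # weighted PDA
            sn_i += weights[j] * max(0.0, -diff)  # weighted NDA
        sp.append(sp_i)
        sn.append(sn_i)
    max_sp, max_sn = max(sp), max(sn)
    # Assumption: if no alternative deviates at all, NSP is 0 and NSN is 1.
    nsp = [s / max_sp if max_sp else 0.0 for s in sp]
    nsn = [1 - s / max_sn if max_sn else 1.0 for s in sn]
    return [(p + q) / 2 for p, q in zip(nsp, nsn)]


# Hypothetical toy data: criterion 0 is beneficial (e.g. purity),
# criterion 1 is non-beneficial (e.g. entropy).
scores = edas([[3, 1], [2, 2], [1, 3]], weights=[0.5, 0.5], beneficial=[True, False])
```

In this toy case the first alternative is above average on the beneficial criterion and below average on the non-beneficial one, so it receives the top appraisal score of 1, while the third alternative receives 0.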

Analysis And Discussion
Table 1. The starting values of the external evaluation measures

Table 2. Positive Distance from Average (PDA)
0.1011 0.0000 0.0000 0.0000 0.0000 0.0956 0.0000 0.8794 0.2171 0.2088 0.0000 0.1150 0.0000 0.0000 0.0000 0.0000 0.1913 0.0000 0.9578 0.2301 1.6667 0.2213 0.0273 0.0696 0.0000 0.0000 0.0000 0.0000 0.0000 0.1150 0.0000 0.0000 0.0000 0.0000 0.1831

Table 2 showcases "the PDA (Positive Distance from Average) values" for various clustering algorithms applied to a particular dataset using the EDAS method. The PDA metric quantifies the positive deviation of each algorithm's performance from the average performance across the evaluated metrics. These PDA values enable an evaluation of each algorithm's relative performance compared to the average across the metrics, facilitating comparison and comprehension of their respective qualities using the EDAS method.

Table 3. Negative Distance from Average (NDA)
Table 3 displays the NDA (Negative Distance from Average) values for various clustering algorithms applied to a specific dataset using the EDAS method. The NDA metric quantifies the negative deviation of each algorithm's performance from the average performance across the evaluated metrics. These NDA values show, for each metric, how much worse than the average each algorithm performs.

Table 4 illustrates the allocation of equal weights (0.1667, that is, 1/6 per metric) to the evaluation metrics "purity, entropy, Fowlkes-Mallows index (FMI), Rand index (RI), adjusted Rand index (ARI), and Jaccard coefficient" for the clustering algorithms applied to the dataset. This uniform weight distribution implies that each metric is considered equally important in evaluating the performance of these clustering algorithms on the dataset.

In Table 5, the Weighted PDA metric incorporates the weighted positive deviations of each algorithm's performance from the average performance across the evaluated metrics. These Weighted PDA values assess the relative performance of each algorithm by considering the weighted positive deviations from the average across the metrics, and they support the performance comparison of the algorithms in question.

Conclusion
Clustering algorithms have a crucial role in the field of machine learning and data analysis as they enable the grouping of similar data points based on their inherent characteristics. These algorithms aim to uncover patterns, structures, and relationships within datasets, without relying on pre-existing class labels or target variables. Prominent clustering algorithms like K-means, hierarchical clustering, DBSCAN, and Mean Shift offer distinct methods for partitioning data into meaningful clusters. By utilizing these algorithms, analysts and researchers can extract valuable insights from complex datasets, facilitating informed decision-making based on the identified patterns. The assessment of clustering algorithms is performed using the multicriteria decision making (MCDM) method. The report employs the EDAS rating system to assess six alternative clustering algorithms: "KM algorithm, EM algorithm, filtered clustering (FC), farthest-first (FF) algorithm, make density-based clustering (MD), and hierarchical clustering (HC)". These algorithms are compared against six external clustering measures to determine their performance. The Hierarchical Clustering (HC) algorithm exhibits the highest ASi value of 0.929506, indicating superior overall performance compared to the other algorithms. The Farthest-First (FF) Algorithm also demonstrates a relatively high ASi value of 0.753745, suggesting strong performance. Conversely, the Make Density-Based Clustering (MD) algorithm exhibits the lowest ASi value of 0.011219, indicating relatively weaker overall performance compared to the other algorithms.

Figure 1. The starting values of the external evaluation measures
Figure 1 showcases the initial values of external evaluation metrics for diverse clustering algorithms applied to a particular dataset. The metrics encompass "purity, entropy, Fowlkes-Mallows index (FMI), Rand index (RI), adjusted Rand index (ARI), and Jaccard coefficient". These values enable an initial assessment of the clustering performance for each algorithm, aiding in the comparison and understanding of their individual qualities.
In Table 7, NSPi (Normalized Sum of Positive Indices) quantifies the normalized sum of positive performance deviations from the average, while NSNi (Normalized Sum of Negative Indices) is one minus the normalized sum of negative performance deviations, so a higher NSNi corresponds to smaller negative deviations. These values provide insights into the relative performance of each algorithm based on their "positive and negative deviations from the average across the evaluated indices". For instance, the Farthest-First (FF) Algorithm exhibits high NSPi and NSNi values, indicating substantial positive deviations and comparatively small negative deviations from the average performance. Conversely, the Make Density-Based Clustering algorithm displays relatively low NSPi and NSNi values, indicating small positive deviations and comparatively large negative deviations from the average performance.

Figure 2. NSPi and NSNi
Figure 2 displays the NSPi (Normalized Sum of Positive Indices) and NSNi (Normalized Sum of Negative Indices) values for various clustering algorithms. NSPi measures the normalized sum of positive performance deviations from the average, while NSNi is one minus the normalized sum of negative performance deviations, so a higher NSNi corresponds to smaller negative deviations. These values offer insights into the relative performance of each algorithm based on their "positive and negative deviations from the average across the evaluated indices". For example, the Farthest-First (FF) Algorithm exhibits high NSPi and NSNi values, indicating significant positive deviations and comparatively small negative deviations from the average performance. On the other hand, the Make Density-Based Clustering algorithm shows relatively low NSPi and NSNi values, indicating small positive deviations and comparatively large negative deviations from the average performance.
For the clustering algorithms evaluated with the EDAS method in Table 8, the ASi values represent the final appraisal score of each algorithm, indicating its overall performance. The Rank column orders the algorithms by their ASi values, with a numerically lower rank indicating better performance. For instance, the Expectation Maximization (EM) algorithm has an ASi value of 0.048021 and is ranked 5th among the clustering algorithms. The Farthest-First (FF) Algorithm has an ASi value of 0.753745 and is ranked 2nd. The Filtered Clustering (FC) algorithm has an ASi value of 0.055173 and is ranked 4th. The Hierarchical Clustering (HC) algorithm has the highest ASi value of 0.929506 and is ranked 1st. The Make Density-Based Clustering (MD) algorithm has an ASi value of 0.011219 and is ranked 6th. Lastly, the K-Means Algorithm has an ASi value of 0.055376 and is ranked 3rd. These ASi values provide an assessment of each algorithm's overall performance, and the rankings offer a comparative analysis of their performance. Based on the result, the Hierarchical Clustering algorithm achieves the highest ASi value and is ranked first, indicating its superior performance compared to the other algorithms.
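The ranking follows mechanically from the reported ASi values by sorting them in descending order. The short sketch below reproduces that step; the dictionary keys are the abbreviations used in this paper, and the function name `rank_by_score` is an assumption.

```python
# ASi values as reported in the text.
as_scores = {
    "HC": 0.929506,
    "FF": 0.753745,
    "KM": 0.055376,
    "FC": 0.055173,
    "EM": 0.048021,
    "MD": 0.011219,
}


def rank_by_score(scores):
    """Map each name to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks = rank_by_score(as_scores)
```

Note how close the scores of KM, FC, and EM are to one another: their relative order rests on differences in the third decimal place.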

Figure 3. ASi
The ASi (the final appraisal score) values depicted in Figure 3 provide an overall assessment of the performance of each algorithm, considering various evaluation metrics. Higher ASi values indicate better overall performance, while lower values suggest relatively lower performance. Notably, the Hierarchical Clustering (HC) algorithm exhibits the highest ASi value of 0.929506, indicating superior overall performance compared to the other algorithms. The Farthest-First (FF) Algorithm also demonstrates a relatively high ASi value of 0.753745, suggesting strong performance. Conversely, the Make Density-Based Clustering (MD) algorithm exhibits the lowest ASi value of 0.011219, indicating relatively weaker overall performance compared to the other algorithms.

Figure 4 provides the rankings of various clustering algorithms based on their ASi (final appraisal score) values. A numerically lower rank signifies better performance, with the top-ranked algorithm being the best performer. The ranking offers a straightforward assessment of the relative performance of each algorithm based on their ASi values, enabling easy comparison and identification of the top-performing algorithms. The Hierarchical Clustering algorithm is ranked first and the Make Density-Based Clustering algorithm is ranked last.

Table 5 displays the Weighted PDA (Positive Distance from Average) values for various clustering algorithms applied to a specific dataset.

Table 6 displays the Weighted NDA (Negative Distance from Average) values for various clustering algorithms applied to a specific dataset. The Weighted NDA metric takes into account the weighted negative deviations of each algorithm's performance from the average performance across the evaluated metrics. These Weighted NDA values offer an evaluation of each algorithm's performance by considering the weighted negative deviations from the average across the metrics. They provide insights into the relative performance of each algorithm based on these weighted deviations.

Table 7 presents the NSPi (Normalized Sum of Positive Indices) and NSNi (Normalized Sum of Negative Indices) values for different clustering algorithms.

Table 8. ASi and Rank

Table 8 displays the ASi (the final appraisal score) and Rank values for various clustering algorithms applied to a specific dataset using the EDAS method.