Comparison of the K-Means method with and without Principal Component Analysis (PCA) in predicting employee resignation

Iwan Binanto; Andrianto Tumanggor

doi:10.1051/e3sconf/202447502009

E3S Web Conf., 475 (2024) 02009

Open Access

Issue: E3S Web Conf. Volume 475, 2024. InCASST 2023 - The 1st International Conference on Applied Sciences and Smart Technologies
Article Number: 02009
Number of page(s): 10
Section: Environmental Impact Assessment and Management
DOI: https://doi.org/10.1051/e3sconf/202447502009
Published online: 08 January 2024

Iwan Binanto* and Andrianto Tumanggor

Informatics Department, Faculty of Science and Technology, Sanata Dharma University, Yogyakarta, Indonesia

* Corresponding author: iwan@usd.ac.id

Abstract

Employees are individuals who work for a company or organization and receive a salary. They are a company's most important asset and must be managed effectively to maximize their contribution. However, many employees are dissatisfied with the outcomes of their contributions because they do not receive the rewards they expect. This study uses a dataset from Kaggle.com consisting of 14,999 rows with 10 attributes. In the first experiment, the dimensionality of the dataset was reduced using PCA before the K-means clustering method was applied. In the second experiment, the dataset was fed directly into the K-means clustering method without PCA. To evaluate the K-means clusters and determine the optimal number of clusters, the study applies the sum of squared error (SSE) method and the silhouette coefficient method. The study concludes that two dominant factors, last_evaluation and average_monthly_hours, contribute to employees resigning from a company. The SSE evaluation indicates that both methods have an elbow point at 3 clusters, suggesting that dividing the data into more than 3 clusters does not provide significant additional information. The silhouette coefficient evaluation shows that K-means without PCA obtains the best silhouette coefficient value, 0.5674, while K-means with PCA obtains 0.5491. Although K-means with PCA has the advantage of reducing the dimensionality of the dataset, it has a longer execution time than K-means without PCA: 181.53 seconds with PCA versus 95.84 seconds without.
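The two experiments described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on synthetic data (three Gaussian blobs with 10 features) rather than the Kaggle dataset, and the helper names `pca_reduce`, `kmeans`, and `silhouette_coefficient`, the choice of 2 principal components, and the range of k values are all assumptions made for illustration. The printed numbers will therefore not match the values reported above.

```python
import time
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, sse)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(X)), labels].sum()  # sum of squared errors
    return labels, centroids, sse


def silhouette_coefficient(X, labels):
    """Mean silhouette over all points, using Euclidean distances."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: silhouette is taken as 0
        a = D[i, same].sum() / (n_same - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Synthetic stand-in for the dataset (assumption): 300 rows, 10 features.
rng = np.random.default_rng(42)
centers = rng.normal(scale=6.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(100, 10)) for c in centers])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

results = {}
for name, data in [("without PCA", X), ("with PCA", pca_reduce(X, 2))]:
    start = time.perf_counter()
    sse_by_k = {k: kmeans(data, k)[2] for k in range(1, 7)}  # elbow curve
    labels, _, _ = kmeans(data, 3)
    sil = silhouette_coefficient(data, labels)
    elapsed = time.perf_counter() - start
    results[name] = (sse_by_k, sil, elapsed)
    print(f"{name}: silhouette(k=3)={sil:.4f}, time={elapsed:.3f}s")
    print("  SSE by k:", {k: round(v, 1) for k, v in sse_by_k.items()})
```

The elbow point is read off the SSE curve as the value of k beyond which the SSE stops dropping sharply, and the silhouette coefficient lies between -1 and 1, with higher values indicating better-separated clusters. In practice, scikit-learn's `KMeans`, `PCA`, and `silhouette_score` would likely replace the hand-written helpers; they are spelled out here only to keep the sketch self-contained.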

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
