Enhancing Predictive Accuracy: Assessing the Effectiveness of SVM in Predicting Medical Student Performance

The high cost of pursuing a medical education necessitates effectively monitoring and evaluating medical students' performance. This study aimed to develop and evaluate a prediction system for medical students’ national exam scores using the Support Vector Machine (SVM) algorithm. The dataset consisted of grades from first and second-year medical students at Muhammadiyah University of Yogyakarta, specifically from the 2014 and 2015 classes, to predict the final year exam score. The methodology involved data acquisition, data preprocessing, and classification and prediction of student performance. Remarkably, the SVM model achieved an accuracy rate of 95.48%. The findings highlight the substantial potential of SVM for accurately predicting medical student performance. The prediction system can enable educational institutions to proactively identify students needing additional support or intervention. This early intervention can help improve academic progress and enhance the overall quality of medical education. Future research efforts should focus on improving the prediction system's practicality and effectiveness by incorporating additional factors. This study successfully developed and evaluated a prediction system for medical student performance using the SVM algorithm. The high accuracy achieved by the SVM model emphasises its potential as a valuable tool for medical education institutions. By leveraging machine learning, educational institutions can provide targeted support to students, leading to improved learning outcomes and advancements in medical education.


Introduction
The escalating costs of undergraduate medical education have raised considerable concerns regarding student attrition and its implications. In the Indonesian context, regular students bear a substantial financial burden, estimated at approximately 100 million Indonesian rupiah, or around seven thousand U.S. dollars, for their first year of enrollment in esteemed public medical institutions [1]. These findings highlight the significant economic challenges of pursuing medical education in Indonesia and underscore the need for comprehensive interventions to address affordability. Such measures are crucial to prevent the squandering of investments in students who do not complete their medical degrees.
While the dropout rates in medical schools typically range from 15.7% to 18.4% for a standard four-year program [2], a report by Ken Research on Indonesian medical students in 2020 revealed an approximate 88% passing rate for the Physician course [3]. These findings indicate that a significant number of medical students do not fulfil their goal of becoming doctors, resulting in a waste of resources. Early identification of potential dropouts is crucial to mitigate the negative consequences associated with non-completion. Considering the substantial costs involved, monitoring and evaluating the performance of medical students becomes imperative to prevent the financial burden associated with academic failure.
Numerous options are available when selecting a Machine Learning algorithm for classification prediction. One widely utilised algorithm in Supervised Learning is the Support Vector Machine (SVM). SVM aims to establish an optimal decision boundary, a hyperplane, that effectively separates n-dimensional space into distinct classes. This enables accurate classification of new data points in the future. The algorithm identifies support vectors, which are extreme points, to define the hyperplane. Hence, it is commonly known as Support Vector Machine [4].
Support Vector Machine (SVM) has demonstrated versatility and accuracy across various applications. In stock prediction, SVM achieved accuracies ranging from 60% to 70% [4]. In sentiment analysis and text classification, SVM outperformed algorithms like Naive Bayes, Decision Trees, and Random Forests, exhibiting higher accuracy, precision, recall, and F1 score [5]. For engineering purposes, the application of Least Squares Support Vector Machines (LS-SVM) successfully predicted the critical flashover voltage of polluted insulators, offering valuable insights for electrical transmission systems [6]. By incorporating the Genetic Algorithm (GA), SVM performance was further enhanced, surpassing classical SVM models in predicting solar radiation parameters [7]. However, in sentiment analysis, Novel BERT outperformed SVM, achieving an average accuracy of 83.5% compared to SVM's average accuracy of 75.3%. BERT's automatic feature selection demonstrated superior performance in detecting sentiment in the IMDB movie dataset [8]. Similarly, in customer product review classification, a Recurrent Neural Network (RNN) outperformed SVM with an accuracy of 94.86%, surpassing SVM's accuracy of 86.67% [9]. In the medical field, SVM has proven to be a valuable tool for predicting breast cancer subtypes using diverse data types in systems science, showcasing its potential in biomedical, bioengineering, and clinical applications [10]. Considering the wide range of applications and acceptable accuracy, this study aims to develop and evaluate the classification of medical student performance using SVM.
Due to its robustness and comprehensive utilisation in classification, recognition, and generalisation tasks, Support Vector Machine (SVM) has demonstrated strong accuracy and efficient training time in numerous studies [5], [11]. Previous research has often shown SVM's superiority over alternative algorithms [12], [13], achieving accuracies ranging from 69.87% to 74.04% across various classification tasks [14]. Notably, SVM has achieved accuracy rates of up to 99.18% when combined with Educational Data Mining (EDM) techniques [12]. Based on these findings, the Support Vector Machine (SVM) algorithm has been chosen as the preferred approach for predicting medical student performance.

Method
The methodology employed in this study encompasses three main stages: data acquisition, data pre-processing, and student classification prediction. Each stage is described in detail below and illustrated in Figure 1, which presents the flowchart of the process.

Data acquisition
The dataset utilised in this study was sourced from the academic records of medical students enrolled at Muhammadiyah University of Yogyakarta during the academic years 2014 and 2015, specifically focusing on their first and second years of studies. The dataset consists of 287 records, each containing all grades from the first and second year and the national examination score achieved. Of these, 257 students passed the final examination and 30 failed.

Data pre-processing
Following the data acquisition, the collected dataset underwent pre-processing to address the issue of imbalanced data. To handle this challenge, the Synthetic Minority Over-sampling Technique (SMOTE) was employed. SMOTE is a widely recognised method designed specifically for imbalanced datasets. It operates by generating synthetic samples of the minority class through interpolation between existing samples. This technique balances the distribution of classes within the dataset, thereby enhancing the performance of machine-learning models on unbalanced student data [15]. By applying SMOTE, the imbalanced nature of the dataset was addressed, ensuring a more reliable and representative training dataset for subsequent analysis and prediction tasks. As a result, the groups of passing and failing students each have 257 records.
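The interpolation step that SMOTE relies on can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `smote_like_oversample`, the toy two-course grade vectors, and the neighbour count `k` are all hypothetical, and a real pipeline would more likely rely on an established library.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-course grade vectors for the minority ("fail") class.
fail_records = [(55.0, 60.0), (58.0, 52.0), (50.0, 57.0), (61.0, 54.0)]
new_records = smote_like_oversample(fail_records, n_new=6)
balanced_minority = fail_records + new_records
print(len(balanced_minority))
```

Because every synthetic record is a convex combination of two existing minority records, it always lies on the segment between them, which is why the technique adds plausible variation rather than exact duplicates.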
To handle the imbalance, the first step is to calculate the imbalance ratio (I.R.) [16], as shown in equation (1).

$$\mathrm{I.R.} = \frac{f(S_{\mathrm{major}})}{f(S_{\mathrm{minor}})} \tag{1}$$

Here, I.R. represents the imbalance ratio, f(Smajor) denotes the frequency of the majority class (C1), and f(Sminor) represents the frequency of the minority class (C2). Based on the calculated imbalance ratio, appropriate sampling techniques can be employed. If the imbalance ratio is low, under-sampling can be utilised by reducing the number of samples from the majority class (C1). On the other hand, if the imbalance ratio is high, under-sampling should be avoided to prevent data loss. In such cases, oversampling techniques are applied to synthesise additional samples of the minority class (C2) to match the number of samples from the majority class (C1).
Choosing between under-sampling and oversampling on the basis of the imbalance ratio addresses the class imbalance in the dataset and ensures a more balanced representation of the data for subsequent analysis and modelling tasks.
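As a worked example, the imbalance ratio for the class counts reported in the data acquisition section can be computed directly. The sketch assumes the usual definition, majority-class frequency divided by minority-class frequency; the function name `imbalance_ratio` is illustrative only.

```python
def imbalance_ratio(n_major, n_minor):
    """Imbalance ratio: majority-class frequency over minority-class frequency."""
    if n_minor <= 0:
        raise ValueError("minority class must contain at least one sample")
    return n_major / n_minor


# Class counts reported in the data acquisition section.
n_pass, n_fail = 257, 30
ir = imbalance_ratio(n_pass, n_fail)
print(f"I.R. = {ir:.2f}")

# Number of synthetic minority samples needed to match the majority class.
n_synthetic = n_pass - n_fail
print(n_synthetic)
```

Under this definition the ratio is roughly 8.57, and matching the 257 majority records requires 227 synthetic minority records. Discarding that many majority records instead would leave only 60 records in total, which illustrates why oversampling is preferred when the ratio is high.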

Classification and prediction
In this study, the classification and prediction process utilised the Support Vector Machine (SVM) algorithm, a well-established supervised learning technique renowned for its effectiveness in classification tasks within machine learning. By constructing an optimal hyperplane, SVM divides the dataset into distinct classes, such as "pass" and "fail," based on the medical final exam outcomes. The SVM algorithm was employed to classify and predict the data under investigation. Subsequently, the performance of the SVM model was evaluated, and the results are presented in the results section of this study.
SVM is a supervised learning algorithm that can be used for classification or regression tasks, although it is predominantly employed for classification. It is founded on creating a hyperplane that optimally divides the dataset into two classes. The results of the SVM model are shown in Table 2, with an accuracy rate of 95.48%.
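To make the idea of a separating boundary concrete, the sketch below evaluates the decision function of a kernel SVM with an RBF kernel: a new point is assigned a class according to the sign of a weighted sum of kernel values against the support vectors plus a bias. The support vectors, coefficients, bias, and gamma are hand-picked hypothetical values, not parameters learned from the study's data.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Signed score of x: sum_i coef_i * K(sv_i, x) + bias.

    Each coef_i stands for alpha_i * y_i of a trained SVM.
    """
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    ) + bias


# Hypothetical, hand-picked values purely for illustration.
support_vectors = [(0.8, 0.9), (0.3, 0.2)]
dual_coefs = [1.0, -1.0]  # positive -> "pass" side, negative -> "fail" side
bias = 0.0
gamma = 2.0

for student in [(0.75, 0.85), (0.25, 0.30)]:
    score = decision_value(student, support_vectors, dual_coefs, bias, gamma)
    print(student, "pass" if score >= 0 else "fail")
```

In a trained model these quantities come out of the optimisation; only the points that end up as support vectors contribute to the sum, which is what gives the method its name.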

Results and discussion
This paper aims to evaluate the efficacy of the SVM method in predicting student performance. If the predictions demonstrate high accuracy, our proposed approach can be employed to enhance the forecasting of student performance. The SVM method is evaluated using several metrics: F1 Score, Recall, Precision, and Accuracy. These metrics are computed from the values of true positive (TP), true negative (TN), false positive (FP), and false negative (FN), as formulated in equations (2) to (5).

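$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

These metrics provide a comprehensive assessment of the SVM method's performance in predicting student performance. By evaluating these metrics, we can determine the precision, recall, F1 Score, and overall accuracy of the SVM model, thereby quantifying its effectiveness in forecasting student performance accurately.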
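The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions, with Recall as TP / (TP + FN) and the F1 Score as the harmonic mean of Precision and Recall. The counts passed in are illustrative only and are not the study's confusion matrix; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return precision, recall, F1 score, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative counts only; they are not the study's confusion matrix.
metrics = classification_metrics(tp=40, tn=50, fp=10, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With no false negatives, Recall is 1.0 regardless of how many false positives occur, so Precision and the F1 Score are the values that reveal over-prediction of the positive class.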
After applying SMOTE, the dataset was divided into training and test datasets with a ratio of 70:30. The training dataset was used to train the Support Vector Machine (SVM) model, with the grid search cross-validation (CV) method selecting the configuration that gave the highest F1 score. The hyperparameters used for the grid search CV were C = [0.01, 0.1, 1, 10, 100], gamma = 'scale', and the Radial Basis Function (RBF) kernel. The grid search CV method employed k-fold cross-validation, a model performance evaluation technique that partitions the data into multiple "folds" for alternating training and testing, with a total of 5 folds. The grid search CV yielded the highest F1 score of 0.9605 with C = 10. Subsequently, this model was employed to predict the test dataset, generating an evaluation matrix table.
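The tuning procedure described above can be sketched with scikit-learn. The paper does not name its software, so the library choice is an assumption, as are the synthetic stand-in data from `make_classification`, the feature count, the stratified split, and the fixed `random_state`; only the C grid, the RBF kernel, gamma = 'scale', the F1 scoring, the 5 folds, and the 70:30 split come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 514 roughly balanced records (assumption),
# mirroring the dataset size after oversampling.
X, y = make_classification(
    n_samples=514, n_features=10, weights=[0.5, 0.5], random_state=0
)

# 70:30 split into training and test sets (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Grid over C with an RBF kernel and gamma="scale", scored by F1 over 5 folds.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(
    SVC(kernel="rbf", gamma="scale"), param_grid, scoring="f1", cv=5
)
search.fit(X_train, y_train)

print("best C:", search.best_params_["C"])
print("best CV F1:", round(search.best_score_, 4))
# With scoring="f1", GridSearchCV.score reports F1 on the held-out data.
print("test F1:", round(search.score(X_test, y_test), 4))
```

The values printed for synthetic data will not match the figures reported in this study; the sketch only shows how the grid, the folds, and the held-out evaluation fit together.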
The prediction results are displayed in Table 2, provided in the following section. Based on the information provided in Table 2, it can be concluded that the SVM method exhibited strong performance in predicting student performance. The model achieved an F1 Score of 0.9565, indicating a good balance between precision and recall. The Recall value of 1.0 indicates that the SVM model identified all positive instances. Additionally, the Precision value of 0.9166 indicates a high accuracy in predicting positive samples. According to the confusion matrix table, the SVM model correctly predicts students who pass with an accuracy of 92.68% and students who do not pass with 100% accuracy.
The overall Accuracy of 0.9548 further supports the effectiveness of the SVM method in accurately forecasting student performance. These results imply that the SVM algorithm can be a reliable tool for predicting student performance, offering valuable insights, and aiding decision-making processes in educational settings.
The SVM algorithm demonstrates significant potential in medical education by accurately predicting student performance, enabling timely interventions and enhancing academic progress. Future research should further explore additional factors to improve the prediction system's effectiveness. The study emphasises the SVM method's reliability in predicting student performance, contributing to improved educational outcomes in medical education.

Conclusion
This study assessed the effectiveness of the Support Vector Machine (SVM) method in predicting student performance in medical education. The dataset utilised in this study encompassed the grades of 1st and 2nd-year medical students from the 2014 and 2015 classes at Muhammadiyah University of Yogyakarta. The results presented in Table 2 demonstrate that the SVM model exhibited strong performance, with an F1 Score of 0.9565, Recall of 1.0, Precision of 0.9166, and an overall Accuracy of 0.9548. These findings highlight the significant potential of the SVM algorithm as a reliable tool for forecasting student performance. By accurately predicting student outcomes, educational institutions can proactively identify students needing additional support or intervention, leading to improved academic progress and overall quality of medical education. The results affirm the efficacy of the SVM method in predicting student performance, providing valuable insights for educational decision-making. The findings contribute to the advancement of medical education by emphasising the potential benefits of utilising machine learning algorithms like SVM in improving student outcomes and the overall quality of medical education. Future research endeavours have the potential to enhance the prediction system by incorporating additional factors, thereby improving its applicability in real-world scenarios. While the SVM method has demonstrated strong performance in predicting student performance, including supplementary variables can provide a more comprehensive and holistic evaluation.

Table 1. Matrix evaluation