Enhancing Sustainable Rice Grain Quality Analysis with Efficient SVM Optimization Using Genetic Algorithm

Rice is among the most extensively cultivated and indispensable crops in agriculture. Ensuring sustainable rice cultivation practices is crucial, and accurately determining rice grain quality is a critical component of this effort. To this end, this study combines Support Vector Machine (SVM) classification with Genetic Algorithm (GA) optimization. The SVM categorized rice grain quality with 92.81% accuracy, which improved to 93.31% after the feature weights and classification configurations were optimized using the Genetic Algorithm. Among the classification algorithms compared, SVM showed the highest accuracy, while k-NN had the lowest at 88.32%. The results suggest that combining SVM classification with Genetic Algorithm optimization is an effective method for analyzing rice grain quality. These findings could be valuable in promoting sustainable rice cultivation practices by improving the accuracy and efficiency of rice grain quality analysis and enhancing the overall productivity of the rice cultivation process.


Introduction
In agriculture, rice is one of the most widely consumed staple foods in the world [1]. It is a staple food in many countries, especially on the Asian continent. In Indonesia, rice is the most consumed carbohydrate food, followed by corn, cassava and sweet potatoes [2]. According to the United States Department of Agriculture (USDA) and statistical data, Indonesia ranks third among the countries that produce the most rice in the world. Indonesia's total rice production in 2022 is estimated at 55.67 million tons of GKG, up 1.25 million tons (2.31 percent) from the estimated total production in 2021 of 54.42 million tons of GKG. In both 2021 and 2022, the largest rice production occurred in March. The lowest rice production occurred in December in 2021 and in January in 2022. Paddy production in March 2022 was 9.54 million tons of GKG, while rice production in January 2022 was 2.46 million tons of GKG. East Java, West Java and Central Java are the three provinces with the largest overall rice production (GKG) in 2022, while West Papua, DKI Jakarta and the Riau Archipelago are the three provinces with the lowest rice production [3].
The selection of rice to be consumed to meet daily needs depends on consumer preferences. Because many variables affect the quality of rice grains, it is necessary to identify the quality of harvested rice [4]. Grouping or analyzing data on a large scale requires a data mining algorithm [5], [6]. Data mining is the process of searching for hidden, previously undiscovered patterns in a data set held in a data warehouse or database. Data mining offers capabilities including the ability to describe how analysis finds patterns and processes in data [7]. Machine Learning is a field of computer science and artificial intelligence (AI) that focuses on using data and algorithms to imitate the way people learn, gradually improving in accuracy [8], [9].
An effective method for determining the grain quality of rice is classification. Classification divides data into distinct classes or groups. It is the process of finding a model that distinguishes classes or groups of data, so that the model can be used to make decisions and to predict the class of an object whose class label is not yet known [10]. Among the many classification algorithms is the Support Vector Machine (SVM), a method that produces fairly accurate classification results [11].
Optimization is performed using feature selection to improve the accuracy of the results obtained. Feature selection is a filter method that selects features based on their performance. It identifies the best features to be used in the classification process and assigns ratings to individual features or to subsets of features [12]. Feature selection can be univariate or multivariate: univariate selection evaluates one feature at a time, while multivariate selection considers a subset of features at once [13]. Genetic Algorithms are a component of Evolutionary Algorithms and are based on the process of natural selection, also known as evolution [14]. Genetic Algorithms (GA) are commonly used to improve problem solving, including closest search, decision-making systems, and other sophisticated approaches. The benefits of genetic algorithms are their adaptability and capacity to solve difficult problems [15].
Many studies have used data mining techniques to determine the quality of rice grains. These include the Intelligent Two-Stage DarkNet-SqueezeNet Architecture-Based Framework for Identification of Multiclass Rice Grain Varieties [16], Quality Identification of White Rice (Oryza sativa L.) Based on Amylose and Amylopectin Content in Traditional Markets and "Selepan" Salatiga City [17], and Image Edge Detection to Determine Rice Quality Based on Type Using the Laplacian of Gaussian Method [18].
Numerous studies have evaluated the effectiveness of the Support Vector Machine (SVM) classifier by comparing it with other classification techniques. However, SVM's limitation of dividing data into only two categories has been a challenge. SVM is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space, but this approach may not be optimal for all scenarios. In light of this limitation, this research uses a Genetic Algorithm to identify optimal feature weight sets and classification configurations. To this end, a hybrid model combining the SVM classification algorithm and the Genetic Algorithm has been developed. The model has been applied to classify the quality of rice grains, with Genetic Algorithm optimization carried out to enhance the accuracy of the results. By integrating the strengths of both techniques, the proposed model offers a promising way to address the limitations of the SVM method and improve its accuracy.

Methodology
In this study, the Support Vector Machine classification method is used, with the Genetic Algorithm applied for optimization. Figure 1 shows the stages of the study. In the algorithm comparison stage, the following algorithms were employed: k-NN, Decision Tree, Random Forest, SVM, and Neural Network. This process enabled the researchers to evaluate the performance of each algorithm and identify the most effective one for the given task.
5. Model Support Vector Machine. The fifth stage of the study involved implementing the Support Vector Machine (SVM) model [19], [20]. SVM is a learning algorithm grounded in optimization theory that uses a hypothesis space of linear functions in a high-dimensional feature space. In this study, SVM was applied to classify the "Rice Cammeo and Osmancik" dataset. By evaluating the performance of five algorithms, including SVM, this study aimed to identify the most effective approach for achieving optimal classification results.
This approach was compared to several other optimization models to determine its effectiveness in increasing the accuracy value. The results showed that the Selection Optimization (Evolutionary) model delivered the highest increase in accuracy and was therefore deemed the most effective approach.
8. Feature Selection. The feature selection stage in this study uses the Genetic Algorithm method. The genetic algorithm produces a population of many individuals that evolve according to selection criteria, which provide the conditions and values for optimization.
9. Modeling. During the modeling stage, the Support Vector Machine (SVM) algorithm optimized using Selection Evolutionary Optimization was found to deliver the highest accuracy and outperform the other algorithms. This highlights the ability of the proposed approach to enhance the performance of the SVM classifier in the given task.
10. Evaluation. The performance of a model must be calculated to determine whether the model classifies correctly. In this study, four basic metrics (accuracy, precision, recall, F-Score) are examined to compare the machine learning algorithms.
11. The proposed method. The eleventh stage of the study involved presenting the proposed method, describing it in detail and explaining the performance of the selected approach. The processed data was used to create the proposed approach, and the current model was used to evaluate the processing results. A flow diagram of the proposed method is shown in Figure 2, illustrating the different stages involved in the approach and their interrelationships.
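The feature selection and modeling stages can be illustrated with a minimal sketch of a genetic algorithm searching over feature subsets. Everything here is an assumption made for illustration: the data is synthetic, a nearest-centroid classifier stands in for the SVM so that the sketch needs only the Python standard library, and the function names (`make_data`, `centroid_accuracy`, `genetic_feature_selection`) and GA settings are hypothetical rather than taken from the study's RapidMiner configuration.

```python
import random

def make_data(rng, n_samples=200, n_informative=3, n_noise=4):
    """Synthetic two-class data: informative features shift with the class, noise features do not."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Hold-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(y_test)

def genetic_feature_selection(fitness, n_features, rng,
                              pop_size=12, generations=15, mutation_rate=0.1):
    """Evolve 0/1 feature masks; returns the best mask found and its fitness."""
    # Start from the full feature set plus random masks.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:]]  # elitism: the best mask survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
X, y = make_data(rng)
X_train, y_train = X[:140], y[:140]
X_test, y_test = X[140:], y[140:]

def fitness(mask):
    return centroid_accuracy(X_train, y_train, X_test, y_test, mask)

baseline = fitness([1] * 7)
best_mask, best_acc = genetic_feature_selection(fitness, 7, rng)
print("all features:", baseline)
print("selected mask:", best_mask, "accuracy:", best_acc)
```

Because the mask is chosen using the same hold-out set that scores it, the reported accuracy is optimistic; a separate validation split would be needed for an unbiased estimate.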
The study took a comprehensive approach to analyzing and classifying the Rice Cammeo and Osmancik dataset. The initial stages involved data collection and division into training and test data, followed by algorithm comparison to identify the optimal model. To further improve the model's accuracy, an optimization algorithm using the Genetic Algorithm was applied in conjunction with SVM classification. By integrating different techniques and algorithms, the study produced an effective approach to rice grain classification. The results demonstrated that the SVM algorithm, optimized using the Genetic Algorithm, delivered the best accuracy compared to the other algorithms. This approach could be used to optimize the accuracy of other classification models in various fields.

Perimeter
The circumference of a rice grain, calculated by measuring the distance between pixels along the boundary of the grain.

MajorAxisLength
The length of the longest line that can be drawn on a rice grain, i.e., the major axis distance.

MinorAxisLength
The length of the shortest line that can be drawn on a rice grain, i.e., the minor axis distance.

Eccentricity
A measure of how round the ellipse with the same moments as the rice grain is.

ConvexArea
The pixel count of the smallest convex hull enclosing the region occupied by the rice grain.

Extent
The ratio of the area covered by the rice grain to the pixel count of the bounding box that encloses it.
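To make these attribute definitions concrete, the following is a minimal sketch that computes several of them from a binary mask of a single grain. It is an illustration under stated assumptions, not the procedure used to build the dataset: the function name `shape_features` is hypothetical, the axis lengths and eccentricity are derived from the second central moments of the pixel coordinates, the perimeter is approximated by counting pixel edges that face the background, and ConvexArea is omitted because it requires a convex hull computation.

```python
import math

def shape_features(mask):
    """Compute simple morphological features from a binary mask (list of rows of 0/1)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]

    # Extent: region area divided by the area of its bounding box.
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    extent = area / bbox_area

    # Second central moments of the pixel coordinates.
    mean_r = sum(rows) / area
    mean_c = sum(cols) / area
    mu_rr = sum((r - mean_r) ** 2 for r in rows) / area
    mu_cc = sum((c - mean_c) ** 2 for c in cols) / area
    mu_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area

    # Eigenvalues of the 2x2 moment matrix give the variances along the two axes.
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)

    # An ellipse with semi-axis a has variance a^2 / 4 along that axis,
    # so the full axis length is 4 * sqrt(variance).
    major_axis = 4 * math.sqrt(lam_major)
    minor_axis = 4 * math.sqrt(lam_minor)
    eccentricity = math.sqrt(1 - lam_minor / lam_major) if lam_major > 0 else 0.0

    # Perimeter approximation: count pixel edges adjacent to background.
    filled = set(pixels)
    perimeter = sum((r + dr, c + dc) not in filled
                    for r, c in pixels
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    return {
        "Area": area,
        "Perimeter": perimeter,
        "MajorAxisLength": major_axis,
        "MinorAxisLength": minor_axis,
        "Eccentricity": eccentricity,
        "Extent": extent,
    }

# A 2 x 4 rectangular "grain" surrounded by background pixels.
grain = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
features = shape_features(grain)
for name, value in features.items():
    print(name, value)
```

An eccentricity near 0 indicates a nearly circular grain, while values approaching 1 indicate an elongated one.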

Results and Discussion
The experimental results of the classification process using the SVM algorithm yielded an accuracy of 92.81%. Table 2 presents the results of classification using the other algorithms: Decision Tree, k-NN, Neural Network, and Random Forest. The table shows that the SVM algorithm has the highest accuracy of the classification algorithms compared, while the k-NN algorithm has the lowest at 88.32%. A graph comparing the performance of the different algorithms is depicted in Figure 2. The confusion matrix provides the values needed for accuracy, precision, recall, and F1-score. Accuracy indicates the proportion of all data that has been categorized correctly; the higher the accuracy, the better the classification model. Accuracy, precision, recall, and F1-score can be calculated using the following formulas.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\%$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\%$$
Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix. For the SVM model, the calculation gives 0.93595608 × 100% = 93.85% and an F1-score of 0.915796 = 91.57%.
The SVM algorithm model achieved an AUC value of 0.979, indicating a relatively good classification result on the test data. The performance of the algorithm is visualized using the ROC curve, as shown in Figure 3. The ROC curve plots the true positive rate against the false positive rate across classification thresholds, and the Area Under the Curve (AUC) is obtained by calculating the area under this curve. However, the classification results obtained by the SVM algorithm were not yet optimal, and feature optimization was therefore necessary. To this end, a comparison of feature optimization was conducted using the optimize selection and optimize weight features, as presented in Table 4 and Table 5.
Table 4 demonstrates that both optimize selection and optimize weight improved the accuracy of the SVM algorithm. Notably, the evolutionary method used in optimize selection produced the highest increase in accuracy, with a value of 93.31%. Following this result, the performance of the SVM algorithm optimized using the optimize selection (evolutionary) feature was tested, and data validation was conducted using split validation with split ratios ranging from 0.5 to 0.9. RapidMiner was used to obtain the accuracy and AUC values for each ratio. Table 4 presents the results of the performance testing for the SVM algorithm optimized using the optimize selection feature with the evolutionary method. Table 6 and Table 7, together with Figure 4, show the classification results before and after optimization.
The results of the study demonstrate that feature optimization improves the accuracy of the classification algorithm. Based on the analysis of the Rice Cammeo and Osmancik dataset, the SVM algorithm optimized using the Genetic Algorithm feature produced an accuracy of 93.31%. This suggests that the optimized SVM algorithm could be used by farmers to classify rice grain quality accurately. Moreover, the findings could serve as a reference for future research seeking to implement this approach in a program.
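As an illustration of how the evaluation metrics discussed in this section are derived, the sketch below computes accuracy, precision, recall, and F1-score from confusion matrix counts, and AUC from classifier scores using the pairwise ranking formulation. The counts and scores are hypothetical values chosen for the example and are not the study's confusion matrix; the function names are likewise assumptions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts for a two-class problem with 100 test samples.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")

# Hypothetical scores: higher means more confident in the positive class.
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
print("AUC:", auc)
```

The pairwise formulation gives the same value as the area under the ROC curve, which makes it a convenient check for small examples.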

Conclusions
In this research on the Rice Cammeo and Osmancik dataset, the SVM classification method was used to classify the quality of rice grains and obtained an accuracy of 92.81%. Feature optimization using a genetic algorithm was then performed to improve the performance of the SVM algorithm, increasing the accuracy of rice grain quality classification to 93.31%. This shows that using genetic algorithms to select the right features can improve the performance of the classification algorithm on the dataset used. This research thus contributes to the development of classification methods in rice data analysis.

Fig. 1. Algorithm Comparison Chart. The figure illustrates the various stages or steps carried out in this study.
1. Problem Identification. The first stage of the problem-solving process involves identifying the problem and proposing a viable solution to address it.
2. Data Collection. The second stage of the study involves data collection, for which a dataset containing 3810 records has been utilized. This dataset is available on the UCI Repository website and comprises eight attributes, including Area, Perimeter, MajorAxisLength, MinorAxisLength, Eccentricity, ConvexArea and Extent, shown in Table 1.
3. Data Validation. In the third stage, data validation is performed by dividing the dataset into training and testing data using Cross Validation and Split Validation techniques. Cross Validation is utilized to determine the best algorithm performance, whereas Split Validation is employed to evaluate a specific algorithm.
4. Algorithm Comparison. During the fourth stage of the study, algorithm comparison was conducted by testing the research data with multiple algorithms.
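The two validation techniques named in the data validation stage can be sketched as index-splitting routines. This is a minimal illustration using only the Python standard library, not the RapidMiner operators used in the study; the function names, the fixed seed, and the choice of 10 folds are assumptions, while the record count of 3810 comes from the dataset description.

```python
import random

def split_validation(n_samples, train_ratio, seed=0):
    """Shuffle sample indices once and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_ratio)
    return indices[:cut], indices[cut:]

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Split validation over a range of training ratios.
for ratio in (0.5, 0.6, 0.7, 0.8, 0.9):
    train, test = split_validation(3810, ratio)
    print(f"ratio {ratio}: {len(train)} training, {len(test)} test records")

# Cross-validation: every record is used for testing exactly once.
for fold, (train, test) in enumerate(k_fold_indices(3810, k=10), start=1):
    print(f"fold {fold}: {len(train)} training, {len(test)} test records")
```

Cross-validation averages performance over all folds, which suits comparing algorithms, while a single split gives one estimate for a chosen model at a given ratio.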

Fig. 2. Comparison Result

Upon comparing the different algorithms, it was determined that the SVM algorithm achieved the highest accuracy and AUC values compared to other methods, with values of 92.81% and 0.979, respectively. The confusion matrix generated by the random forest classification model is presented in Table 3 and provides a visual representation of the performance of the model.

Table 1. Attributes of Dataset

Table 3

Table 2. Comparison of Algorithms

Table 5. Optimize Weight + Cross Validation

Table 7. Accuracy of SVM