Efficient Feature Extraction for Recognition of Human Emotions through Facial Expressions Using Image Processing Algorithms

Abstract: Face emotion recognition is a challenging problem in computer vision that has been extensively studied in recent years. This project investigates the performance of Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM) for face emotion recognition. The aim of the study is to evaluate different combinations of these techniques and to identify the most effective approach for this task. To achieve this, we first collected a dataset of facial expressions covering seven basic emotions: happy, sad, angry, surprise, neutral, fear, and disgust. We then extracted LBP and HOG features from the facial images and applied KNN and SVM classifiers to classify the emotions. We experimented with various combinations of LBP, HOG, KNN, and SVM and evaluated each approach using accuracy, precision, recall, and F1 score. The study demonstrates the effectiveness of combining LBP and HOG features with KNN and SVM for face emotion recognition. Our results suggest that SVM is the most effective model for this task when it is combined with HOG features, and that this combination can further improve the system's performance. The model can be implemented in MATLAB with a graphical user interface (GUI). These findings have important implications for the development of accurate and reliable face emotion recognition systems for various applications, including human-computer interaction.


INTRODUCTION
Facial expression recognition (FER) is the task of classifying the expressions in face images into categories such as anger, fear, surprise, sadness, and happiness. FER has two important stages: feature extraction and classification. There are two forms of feature extraction: appearance-based and geometric-based. Geometric-based feature extraction uses facial components such as the eyes, mouth, nose, and eyebrows, while appearance-based feature extraction uses a specific region of the face [1]. Classification is the crucial process in which expressions such as happy, sad, angry, disgust, surprise, and fear are assigned to their categories. This paper focuses on the three primary stages of FER approaches: pre-processing, feature extraction, and classification. It also examines the benefits of various FER strategies and evaluates their performance. In most cases, FER systems have to deal with issues such as changes in lighting, pose, and skin tone.

RELATED WORK
In 2005, Caifeng Shan, Shaogang Gong, and Peter W. McOwan [2] introduced a novel low-computation discriminative feature space for facial expression recognition that performs robustly over a range of image resolutions. Their approach is based on simple Local Binary Patterns (LBP), which represent salient micro-patterns of face images effectively and efficiently. The LBP features are robust to low-resolution images, which is critical in real-world applications where only low-resolution video input is available.
In 2012, S L Happy, Anjith George, and Aurobinda Routray [3] presented a facial expression classification algorithm that uses a Haar classifier for face detection, takes Local Binary Patterns (LBP) histograms of different block sizes of a face image as feature vectors, and classifies facial expressions using Principal Component Analysis (PCA). Because its computational complexity is small, the algorithm is implemented in real time. It uses grayscale frontal face images of a person to classify six basic emotions, namely happiness, sadness, disgust, fear, surprise, and anger.
In 2014, Hua Gao, Anil Yuce, and Jean-Philippe Thiran [4] developed a real-time, non-intrusive monitoring system that uses facial expression analysis to identify the driver's emotional state. The system recognizes anger and disgust as two basic negative emotions that are associated with stress. They applied HAW and LD-based techniques to two sets of NIR-camera images taken from various angles. The best proposed system achieves an accuracy of 90.5%.
A face emotion identification method based on the Principal Component Analysis
In 2016, Christopher Pramerdorfer and Martin Kampel [6] reviewed existing CNN-based FER techniques, outlined their differences, and compared the different CNN architectures empirically. On this basis, they identify current bottlenecks and ways to boost FER performance. Finally, they confirm empirically that overcoming one such bottleneck significantly increases performance, demonstrating that recent deep CNNs can produce competitive results without auxiliary data or face registration. An ensemble of such CNNs achieves an accuracy of 75.2% on the FER2013 test set.
In 2019, Aditi Bhadane, Anuja Dixit, Vivek Ingle, and Disha Shastri [7] presented a facial expression recognition system based on image processing that uses the OpenCV libraries and the Haar cascade algorithm for face detection. The classification model is built and trained using a machine learning technique from the Dlib C++ library. It sorts the images into six common emotions using an SVM classifier. Classification is carried out on batches of ten images, which improves classification accuracy. Because the accuracy of the SVM classifier is above 90%, expressions can be distinguished with high precision.
In 2020, Huijun Zhang, Ling Feng, Ningyun Li, Zhanyu Jin, and Lei Cao [8] proposed a two-level stress detection network (TSDNet) that detects stress from the facial expressions and action motions in a video. To evaluate the performance of TSDNet, they created a video dataset with 2092 labeled video clips. Experimental results on this dataset show that taking both facial expressions and action motions into account improves detection accuracy and F1-score by over 7 percent compared with considering only the face or only the action.
In 2021, Seyed Muhammad Hossein Mousavi and S. Younes Mirinezhad [9] proposed a database. Compared with the other databases considered, it has the advantages of a large number of samples, coverage of both colour and depth image types, and coverage of both FER and FMER tasks, all of which are needed for the learning process of various classification algorithms to work properly. During the preprocessing stage, the Viola-Jones algorithm [10] is used for face detection and extraction on colour images, while the approach of [11] is used to extract faces from depth images; the newly proposed technique is used for this phase. Features are then extracted using the Histogram of Oriented Gradients (HOG) algorithm [12], and once the colour and depth image features have been combined, Lasso feature selection is used to reduce the data for the classification tasks. Finally, each expression is labeled and the final matrix is ready for classification using Support Vector Machine (SVM) [13], Multi-Layer Neural Network (MLNN) [14], and Convolutional Neural Network (CNN) [14] algorithms.
In 2021, Mohammad Failzal, Nikhil V, Prajwal C, and Aruna Rao B P [15] proposed a stress recognition algorithm that uses face images and facial landmarks, with a Pi camera capturing the input image. Experimental results show that the proposed algorithm recognizes stress more effectively. They used a convolutional neural network (CNN) for training and classification, the Haar cascade algorithm for face detection, facial landmarks for checking the eyes and lips, and LBPH for face detection.
In 2022, Swapna Subudhiray, Hemanta Kumar Palo, and Niva Das [16] investigated the potential of three extracted facial features for improved facial emotion recognition using a simple k-nearest neighbour (KNN) classifier. The feature extraction methods employed are local binary pattern (LBP), Gabor, and histogram of oriented gradients (HOG). Performance metrics including precision, recall, kappa coefficient, average recognition accuracy, overall recognition accuracy, and computation time were compared. Although the Gabor features took the longest to compute, their average accuracy of 94.8% was the highest of all. HOG took the least time to compute but had the lowest average accuracy, 55.2%, while LBP took less time to compute than Gabor.

Data Acquisition
Data acquisition is the process of collecting data from different datasets. We use several datasets: JAFFE, MUG, and KDEF. These datasets contain a large number of images, and using large datasets in training helps to increase the accuracy of the methods.

Preprocessing
Pre-processing is a step that can be used to improve the performance of the FER system, and it is carried out before feature extraction. Image pre-processing includes methods such as improving image clarity, scaling, contrast adjustment, and additional enhancement processes that improve the expression frames.
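As a minimal sketch of two of the steps named above, scaling and contrast adjustment, the following Python code resizes a grayscale image and stretches its contrast. The specific methods (nearest-neighbour resizing and histogram equalization), the function names, and the toy image are assumptions chosen for illustration; the pre-processing actually used in this work may differ.

```python
def resize_nearest(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def equalize_histogram(img, levels=256):
    """Contrast adjustment by histogram equalization (integer pixels in [0, levels))."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    # Cumulative distribution of the grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(v for v in cdf if v > 0)
    if total == cdf_min:  # constant image: there is no contrast to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]


# Hypothetical low-contrast 4x4 face crop (grey values between 50 and 62).
face = [[50, 52, 54, 56],
        [52, 54, 56, 58],
        [54, 56, 58, 60],
        [56, 58, 60, 62]]
small = resize_nearest(face, 2, 2)    # scaling
enhanced = equalize_histogram(face)   # contrast adjustment
```

After equalization the darkest pixel maps to 0 and the brightest to 255, so the narrow range of grey values in the toy image is spread over the full intensity range.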

Feature Extraction
Features are extracted from the preprocessed images using feature extraction techniques. Feature extraction reduces the dimensionality of the data and lowers the time complexity. In our project we apply two feature extraction techniques: Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG).
LBP is a very popular, effective, and simple texture descriptor that is applied to numerous computer vision problems [17]. By means of a simple thresholding technique, in which the intensities of the neighbouring pixels are compared with that of the centre pixel, it captures the spatial pattern as well as the grayscale contrast [18]. The basic LBP operation is expressed and illustrated with a 3x3 window.
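In its usual form, which is assumed here to be the one intended, the basic LBP code of a centre pixel is the weighted sum of the thresholded differences between its eight neighbours and itself. The index n and the weights 2^n follow the common convention and are not taken from this paper:

```latex
\[
\mathrm{LBP}(x_c, y_c) = \sum_{n=0}^{7} s(i_n - i_c)\, 2^{n}
\]
```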
Here, s(i_n - i_c) = 1 if i_n - i_c > 0, and s(i_n - i_c) = 0 otherwise, where i_c is the intensity of the central pixel (x_c, y_c) and i_n are the grey values of its eight neighbouring pixels.
The HOG technique is considered here because it captures both local and global facial expression attributes in different dimensions and orientations. The features are sensitive to changes in an object's shape unless that shape is constant [19]. In this work, nine-bin histograms are used to describe the intensity and direction of edges, with 4x4 cells representing each patch. To obtain the required feature vector, the features from each active facial patch are combined [20]. In the HOG technique, the gradient is calculated at each pixel I(r, c).
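The two descriptors described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in this work: the neighbour ordering, the clamped centred differences at the image border, the unsigned 0 to 180 degree orientation range, and the hard (non-interpolated) binning without block normalization are all choices made for brevity, and the function names are hypothetical. The LBP threshold uses the strict comparison stated in the text; many implementations use "greater than or equal" instead.

```python
import math

# Offsets (row, col) of the eight neighbours, clockwise from the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Basic 3x3 LBP code of the pixel at (r, c); the pixel must not lie on the border."""
    centre = img[r][c]
    code = 0
    for n, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] - centre > 0:  # s(i_n - i_c) = 1
            code += 1 << n                    # weight 2^n
    return code


def lbp_histogram(img):
    """256-bin histogram of the LBP codes of all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def hog_cell_histogram(img, top, left, cell=4, bins=9):
    """Magnitude-weighted orientation histogram of one cell x cell block of pixels."""
    h, w = len(img), len(img[0])
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    for r in range(top, top + cell):
        for c in range(left, left + cell):
            # Centred differences, clamped at the image border.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def hog_features(img, cell=4, bins=9):
    """Concatenate the cell histograms of all non-overlapping cells."""
    feats = []
    for top in range(0, len(img) - cell + 1, cell):
        for left in range(0, len(img[0]) - cell + 1, cell):
            feats.extend(hog_cell_histogram(img, top, left, cell, bins))
    return feats


# Toy inputs: a dark pixel surrounded by brighter ones, and a horizontal ramp.
patch = [[5, 5, 5],
         [5, 4, 5],
         [5, 5, 5]]
ramp = [[10 * c for c in range(8)] for _ in range(8)]
code = lbp_code(patch, 1, 1)      # every neighbour exceeds the centre
features = hog_features(ramp)     # four 4x4 cells, nine bins each
```

On the ramp image every gradient points along the rows, so all of the gradient magnitude in each cell falls into the first orientation bin.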
The magnitude and angle of the gradient at each pixel are calculated using the formulae below.
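In the standard HOG formulation these quantities are commonly written as shown here. The centred-difference form of the gradient, its sign convention, and the symbols G_x, G_y, \mu, and \theta are assumptions, since conventions vary between implementations:

```latex
\[
\begin{aligned}
G_x(r, c) &= I(r, c+1) - I(r, c-1), &
G_y(r, c) &= I(r+1, c) - I(r-1, c), \\
\mu(r, c) &= \sqrt{G_x(r, c)^2 + G_y(r, c)^2}, &
\theta(r, c) &= \arctan\!\left(\frac{G_y(r, c)}{G_x(r, c)}\right).
\end{aligned}
\]
```

Each pixel then contributes its magnitude \mu to the histogram bin that contains its angle \theta.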

Classification
After the necessary features have been extracted, they must be classified into the relevant groups. Many classification techniques exist; in our study we employed Support Vector Machine (SVM) and K-Nearest Neighbour (KNN). A support vector machine is an algorithm that constructs a hyperplane to separate two classes, which lie on either side of it. The training points closest to the hyperplane are known as support vectors, since they determine where the hyperplane separates the two classes. Both linear and non-linear SVMs can be used to classify the extracted features; a non-linear SVM is typically used. Once trained, the SVM can classify new data supplied to it.

Fig.2. Process for SVM classification
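To make the idea of a separating hyperplane concrete, the following is a minimal sketch of a linear, two-class SVM trained by subgradient descent on the regularized hinge loss. It is not the classifier used in this work: it covers only the linear case (no kernel), handles only two classes labeled +1 and -1, and the function names, hyperparameter values, and toy feature vectors are assumptions for illustration. Classifying seven emotions would additionally require a multi-class scheme such as one-vs-rest.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a hyperplane w.x + b = 0 by subgradient descent on the hinge loss.

    X is a list of feature vectors and y a list of labels in {+1, -1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:  # sample is misclassified or inside the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-D feature vectors for two well-separated classes.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
label_low = predict(w, b, [0.5, 0.5])
label_high = predict(w, b, [3.5, 3.5])
```

Only samples that violate the margin change the hyperplane during training, which mirrors the role of the support vectors: points far from the boundary have no influence on where it ends up.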
KNN is a supervised nearest-neighbour classifier in which the feature vector is assigned to the most common class among its k nearest neighbours. The neighbours are determined on the basis of the Euclidean distance. The value of k has a significant impact on the performance of this method in two ways. If k is too large, the large classes will outvote the small classes. If k is too small, the benefit of KNN is not realized.
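A minimal sketch of this voting rule is given below, assuming Euclidean distance and a simple majority vote; the function name, the toy feature vectors, and the labels are hypothetical. The example also shows the effect of a large k on an unbalanced training set.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the most common label among the k training points nearest to x."""
    distances = sorted(
        (math.dist(xi, x), label) for xi, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Unbalanced toy data: four "sad" samples and only two "happy" samples.
train_X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6]]
train_y = ["sad", "sad", "sad", "sad", "happy", "happy"]
query = [5, 5.5]  # lies between the two "happy" samples

near = knn_predict(train_X, train_y, query, k=3)  # two "happy" votes against one "sad"
far = knn_predict(train_X, train_y, query, k=6)   # all six points vote
```

With k = 6 every training point votes, so the larger class wins even though the query lies next to the smaller one.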

Simulation Results
A confusion matrix is used to calculate the percentage accuracy for every combination of methods and each dataset. The tables below give the percentage accuracy for the four combinations of methods, for different cell sizes, and for the different datasets.
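As a sketch of how accuracy is obtained from a confusion matrix, together with the per-class precision, recall, and F1 score, the following code may help. The function names are hypothetical, and the labels form a made-up toy example, not results from this study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_scores(matrix, i):
    """Precision, recall, and F1 score for the class with index i."""
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum
    actual = sum(matrix[i])                    # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
classes = ["happy", "sad", "angry"]
true = ["happy", "happy", "sad", "sad", "angry", "angry"]
pred = ["happy", "sad", "sad", "sad", "angry", "happy"]

matrix = confusion_matrix(true, pred, classes)
acc = accuracy(matrix)                   # 4 of 6 samples are correct
happy_scores = per_class_scores(matrix, 0)
```

Multiplying the accuracy by 100 gives the percentage figure of the kind reported for each combination of methods.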

Conclusion
From the above results, this paper concludes that the combination of HOG and SVM achieves 100% accuracy on the different datasets. Since FER is important in many real-time applications, the above system provides efficient results. The combined LBP and HOG method can be used to increase efficiency, but it takes more time on large datasets. In future work, this limitation can be overcome to provide a more efficient system for FER.