Feasible LSTM Model for Detection of Sign and Body Language

Sign and body language facilitate communication between hearing individuals and those with hearing loss, and they play an important role in effective human-human and human-computer interaction. Sign language is a visual language composed of hand gestures and facial expressions that convey the signer's meaning, while body language detection is the process of analysing the nonverbal cues and gestures used in communication to understand the emotional state and intentions of the speaker. Deep learning methods have recently demonstrated promising results in a range of computer vision tasks, including gesture detection. Using MediaPipe Holistic, essential keypoints can be extracted from the body, hands, and face; TensorFlow and Keras are then used to construct a feasible Long Short-Term Memory (LSTM) model that predicts on-screen behaviour, presenting an innovative approach to sign and body language detection.


Introduction
Sign and body language detection pertains to the capacity to identify and comprehend the gestures and movements of individuals who communicate through sign language or nonverbal means. The significance of this technology lies in its ability to facilitate communication between the hearing and non-hearing communities. Sign language detection typically entails the use of cameras and computer vision algorithms to examine and interpret the gestures of the hands, face, and body of the individual performing the sign language [7]. Body language detection involves the analysis of nonverbal cues, including facial expressions, posture, and gestures, in order to infer an individual's emotional state or intentions.

Literature survey
The following section summarises existing approaches. The aim of the approach in [1] was to assess the viability of using sensor gloves for sign language recognition. The methodology involved gloves equipped with sensors capable of capturing and transmitting the intricate movements of the hands to a computer or other electronic device; the collected data was subsequently analysed and interpreted. The findings indicate that the system attained a precision rate of 88% on a test dataset. The drawbacks of this approach are: 1. The inability of sensor gloves to accurately detect and interpret a wide variety of signs may limit their usefulness as a communication tool for people who need to convey complex messages. 2. The cost of high-grade data gloves may pose a financial challenge, restricting their availability to certain user groups.

The second approach [2] is a data-glove sign language recognition system using flex sensors, a 3-axis accelerometer, and an ARM microcontroller (LPC2138). Its aim was to design and implement a system capable of identifying and converting sign language gestures into textual or spoken output with high accuracy and in real time, using a dataset of sensor data obtained from the flex sensors and the 3-axis accelerometer. A study conducted by S. H. Yang et al. in 2016 developed a system that utilised a glove equipped with flex sensors and a 3-axis accelerometer connected to an ARM7 microcontroller, which served as the main processing unit; the researchers employed a Hidden Markov Model (HMM) to classify the hand gestures.

The approach in [3] relies on a motion capture system to record the three-dimensional (3D) coordinates of markers affixed to the body of the sign language user, using a custom-built set of 25 markers placed on the signer's face, hands, and body. The limitations of this method are: 1. The accuracy of motion capture systems can be compromised by environmental factors such as lighting and background noise, which can result in incomplete or inaccurate data. 2. 3D motion capture has been observed to encounter difficulties in accurately identifying non-manual markers, specifically facial expressions, which are a crucial component of the language.

The paper in [4] presents a system that applies computer vision techniques to the recognition and interpretation of sign language gestures, introducing a potentially effective method for identifying American Sign Language (ASL) gestures. This approach comes with the following drawbacks: 1. The effectiveness of vision-based sign language recognition depends on precise tracking of the signer's hands and body, which can prove challenging, particularly when the signer exhibits rapid or unpredictable movements.

2. The accuracy of vision-based recognition systems can be influenced by various environmental factors, including lighting, shadows, glare, and background noise, which may significantly affect performance and make it challenging for the system to correctly identify signs.

The study in [5] conducted a comparative analysis of two distinct hand features for sign language recognition using a Support Vector Machine (SVM), and reports the successful implementation of a system that demonstrates reliable real-time recognition of vowels. However, real-time recognition by vision-based PSL recognition systems requires a significant amount of processing capacity, which can make the system resource-intensive and slow, restricting its usefulness. In addition, such systems are constrained by a limited vocabulary, restricting the number of signs that can be recognised; this may be an issue for individuals who need a broader lexicon.
The aim of the study in [6] and [8] was to detect various sign language signs without any background interference using a small light source. That work investigates the application of transfer learning to object detection using a pre-trained SSD MobileNet V2 architecture, training it on a dataset distinct from the one on which it was pre-trained. The aim of the approach in [9] is to leverage the knowledge gained during pre-training and apply it to the new task, thereby improving the performance of the model. This approach carries the following drawbacks: 1. Convolutional Neural Networks (CNNs) exhibit high computational complexity, necessitating substantial processing resources and time, which renders them infeasible for certain use cases. 2. The precise detection and recognition of signs may require CNN-based systems to adhere to specific camera angles and distances; in practical scenarios where individuals may sign from varying angles and distances, this can be a limiting factor. The American Sign Language Lexicon Video Dataset (ASLLVD) used in [7] contains videos of 211 signs from the American Sign Language (ASL) lexicon performed by 18 native ASL signers. The computational cost of the HOG feature extraction technique and the ANN classifier can be substantial, requiring considerable processing resources and time, which may render them unsuitable for certain use cases.
The author of [10] explored the application of Transfer Learning (TL) in automated medical image analysis, highlighting its effectiveness in various tasks; TL models such as AlexNet, ResNet, VGGNet, and GoogleNet prove valuable for enhancing medical image analysis. The authors of [11] presented data-driven prediction techniques using ARIMA and LSTM to forecast COVID-19 cases and deaths, using statistical measures to assess accuracy, with the aim of assisting several countries in managing the pandemic. The authors of [12] discussed the importance of rapid COVID-19 detection using chest X-ray images and presented a deep learning method achieving 99% accuracy in binary classification of COVID-19 cases on open-source datasets. The authors of [13] highlighted the significance of machine learning in prediction, pattern recognition, and error reduction across diverse fields, emphasising the broad impact of AI. The authors of [14] discussed the use of Scanning Electron Microscopy (SEM) for material characterisation and how Python is employed to process SEM images, including histogram equalisation and morphological operations for accurate analysis.

Proposed method
An LSTM-based action recognition model is used to create a sign and body language detection system that accurately recognises and translates human gestures and sign language, enhancing communication for hearing-impaired and deaf persons. The goal is a system that can accurately interpret and comprehend human gestures, specifically sign language and body language, using an LSTM model. LSTM (Long Short-Term Memory) networks are a type of recurrent neural network (RNN) well known for their capacity to efficiently capture sequential patterns and relationships. The proposed system is implemented in four stages, corresponding to four modules:
1. The data collection module collects data from the user and is an essential component of the system.
2. The data preprocessing module is responsible for preprocessing the datasets.
3. The training module trains the sequential LSTM model on the preprocessed data.
4. The evaluation module assesses the trained LSTM model on a distinct set of videos, measuring its ability to correctly recognise and classify a diverse range of sign language gestures.

Data collection module
The data collection module is an essential component of this work. It involves the acquisition of a dataset of videos depicting individuals performing a variety of sign language gestures. This dataset serves as the primary source of information for subsequent analyses and evaluations, and the proper labelling of the sign language gesture corresponding to each video is imperative.
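Below is a minimal sketch of how such a collection could be organised on disk. The folder name, gesture labels, and the use of per-frame .npy files are assumptions for illustration; the 30-sequence, 30-frame layout is taken from the dataset description later in the paper.

```python
# Illustrative collection layout: one sub-folder per action and sequence,
# so each frame's keypoints can later be saved as <action>/<seq>/<frame>.npy.
import os
import numpy as np

DATA_PATH = os.path.join('MP_Data')                  # hypothetical root folder
actions = np.array(['hello', 'thanks', 'iloveyou'])  # example gesture labels (assumed)
no_sequences = 30                                    # 30 video sequences per action
sequence_length = 30                                 # 30 frames per sequence

for action in actions:
    for sequence in range(no_sequences):
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)
```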

Data preprocessing module
The initial step in developing the proposed model is the preprocessing of the dataset's videos, which is performed by the data preprocessing module. MediaPipe is used here to extract hand keypoints and additional landmarks from every video frame; the LSTM model itself is built with TensorFlow and Keras.
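The following sketch shows per-frame keypoint extraction with MediaPipe Holistic. The flattened 1662-value layout (33 pose landmarks with visibility, 468 face landmarks, and 21 landmarks per hand) reflects standard MediaPipe Holistic output but is our assumption about the feature vector; the input image path is hypothetical.

```python
# Sketch: extract and flatten holistic keypoints from a single frame.
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    # Fall back to zero vectors when a body part is not detected in the frame.
    pose = np.array([[lm.x, lm.y, lm.z, lm.visibility]
                     for lm in results.pose_landmarks.landmark]).flatten() \
        if results.pose_landmarks else np.zeros(33 * 4)
    face = np.array([[lm.x, lm.y, lm.z]
                     for lm in results.face_landmarks.landmark]).flatten() \
        if results.face_landmarks else np.zeros(468 * 3)
    lh = np.array([[lm.x, lm.y, lm.z]
                   for lm in results.left_hand_landmarks.landmark]).flatten() \
        if results.left_hand_landmarks else np.zeros(21 * 3)
    rh = np.array([[lm.x, lm.y, lm.z]
                   for lm in results.right_hand_landmarks.landmark]).flatten() \
        if results.right_hand_landmarks else np.zeros(21 * 3)
    return np.concatenate([pose, face, lh, rh])   # 1662 values per frame

with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    frame = cv2.imread('frame.jpg')               # hypothetical input frame
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    keypoints = extract_keypoints(results)
```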

Training module
Following data preprocessing, the Long Short-Term Memory (LSTM) model can be trained on the extracted features. The model is trained to detect patterns in the sequential data and to categorise each sign language gesture.
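A minimal sketch of such a sequential LSTM in Keras is given below, assuming 30-frame sequences of 1662 keypoints per frame. The layer sizes, the three example classes, and the epoch count are illustrative choices, not the authors' exact configuration.

```python
# Sketch: stacked LSTM classifier over keypoint sequences.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

num_classes = 3                                   # assumed number of gestures
model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(num_classes, activation='softmax')      # one probability per gesture
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
# model.fit(X_train, y_train, epochs=200)        # X_train: (samples, 30, 1662)
```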

Evaluation module
Following the training of the LSTM model, a distinct set of videos is used to assess the model's performance. The objective is a model that can identify and categorise sign language gestures with a high degree of accuracy, and its performance is evaluated on its ability to correctly recognise and classify a diverse range of gestures.
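As a sketch, assuming model is the trained network from the previous sketch and X_test/y_test hold one-hot-encoded held-out sequences (see the split sketch in the results section), evaluation could proceed as follows:

```python
# Sketch: accuracy and per-class confusion matrices on the held-out set.
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score

y_prob = model.predict(X_test)        # per-class probabilities
y_pred = np.argmax(y_prob, axis=1)    # predicted class indices
y_true = np.argmax(y_test, axis=1)    # undo the one-hot encoding

print(accuracy_score(y_true, y_pred))
print(multilabel_confusion_matrix(y_true, y_pred))
```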

Results and discussions

Description of the dataset
The dataset for sign language detection using the LSTM model is a collection of video data capturing various sign language gestures and motions, curated and labelled so that machine learning models can learn to recognise and understand sign language motions and activities. For each sign, the dataset consists of 30 video sequences, each comprising 30 frames of keypoints, stored as NumPy arrays. It contains numerous examples of each gesture under varying conditions, such as changing lighting, backgrounds, and camera angles. The dataset is used in conjunction with MediaPipe, a framework for building multimodal machine learning pipelines, which offers pre-trained models and tools for processing video footage. Using these features as input for training and inference, the LSTM model can learn temporal relationships and recognise sign language gestures and activities accurately. This dataset is used to train LSTM models with MediaPipe for action recognition and sign language detection.

Experimental Results
The paper focuses on sign language detection using an LSTM deep learning model, specifically for action recognition. The data preparation phase involves acquiring and organising sign language data, including keypoint sequences from the MediaPipe Holistic model. The dataset is categorised into actions or signs, and multiple sequences are generated for each action to enhance diversity.
The LSTM model employed for action recognition consists of LSTM and dense layers with Rectified Linear Unit (ReLU) activation functions. The train/test split method divides the dataset into training and testing sets, and the labels for each gesture are one-hot encoded. The model is trained on the training set and stored for future use.
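A sketch of this assembly and split is given below, reusing the names from the collection sketch (DATA_PATH, actions, no_sequences, sequence_length, all assumptions); the 5% test fraction is an illustrative choice.

```python
# Sketch: load per-frame keypoints into (samples, 30, 1662) sequences,
# one-hot encode the labels, and split into training and testing sets.
import os
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

label_map = {label: num for num, label in enumerate(actions)}

sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):
        window = [np.load(os.path.join(DATA_PATH, action, str(sequence), f'{frame}.npy'))
                  for frame in range(sequence_length)]
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)                 # shape: (samples, 30, 1662)
y = to_categorical(labels).astype(int)  # one-hot encoded gesture labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
```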
The trained model is loaded into the webcam feed using OpenCV's VideoCapture class, and the MediaPipe Holistic library is used to detect and extract landmarks for the face, pose, and hands in each frame. The keypoints are fed into the LSTM model to predict the current gesture, and the class with the highest predicted probability is selected. When the prediction probability exceeds a threshold, the recognised gesture is appended to a sentence, and the sentence and prediction probabilities are rendered on the frame.
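A hedged sketch of this loop is shown below. It reuses extract_keypoints, model, and actions from the earlier sketches, and the 0.8 confidence threshold is an assumed value.

```python
# Sketch: real-time gesture prediction over a rolling 30-frame window.
import cv2
import numpy as np
import mediapipe as mp

sequence, sentence = [], []
threshold = 0.8                                   # assumed confidence cut-off

cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))   # from the preprocessing sketch
        sequence = sequence[-30:]                     # keep the latest 30 frames

        if len(sequence) == 30:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            if res[np.argmax(res)] > threshold:
                word = actions[np.argmax(res)]
                if not sentence or sentence[-1] != word:
                    sentence.append(word)             # build the running sentence

        cv2.putText(frame, ' '.join(sentence), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.imshow('Sign detection', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```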
A pre-existing body language classification model is loaded via Pickle, and facial and body landmarks are extracted from the MediaPipe Holistic results. The pre-trained body language model predicts the class and likelihood of the identified body language, and the predicted class is displayed on the frame. The output is rendered with OpenCV's visualisation functions, which display a combination of the user's sign language and body language on the frame.
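A sketch of this step, assuming a scikit-learn-style classifier pickled to a hypothetical body_language.pkl, a pose-plus-face feature layout, and results obtained from a MediaPipe Holistic call as in the loop above:

```python
# Sketch: classify body language from holistic landmarks with a pickled model.
import pickle
import numpy as np

with open('body_language.pkl', 'rb') as f:        # hypothetical model file
    body_model = pickle.load(f)

# Assumes pose and face landmarks were detected in the current frame.
pose = np.array([[lm.x, lm.y, lm.z, lm.visibility]
                 for lm in results.pose_landmarks.landmark]).flatten()
face = np.array([[lm.x, lm.y, lm.z]
                 for lm in results.face_landmarks.landmark]).flatten()
row = np.concatenate([pose, face]).reshape(1, -1)

body_class = body_model.predict(row)[0]           # predicted body-language class
body_prob = body_model.predict_proba(row)[0]      # class likelihoods
```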

Conclusion and future enhancement
Technologies that recognise signs and body language have the potential to revolutionise accessibility and communication for people who are deaf, hard of hearing, or have speech difficulties. These tools leverage advances in artificial intelligence, machine learning, and computer vision to interpret the intricate hand and body gestures used in sign and body language. By detecting and recognising sign language, they can enable real-time translation between sign language and spoken or written language, improving communication between sign language users and non-users and allowing those with hearing loss to engage more fully in social interactions, employment, education, and public services. Body language detection, by contrast, focuses on deciphering nonverbal signs expressed through gestures and facial expressions, making it feasible for machines to comprehend human emotions, intentions, and communication styles and to respond appropriately. Current models typically focus on identifying isolated motions or signs; enhancing them to handle continuous, fluid gestures and to capture the dynamics of motions and transitions would benefit continuous and natural communication. This requires understanding the subtleties of body language expressions and recognising words or sentences in sign language.

Fig. 2. Architecture and workflow of the proposed sign and body language detection system.

Fig. 3. Sequence of videos for each sign.