Automated Handwritten Text Recognition

A computer's capacity to recognize and convert handwritten input from sources such as photographs and paper documents into digital format is known as Automated Handwritten Text Recognition (AHTR). Handwriting recognition systems are widely employed in a variety of fields, including banking, finance, and healthcare. In this paper, we address the problem of classifying arbitrary handwritten text, whether written in block lettering or cursive. Handwritten characters come in many forms, including digits, symbols, and scripts in both English and other languages, which makes the recognition task more complex. Training an Optical Character Recognition (OCR) system under these conditions is difficult. This work aims to classify each individual handwritten word in order to convert handwritten material into digital form. Because Convolutional Neural Networks (CNNs) excel at this task, they are the method of choice for handwriting recognition systems. The method will be used to recognize writing in various formats.


Introduction
Automated Handwritten Text Recognition (AHTR) is the process of converting an image containing handwritten text into digital text using machine learning and deep learning algorithms. It involves recognizing, identifying, and understanding the unique patterns and shapes of the text in order to translate them into a machine-readable format. AHTR is a difficult and complex task due to variations in handwriting styles and shapes, poor image quality, noise, and other factors that can affect the quality of handwriting recognition.
AHTR has a wide range of real-world uses, including automated data entry, where handwritten information from forms or surveys can be automatically extracted and converted into digital text; document digitization; signature validation; healthcare, where AHTR extracts relevant information from handwritten medical records and prescriptions; business; and biometric identification.
Deep learning techniques, such as Convolutional Neural Networks (CNNs), were implemented in Keras. CNN is one of the most common deep neural networks, and it is made up of several layers of artificial neurons. Artificial neurons are loose analogues of their biological counterparts: mathematical functions that calculate the weighted sum of multiple inputs and produce an activation value as output. The first layer of a CNN usually extracts basic features, such as vertical and diagonal edges. This output is passed on to subsequent layers, which detect more complex features such as corners or edge combinations. As we progress further into the network, it can identify even more complex features such as objects, faces, and other high-level representations. Deep learning is used in AHTR because of its ability to learn complex patterns and representations from vast amounts of data, which suits the challenges posed by handwriting recognition tasks. A key reason deep learning is used in AHTR is:
• Ability to learn hierarchical features: Deep learning models, such as CNNs and Recurrent Neural Networks (RNNs), are capable of automatically learning hierarchical features from raw input data. In the case of AHTR, deep learning models can learn to extract the details of different handwriting styles, including variations in letter shapes, sizes, and styles, which are critical for accurate recognition of handwritten text.
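The edge-extraction behaviour attributed to an early CNN layer can be sketched with a plain 2D convolution. The kernel below is a standard vertical-edge filter chosen for illustration; it is not taken from the paper's trained model:

```python
# Minimal 2D convolution (valid padding), illustrating how an early
# CNN layer responds to vertical edges in a grayscale image.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return out

# A vertical step edge: dark (0) on the left, bright (1) on the right.
img = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Classic vertical-edge kernel: responds where intensity rises left-to-right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

response = conv2d(img, vertical_edge)
# response == [[0, 3, 3, 0], [0, 3, 3, 0]] -- peaks at the edge columns.
```

In a trained CNN such kernels are not hand-designed; their weights are learned, but the sliding-window computation is the same.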

Literature survey
Gautham and team worked on HTR using Machine Learning (ML) techniques in the field of NLP [1]. The goal of their work is to identify input images correctly and to detect writing in different formats in order to convert the text. They approached this problem by using TensorFlow and OpenCV with pre-trained models. The architecture they used is based on NLP. They used the HKR dataset, which has nearly 63,000 sentences, to train their model. They concluded that this model can be used in the healthcare and consumer domains. Dan Shiferaw and team worked on HTR using Deep Learning (DL) [2]. Their work classified handwritten words so that handwritten text can be transformed into a machine-readable format using deep learning. They used a Convolutional Neural Network (CNN) with various architectures along with Long Short-Term Memory (LSTM) networks.
Preetha and team worked on Machine Learning (ML) for handwriting recognition [3]. Pattern recognition is one of the main applications of ML. They wanted to use it to detect handwritten documents and digitize them. They used CNNs together with slope and slant correction approaches to achieve their goal, using the IAM and HKR datasets. G.R. Hemnath and team worked on CNN-RNN based HTR [4]. Sumeet and team worked on full-page (FP) handwriting recognition (HR) using image-to-sequence extraction [5], in order to overcome the limitations of other deep neural network models in HTR; their model was based on ResNet. Mayuri and team worked on Handwritten Text Conversion (HTC) using deep learning techniques such as CNN [6]. The goal of their work is to turn handwritten notes into computer-readable text documents using a paragraph as input. They approached this problem using CNN and OpenCV. The architecture they used is based on CNN. They used the IAM dataset, which has nearly 5,700 sentences, to train their model.
Ahmed and team worked on an HTR system using Machine Learning (ML) techniques with the application of Artificial Neural Networks (ANN) [7]. Their proposal aims to recognize a given input image and to develop an efficient handwritten text and number recognition system for English characters. They tackled that problem using optical character recognition (OCR). Monica and team worked on Optical Character Recognition (OCR) using deep learning techniques, i.e., CRNN [8]. The goal of their work is to recognize text on any image, such as scanned documents and photos. They approached this problem using deep learning techniques such as CNNs and Recurrent Neural Networks (RNNs).
Shubman and team worked on HTR using deep learning and computer vision techniques [9]. The goal of their work is to design an application for mobile devices that recognizes handwriting in different styles and shapes. They concluded that this model can be used in day-to-day life for recognizing handwritten characters. Yugandhar and Jayaram worked on HTR using deep learning and TensorFlow [10]. The goal of their work was to recognize the handwriting in input images containing writing in different forms. The architecture they used is based on CNN. They used the IAM dataset, which has nearly 1,500 pages of scanned text, to train their model. They concluded that this model can be used in the healthcare and consumer domains.
Jae and team worked on OCR using deep learning techniques, i.e., CNN [11]. The goal of their work is to recognize text in any image format and to distinguish between handwritten and typewritten (machine-printed) text. They approached this problem using deep learning techniques such as CNN. They used an IAM dataset consisting of 1,539 scanned form sheets from 657 authors. They concluded that this model could be further developed by addressing the problem of layer separation to reconstruct both components.
Hazem and team worked on OCR using deep learning techniques such as Artificial Neural Networks (ANN) [12]. The goal of their work is to identify input images correctly and to detect writing in different formats. They approached this problem using deep learning techniques such as a 3-layer ANN; supervised learning approaches were also utilized. They used 55 samples of each English alphabet as their dataset. They concluded that the Scaled Conjugate Gradient algorithm turned out to be a better learning algorithm than the Resilient Backpropagation algorithm in terms of both accuracy and training time, using the same configuration.
Reeve and team worked on HTR using deep learning without recurrent connections [13]. The aim of their work is to recognize the handwriting in input images containing writing in different forms such as memos, whiteboards, medical records, and historical documents. They approached this problem using TensorFlow and neural networks. The architecture they used is based on CNN+LSTM. They used the IAM dataset, which has nearly 1,500 pages of scanned text, to train their model. They concluded that this model can be used in the healthcare and consumer domains.
Denis and his research team worked on OCR using deep learning techniques, i.e., CNN+LSTM [14]. Their approach aims to identify text present in any kind of picture, including photos and scanned papers. They used deep learning methods such as CNN+LSTM and Recurrent Neural Networks (RNNs) to solve this problem. They utilized the RIMES and IAM datasets; the RIMES dataset comprises 12,723 pages in total, gathered in the context of mail-writing scenarios. They concluded that the fields of education, law, and healthcare may employ this approach.
Thierry and team worked on Faster DAN: multi-target queries with Document Positional Encoding (DPE) for end-to-end Handwritten Document Recognition (HDR) [15]. Their goal was to approach HDR by combining AHTR and Document Layout Analysis (DLA) with XML layout mark-ups. Their model, DAN, used a Fully Convolutional Network (FCN) encoder for feature extraction from scanned images and a transformer decoder. They trained their model on the RIMES and READ 2016 (single- and double-page) datasets. They concluded that their work overcomes a major drawback of other models, namely prediction time.
Authors [16] highlighted the significance of ML in prediction, pattern recognition, and error reduction across diverse fields, emphasizing the impact of AI in a broad range of domains. The paper [17] discussed the use of machine learning and neural networks, especially CNN, for recognizing handwriting patterns, with a focus on Telugu film industry names, achieving high accuracy (98.3%). The paper [18] explores distinct ML applications in predicting heart attacks using patient health records; it compares Random Forest and CNN methods, and the findings showed Random Forest's better performance in terms of accuracy. The approach [19] utilized advanced deep learning with a global threshold to improve e-commerce product classification, achieving high accuracy and challenging existing technology. Authors [20] introduced an efficient method for early-stage disease detection in tomato plants using image processing techniques; their work employed clustering, feature extraction, and neural networks, demonstrating superior performance compared to existing methods.

Problem statement
The problem addressed in this paper is to develop an automated system that can accurately recognize and transcribe handwritten text from various sources, such as documents, forms, and manuscripts. The objective of the paper is to convert the handwritten text into digital format for further processing, analysis, and storage.

Objectives
The preliminary objective of the work was to accurately convert handwritten text into digitized text.
• To create a model which can recognize different handwriting styles, such as cursive, italic, and bold, and to convert them.
• To create a user-friendly model for both novice and experienced users.
The primary objective of automated handwritten text recognition is to convert handwritten text into a machine-readable format, allowing it to be stored, processed, and analyzed digitally.

Description
The proposed AHTR aims to develop an accurate system to convert handwritten text into digital format. It utilizes computer vision and machine learning techniques. The proposed work incorporates image acquisition, pre-processing, segmentation, and feature extraction methods. Machine learning models are trained on labeled datasets to recognize and transcribe handwritten components. The system's output is reliable and editable digital text. The paper has applications in historical document digitization, handwritten note transcription, and automated data entry. It contributes to preserving and accessing valuable handwritten information.

Architecture of the proposed work
An architecture diagram is a visual representation of the structure and organization of a system or software application. Usually, it demonstrates how the system's many parts interact with one another. Architecture diagrams can be used to explain a system's architecture to various stakeholders, including consumers, managers, and engineers. They can take many forms, depending on the type of system or application being designed: high-level diagrams that show the overall structure of the system, or more detailed diagrams that show the interactions between individual components. Some common types of architecture diagrams include block diagrams, flowcharts, and UML diagrams. Overall, an architecture diagram is a useful tool for communicating the design of a system and can help ensure that all stakeholders have a shared understanding of the system's structure and behavior.
Figure 2 explains the sequence of operations performed in developing the current system. The input is a picture, which the CNN's image-processing layers then process further. Through training, these CNN layers learn to select the necessary characteristics from the original picture. Each layer carries out three operations: convolution, activation, and pooling, which produces a reduced image. We use the Keras library and TensorFlow (TF) to construct the CNN model for handwritten characters. The goal of the branch of research known as Automated Handwritten Text Recognition (AHTR) is to create technologies that can efficiently recognize and convert handwritten text into machine-readable text. Architecture diagrams of a handwritten text recognition system typically include a number of crucial elements. The system begins its work with a submitted image of handwritten text. This image can be retrieved from a variety of sources, including digital tablets, photos, and scanned papers.
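The per-layer pooling operation mentioned above can be illustrated with a minimal 2x2 max-pooling routine. This is a plain-Python sketch of the general technique, independent of the Keras layers actually used in the paper:

```python
# 2x2 max pooling with stride 2: keeps the strongest activation in each
# 2x2 window, halving the height and width of the feature map.

def max_pool_2x2(fmap):
    out = []
    for i in range(0, len(fmap) - 1, 2):
        row = []
        for j in range(0, len(fmap[0]) - 1, 2):
            row.append(max(fmap[i][j], fmap[i][j + 1],
                           fmap[i + 1][j], fmap[i + 1][j + 1]))
        out.append(row)
    return out

# Hypothetical 4x4 activation map from a convolutional layer.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 5, 6, 2],
               [1, 2, 3, 4]]

pooled = max_pool_2x2(feature_map)
# pooled == [[4, 2], [5, 6]] -- the 4x4 map shrinks to 2x2.
```

Pooling discards position detail while keeping the strongest responses, which is what makes deeper layers progressively summarize larger regions of the input image.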
• Preprocessing: To improve quality and extract pertinent information, preprocessing is applied to the raw image. These preprocessing procedures may involve removing undesired artifacts, adjusting contrast, binarization (turning the image black and white), skew correction (aligning the text lines), and reducing noise.
• Text Line Segmentation: The picture is separated into individual lines of text in this step. Text line segmentation is essential for dividing lines within a text document and increasing recognition accuracy.
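The binarization step described above can be sketched in a few lines. The fixed threshold of 128 is an illustrative assumption; production systems usually compute the threshold adaptively (e.g. with Otsu's method):

```python
# Global-threshold binarization: pixels at or above the threshold become
# white (1), darker pixels (ink) become black (0).

def binarize(gray_image, threshold=128):
    return [[1 if px >= threshold else 0 for px in row]
            for row in gray_image]

# Tiny hypothetical grayscale patch: dark ink strokes on bright paper.
patch = [[250, 240, 30, 245],
         [248, 25, 20, 240],
         [251, 246, 35, 238]]

bw = binarize(patch)
# bw == [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1]]
```

After binarization, each pixel is unambiguously ink or background, which simplifies the later segmentation and feature-extraction stages.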

Modules-connectivity diagram
A module connectivity diagram is a piece of an architecture diagram which shows the connections and interactions between the different modules or components of a system or software application. In a module connectivity diagram, each module is represented as a block or node, and the connections between them are represented as lines or arrows. The diagram typically shows the flow of data or control between the modules, as well as any dependencies or relationships between them. Module connectivity diagrams are often used in software engineering and systems design to help developers and designers understand the overall structure of a system and how its various components interact with each other. They can also be used to locate possible bottlenecks or areas for optimization in the system architecture.

Data collection is a crucial step in AHTR that involves gathering and preparing a dataset of handwritten text images and corresponding transcriptions for training and testing the recognition system. Here the data is collected as an input image or as a scanned document for further processing. This image or document is a snapshot of handwritten text which needs to be converted into digital form. For handwritten text recognition algorithms to accurately recognize the input image, pre-processing techniques are essential. As part of the automated handwritten text recognition (AHTR) process, significant information must be extracted from the segmented words or characters to reflect their distinctive properties. The classification model uses these attributes as input for recognition. The process of teaching a model to recognize and accurately transcribe handwritten text is known as handwritten text recognition training. Testing for AHTR assesses how well a trained model performs on unseen data to gauge its efficacy and accuracy in recognizing handwritten text.

For training and testing the model we used two datasets, named Handwritten_Character and Handwriting Recognition.

Handwritten_Character: this dataset contains all the English alphabets (lowercase and uppercase), digits (0-9), and some special characters (@, #, $, &); the images are 32 by 32 pixel black-and-white images. There are 39 classes in all: 9 for the digits (1 to 9), 26 for the letters (lowercase and uppercase are combined into a single class per character), and 4 for the special characters (@, #, $, &). The digit 0 is merged into the character O class to prevent misclassification between the two.

Handwriting Recognition: more than 400,000 handwritten names were gathered for this dataset through charity projects. Recognition generally works well on machine-printed sources, but the wide range of unique writing styles still makes it difficult for machines to recognize handwritten characters. The total number of first names and last names is 207,024. A training set (331,059 samples), testing set (41,382), and validation set (41,382) were each created independently from the data.

Below are graphs of different plots of the working model's behaviour. Training accuracy is a model's accuracy on its training data. Validation accuracy, on the other hand, refers to a model's accuracy on a validation dataset: a distinct dataset that is not used to train the model but is instead used to assess the model's effectiveness while it is being trained. A validation dataset is used to detect overfitting and to determine the ideal hyperparameters for the model, such as the learning rate or the number of hidden layers. A strong model should generally have both strong training accuracy and strong validation accuracy. If the training accuracy is high but the validation accuracy is poor, it is a sign that the model is overfitting the training data and may not generalize effectively to new, unseen data. If both the training accuracy and the validation accuracy are low, the model may be too simplistic and need more capacity, or the data may be noisy or insufficient.

Several distinct scanned handwritten images in various styles were used to test the handwritten character recognition system, and the outcomes were very positive. The proposed system pre-processes the image to remove noise. The bitmap image representation is used to extract features, and the result is a classification accuracy of about 91%. The proposed system is advantageous because it trains the neural network with fewer features, which leads to faster convergence (reduced training time). Less computation is required for feature extraction, training, and testing, which is another benefit.
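The reported set sizes (331,059 / 41,382 / 41,382 out of roughly 413,800 samples) correspond to an approximately 80/10/10 split. A generic sketch of such a split follows; the shuffling seed and the fractions are illustrative assumptions, not the paper's exact procedure:

```python
# Sketch of an 80/10/10 train/test/validation split. Shuffling before
# splitting avoids ordering bias in the source dataset.
import random

def split_dataset(samples, train_frac=0.8, test_frac=0.1, seed=0):
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, val

data = list(range(1000))
train, test, val = split_dataset(data)
# len(train), len(test), len(val) -> 800, 100, 100
```

Keeping the three sets disjoint is what makes the validation accuracy discussed above a meaningful check on generalization.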

Results and discussions
Optimization techniques, usually referred to as optimizers, are employed in AHTR to efficiently train the neural network models. These optimizers change the model's parameters during training in order to reduce the loss function and enhance the performance of the model. Here are a few optimizers frequently used in AHTR. Stochastic Gradient Descent (SGD) is a basic and popular optimization approach. It modifies the model's parameters based on the gradient of the loss function calculated on a small batch of training instances. SGD adjusts parameters incrementally, working its way towards the optimal outcome.
Adam (Adaptive Moment Estimation) is a well-known optimizer that combines the ideas of momentum and adaptive learning rates. It modifies the learning rate for each parameter based on the first and second moments of the gradients. Due to its capacity to manage sparse gradients and varying learning rates, Adam does well in many deep learning problems, including AHTR. Root Mean Square Propagation (RMSprop) changes the learning rate for each parameter based on an exponentially weighted average of the squared gradients. It helps minimize the influence of large gradient values and accelerates convergence. RMSprop is known for its reliability and effectiveness in neural network training. Adamax is an extension of Adam that uses the infinity norm (the gradients' maximum absolute value) in the update rule. It delivers a steadier update when the gradients are sparse and performs better in models with huge parameter spaces.
An AHTR system can be evaluated using a variety of measures, including accuracy, precision, recall, F1 score, and word error rate (WER). Accuracy is the proportion of correctly recognized characters out of the total number of characters, whereas precision is the proportion of correctly recognized characters out of all characters the system reported. Recall is the proportion of the actual characters that were correctly recognized, and the F1 score is the harmonic mean of precision and recall. For quicker training, the acquired data is preprocessed by scaling the pixel values between 0 and 1 and resizing the images to a standard size of 32x32 pixels. Following that, the data is divided into training and validation sets. The model architecture consists of three convolutional layers, each followed by a max-pooling layer and a dropout layer to lessen overfitting. These are followed by a flatten layer, two dense layers, and a final dense layer, which produces probabilities for each of the 35 classes. AHTR has significant importance in various fields such as historical preservation, data entry, accessibility, education, and fraud detection. Overall, the significance of the proposed AHTR lies in its ability to make handwritten documents and manuscripts more accessible and easier to use. This can help preserve historical records, streamline data entry, improve accessibility, and aid in fraud detection.

Conclusion and future enhancements
By using methods like pre-processing and successive model training, we have offered an adaptive strategy for identifying offline paragraphs in the system we have constructed. As a first pre-processing step, the input paragraph images are separated into line and word images using OpenCV contour algorithms before being sent into the NN model layers for recognition. Thus, our approach extracts a paragraph of handwritten English characters from an input image, and then a CNN model trained on the Handwritten_Character dataset predicts the text. Therefore, we may claim that by utilizing deep learning techniques, we have developed a novel approach to the problem. Our work plays an important role in the healthcare industry, enabling the digitization of patients' health records. COVID-19 has made online classes possible, but exams are still written offline on paper and scanned for correction. Digitizing these scripts will help with easy correction and verification.
With this approach, we achieved an accuracy of more than 90%. This method delivers a recognition result that is both efficient and effective. With minimal noise in the input image, the approach provides an accurate reading of the text. The accuracy is determined largely by the dataset: we can achieve higher accuracy if we enhance the data, and the best outcomes also come from avoiding cursive writing. Using this model, we successfully reached an accuracy of roughly 91%.

Fig. 2. Architecture diagram.
• Word/Character Segmentation: After the text is divided into lines, the words or characters that make up each line are then segmented. This step separates the text into discrete components (words or characters) to prepare it for recognition. Methods such as connected component analysis, contour analysis, or deep learning-based approaches can be used for this.
• Classification: After the features have been extracted, the segmented words or characters are recognized using a classification model. This model links the input features to the matching labels that represent the ground truth in a collection of labelled examples. Support Vector Machines (SVM), k-Nearest Neighbours (k-NN), recurrent neural networks (RNN), and connectionist temporal classification (CTC) models are common classification algorithms used in AHTR.
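The connected component analysis used for segmentation can be sketched as a flood-fill labelling over a binary image. This is a generic, plain-Python illustration of the technique, not the paper's implementation:

```python
# Connected-component labelling on a binary image (1 = ink, 0 = background),
# using 4-connectivity and an explicit stack instead of recursion.

def label_components(bitmap):
    h, w = len(bitmap), len(bitmap[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if bitmap[y][x] == 1 and labels[y][x] == 0:
                current += 1                     # start a new component
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if (0 <= cy < h and 0 <= cx < w
                            and bitmap[cy][cx] == 1 and labels[cy][cx] == 0):
                        labels[cy][cx] = current
                        stack.extend([(cy + 1, cx), (cy - 1, cx),
                                      (cy, cx + 1), (cy, cx - 1)])
    return labels, current

# Two separate ink blobs (think: two characters) in one line image.
img = [[1, 1, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [0, 0, 0, 0, 1]]

labels, count = label_components(img)  # count == 2
```

Each labelled component can then be cropped to its bounding box and passed to the classifier as one candidate character.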

Figure 3
Figure 3 represents the module-connectivity diagram and the connections between the modules. The modules are: (a) data collection, preprocessing, and feature extraction; (b) training the model; (c) testing the model; and (d) output.

A handwritten character dataset is a collection of pictures or samples of handwritten characters that are used for training and testing machine learning models for optical character recognition (OCR) tasks. These datasets typically consist of a large number of images of individual handwritten characters or groups of characters, annotated with the correct labels or transcriptions. Handwritten character datasets may include characters from different languages and scripts, such as Latin, Chinese, Arabic, or Devanagari, depending on the target application. The samples may be written in different styles, including printed or cursive handwriting, and may vary in size, orientation, and quality. Examples of popular handwritten character datasets include the MNIST (Modified National Institute of Standards and Technology) dataset, which contains pictures of handwritten digits, and the EMNIST (Extended MNIST) dataset, which includes both digits and letters in uppercase and lowercase. Other examples include the IAM Handwriting Database, the CEDAR dataset, and the RIMES dataset. These datasets are commonly used in the research and development of OCR systems, as they provide a large number of labelled samples of varying complexity and allow the performance of OCR models to be evaluated on real-world data.

Fig. 4. Training vs validation accuracy. The performance of a model during training is evaluated using training accuracy and validation accuracy. Training accuracy is the accuracy of a model on its training data: the model is trained on a labelled dataset during the training phase, and its performance is assessed on that same dataset. Training accuracy can be a useful measure of how well a model fits the data, but it can also be deceptive if the model is overfitting the training data.
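As a small illustration of reading such curves, the gap between training and validation accuracy is a rough overfitting signal. The gap threshold of 0.10 and the per-epoch numbers below are arbitrary illustrative choices, not values from the paper's training run:

```python
# Flag likely overfitting when training accuracy runs well ahead of
# validation accuracy (threshold chosen purely for illustration).

def looks_overfit(train_acc, val_acc, gap=0.10):
    return (train_acc - val_acc) > gap

# Hypothetical (train_acc, val_acc) pairs across four epochs.
history = [(0.70, 0.68), (0.85, 0.80), (0.95, 0.81), (0.99, 0.80)]

flags = [looks_overfit(t, v) for t, v in history]
# flags == [False, False, True, True]: the gap widens in later epochs.
```

In practice this is the moment to stop training early, add regularization (e.g. the dropout layers mentioned earlier), or gather more data.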