Writer identification using VLAD Encoding of the Histogram of Gradient Angle Distribution

The use of computers and automatic systems has enabled researchers to improve classification rates in the field of writer identification. In this paper, we propose an identification system based on the Histogram of Gradient Angle Distribution (HGAD), computed in square patches centered on Harris keypoint locations. A global descriptor per image is then calculated via VLAD encoding of the local descriptors, i.e. the histograms of the square patches. Experiments on two public datasets, CVL and BFL, yield identification rates of 99.4% on BFL and 99.7% on CVL.


Introduction
Writer identification is an important area of research. Writer recognition seeks to identify the author of a questioned document from several parameters that characterize his or her handwriting style. Existing work covers several languages, such as Arabic [4], English [25], German [10], French [6], Portuguese [5] and Chinese [28]. A wide variety of systems have been proposed for writer recognition, which can be divided into two groups: structure-based and texture-based systems. The first category comprises systems based on structural characteristics such as the curvature, spacing and inclination of handwritten text, while the second category comprises systems based on local or global textural features extracted from handwritten documents [17].
Among the proposed systems based on the structural approach is the study conducted by [26]. The author employed features such as slant, line separation, and character shapes, which represent a subset of the verification tools used by forensic document examiners.
A textural approach based on LBP, LPQ and LTP was presented by Hannad et al. in 2016 [16]. These local characteristics are calculated from small fragments of a handwritten document. The same author published another study [15] combining histograms of oriented gradients (HOG) with LBP and LPQ. The proposed technique improved the identification rate on the IFN/ENIT dataset to 96.9%.
In addition to the systems mentioned above, several methods have been proposed based on other local descriptors, such as run-length [12], edge-hinge [12], edge-direction [12], contour-hinge [8], contour-direction [8], CLBP [1] and VLBP [1], among others. The introduction of neural networks has made it possible to achieve good performance [24,25,13,27,10,18]. Methods based on deep learning generally follow one of two classification approaches. The first is an end-to-end approach [18], in which classification is performed by the last layer of the CNN. In the second approach, local descriptors extracted from a given layer are either fed directly into a classifier [13] or encoded using VLAD, Triangulation Embedding or Fisher Vector encoding [25,10] to obtain global descriptors.
In our study, we propose an identification system based on a new local feature called the Histogram of Gradient Angle Distribution (HGAD), calculated on small square fragments. To obtain more discriminative features, we chose fragments centered on Harris keypoint locations. Once the histograms (HGAD) have been built, we encode the resulting features using the VLAD encoding method. For the classification step, we chose KNN. We investigate several patch sizes and several angle intervals.
The highlights of our study are: (1) proposing an identification system based on the Histogram of Gradient Angle Distribution in patches centered on Harris keypoints; (2) checking the impact of the fragment size on the performance of the proposed system; (3) conducting tests on two public datasets, CVL and BFL.

Methodology
As shown in Figure 1, our approach uses textural features calculated from fragments of the image. To extract these fragments, we first detect the locations of Harris keypoints. Then, we calculate the gradient angle for each pixel of the handwritten image. Next, we extract the Histogram of Gradient Angle Distribution (HGAD) from the square fragments around the Harris locations. Global descriptors are calculated by encoding the histograms (HGAD) using VLAD encoding. To classify these global descriptors, we use the KNN method.
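The keypoint detection step can be illustrated with a minimal NumPy sketch of the Harris corner response. This is not the implementation used in our experiments: the window radius `r`, the constant `kappa = 0.04`, the box-shaped window and the function names are assumptions made for illustration only.

```python
import numpy as np

def box_filter(a, r):
    """Sum of `a` over a (2r+1)x(2r+1) window, zero-padded at the borders."""
    p = np.pad(a, r, mode="constant")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, r=1, kappa=0.04):
    """Harris response R = det(M) - kappa * trace(M)^2 for every pixel."""
    gy, gx = np.gradient(img.astype(float))  # central differences
    sxx = box_filter(gx * gx, r)             # structure tensor entries
    syy = box_filter(gy * gy, r)
    sxy = box_filter(gx * gy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - kappa * trace ** 2

def harris_keypoints(img, n_points=100, r=1, kappa=0.04):
    """Return up to n_points (row, col) locations with the largest positive response."""
    resp = harris_response(img, r, kappa)
    flat = np.argsort(resp, axis=None)[::-1][:n_points]
    ys, xs = np.unravel_index(flat, resp.shape)
    keep = resp[ys, xs] > 0                  # corners have a positive response
    return np.stack([ys[keep], xs[keep]], axis=1)
```

The response is positive where the gradient varies in two directions (corners), negative along straight edges, and zero in flat regions. A practical detector would usually add Gaussian weighting and non-maximum suppression, which are omitted here for brevity.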

Angle Distribution Histogram
The Histogram of Oriented Gradients (HOG) [11] is a feature descriptor that has been widely used in computer vision for object detection and classification.
The HOG descriptor uses the gradient intensity distribution to describe the local shape of a given object. To calculate the corresponding histogram, the image is divided into small square cells (4x4 to 12x12 pixels) and the gradient orientation is calculated at each pixel in the region. Dalal and Triggs found that dividing the angles into 9 classes gives very good results. The next step is to group several cells into larger blocks. Each cell can participate in two or more blocks. The blocks can be rectangular (R-HOG) or circular (C-HOG). The tests carried out by the authors showed that the best performance was obtained with cells of 6x6 pixels and rectangular blocks containing 9 cells.
Inspired by HOG, our descriptor is based on the angles formed by the gradients around a pixel. Unlike HOG, however, our descriptor uses between 720 and 1400 angle classes, is calculated on larger fragments (20x20), and does not use the notions of cells and blocks. To take advantage of the strengths of keypoints, we rely on the Harris corner detector to extract the fragments on which the Histogram of Gradient Angle Distribution (HGAD) is calculated. As shown by the tests carried out on the two public datasets BFL and CVL, introducing VLAD encoding yields better results.
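The descriptor described above can be sketched as follows. The text does not fix every detail, so several choices here are assumptions made for illustration: angles are taken over [0, 360) degrees, each pixel contributes one unweighted vote, pixels with zero gradient are ignored, patches that cross the image border are skipped, and each histogram is normalized to sum to 1. The function names are hypothetical.

```python
import numpy as np

def gradient_angles(img):
    """Per-pixel gradient angle in degrees within [0, 360] and gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    angles = np.degrees(np.arctan2(gy, gx)) % 360.0
    return angles, np.hypot(gx, gy)

def hgad(img, keypoints, patch=20, n_bins=720):
    """One angle histogram per square patch centered on each (row, col) keypoint."""
    angles, mag = gradient_angles(img)
    half = patch // 2
    h, w = img.shape
    descs = []
    for y, x in keypoints:
        y0, x0 = y - half, x - half
        y1, x1 = y0 + patch, x0 + patch
        if y0 < 0 or x0 < 0 or y1 > h or x1 > w:
            continue  # assumption: skip patches that cross the image border
        a = angles[y0:y1, x0:x1]
        m = mag[y0:y1, x0:x1]
        a = a[m > 0]  # assumption: the angle is undefined where the gradient is zero
        hist, _ = np.histogram(a, bins=n_bins, range=(0.0, 360.0))
        total = hist.sum()
        descs.append(hist / total if total else hist.astype(float))
    return np.array(descs, dtype=float).reshape(len(descs), n_bins)
```

Under these assumptions, 720 classes correspond to bins of 0.5 degrees, compared with the 9 classes used in HOG.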

VLAD Encoding
Considered a non-probabilistic version of the Fisher Kernel, VLAD encoding [20] has been widely used in the field of writer identification. Normalization techniques such as power normalization and intra-normalization [3] have further improved the results achieved by VLAD.
Consider X = {x1, ..., xN} as the local descriptors of a given image. VLAD encoding requires three main steps. In the first step, a dictionary D = {c1, ..., ck} of k clusters is generated using the k-means method. In the second step, each local descriptor xj is assigned to its nearest center ci. Then, for each cluster, we compute the corresponding vector vi as the sum of the differences between its center ci and the local descriptors xj assigned to it.
In the last step, we concatenate all the vi into a global descriptor, then apply power normalization and intra-normalization.
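The three steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the code used in our experiments: the plain k-means loop, the power-normalization exponent `alpha = 0.5`, the residual sign convention (xj - ci) and the function names are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns the k cluster centers (the dictionary D)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    return centers

def vlad(X, centers, alpha=0.5):
    """Encode the local descriptors X (N x d) of one image into a k*d vector."""
    k, d = centers.shape
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)              # nearest center per descriptor
    V = np.zeros((k, d))
    for i in range(k):
        members = X[labels == i]
        if len(members):
            V[i] = (members - centers[i]).sum(axis=0)   # residual sum v_i
    V = np.sign(V) * np.abs(V) ** alpha       # power normalization
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    # intra-normalization: L2-normalize each v_i separately
    V = np.divide(V, norms, out=np.zeros_like(V), where=norms > 0)
    return V.ravel()                          # concatenation of all v_i
```

Because power normalization and intra-normalization both act within each vi, applying them before or after concatenation gives the same vector. With k = 64 clusters and 720-bin histograms, the global descriptor would have 64 x 720 = 46080 dimensions.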

Experiments & Results
In this section we present and discuss the results of the tests carried out with the proposed methodology. We first describe the datasets used. Then, we present the results obtained with several fragment sizes, several numbers of angle classes, and several sizes of clusters and mini-batches. Finally, we compare the performance with the state of the art.

Datasets
CVL [22]: a public dataset produced by 310 writers. 27 of them produced 7 different handwritten texts (6 in English and 1 in German) and the remaining 283 writers wrote only 5 documents. In our study we use a cropped version which contains only the handwritten section. For each writer class, we chose one handwritten image as the test page and the rest as training images.
BFL [14]: the Brazilian BFL dataset contains 945 handwritten images from 315 writers. Each writer produced 3 pages. The main purpose of the dataset is to provide a platform for writer identification and verification tasks for forensics and the Brazilian Federal Police. In our study we chose one page for the test phase and the other two pages for the training phase.

Impact of Fragment Size
In our study we use the Top-K identification rate, i.e. the proportion of test documents whose true writer is among the K most likely writers returned by our system. As Table 1 shows, our system is sensitive to fragment size at the extremes: very small and very large fragments give modest identification results, whereas medium-sized fragments give promising results.
The second remark concerns the wide range of sizes that yield good scores. According to Table 1, all sizes between 9x9 and 60x60 give top-1 identification rates greater than 98% and top-3 rates exceeding 99.4% on BFL. Within this very wide interval, the system is therefore not very sensitive to fragment size and the results remain almost stable.
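The Top-K identification rate used in this section can be sketched as a nearest-neighbour ranking over the global descriptors. The Euclidean distance, the ranking of writers by their closest training page, and the function names are assumptions made for illustration; the text only states that KNN is used.

```python
import numpy as np

def rank_writers(query, train_desc, train_writers):
    """Writers ordered by the distance of their closest training descriptor."""
    dist = np.linalg.norm(train_desc - query, axis=1)  # assumption: Euclidean
    ranked = []
    for idx in np.argsort(dist, kind="stable"):
        writer = train_writers[idx]
        if writer not in ranked:
            ranked.append(writer)
    return ranked

def top_k_rate(test_desc, test_writers, train_desc, train_writers, k=1):
    """Fraction of test pages whose true writer is among the k best-ranked writers."""
    hits = 0
    for query, true_writer in zip(test_desc, test_writers):
        if true_writer in rank_writers(query, train_desc, train_writers)[:k]:
            hits += 1
    return hits / len(test_writers)
```

With one test page per writer, as in our protocol, a top-1 rate of 99.4% on the 315 BFL writers would correspond to 313 correctly identified test pages.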
The same remark applies to the CVL dataset, where the best result is obtained for a fragment size of 20x20 with a top-1 identification rate of 99.7%. Unlike BFL, however, the stability interval on CVL is smaller: very good results are only achieved for fragment sizes between 13x13 and 35x35, with top-1 and top-3 identification rates exceeding 98% and 98.7% respectively.

Impact of the Number of Angle Classes
As mentioned before, the HOG descriptor [11] is based on dividing the image into small square cells of 4x4 to 12x12 pixels and dividing the gradient angles into several classes. Dalal and Triggs found that the best number of classes is 9. In our approach, according to Figure 3, the best results are recorded for very large numbers of classes (≥ 360). For small numbers of classes (≤ 120), the identification rates do not exceed 36% on either dataset, CVL or BFL.
This implies that increasing the number of classes provides more discriminative information, which makes it possible to differentiate between the textual imprints of individual writers.
For CVL, the best performance is obtained with 720 classes, which gives an identification rate of 99.7%. Likewise for BFL, the best performance, 99.4%, is achieved with 720 gradient angle classes.

Comparison with State of the Art
To compare the performance of our system with existing systems, Tables 2 and 3 present some state-of-the-art works. Table 2 shows that our system outperforms the other works on BFL. Likewise, Table 3 compares our system with other systems on CVL; in this table, only the system of [26] gives a result comparable to ours.

Conclusion
In this article, we have proposed a new approach to writer identification based on VLAD encoding of gradient angle distribution histograms. The results obtained on the two datasets BFL and CVL show that our system is competitive with the state of the art.
In our approach, we used square fragments centered on Harris keypoints. Other approaches could complement this study by using circular or rectangular fragments. Likewise, other types of keypoints, such as SIFT, SURF or FAST, could be used in place of Harris keypoints.

Fig. 3. Variation of the identification rate according to the number of angle classes using a number of clusters k = 64 and a fragment size of 20x20 for CVL and 19x19 for BFL.

Table 1. Classification results using VLAD encoding with k = 64 clusters and several fragment sizes.

Table 2. Comparison with the state of the art on the BFL dataset.

Table 3. Comparison with the state of the art on the CVL dataset.