Preliminary results of red lesion segmentation in WCE images

Abstract. Wireless capsule endoscopy (WCE) is a novel imaging technique that can view the entire small bowel in the human body. Thus, it is presented as an excellent diagnostic tool for the evaluation of gastrointestinal diseases compared with traditional endoscopies. However, diagnosis by physicians is tedious, since it requires reviewing the video extracted from the capsule and analysing all of its frames. This tedious task has encouraged researchers to provide automated diagnostic techniques for WCE frameworks to detect symptoms of gastrointestinal illness. In this paper, we present the preliminary results of red lesion detection in WCE images using the Dense-UNet deep learning segmentation model. To this end, we have used a dataset containing two subsets of anonymized video capsule endoscopy images with annotated red lesions. The first set, used in this work, has 3,295 non-sequential frames and their corresponding annotated masks. The results obtained by the proposed scheme are promising.


Introduction
Diseases of the digestive tract, such as cancers of the esophagus, stomach, small intestine, colon, and other digestive organs, pose a serious threat to human health. Many types of endoscopy are employed to examine the patient's gastrointestinal tract. For example, gastroscopy, pressure enteroscopy and colonoscopy are used to examine the human digestive system. However, most of the above endoscopy tests are limited in their ability to examine the human small intestine. To overcome these limitations, the WCE examination, developed by Given Imaging in 2000, was presented as an excellent diagnostic tool for the evaluation of gastrointestinal bleeding, ulcers, Crohn's disease and other digestive abnormalities. Capsule endoscopy is mainly used to find the cause of unexplained bleeding in the digestive tract, or in cases of inflammation, red lesions or tumors in the small intestine. Nevertheless, the review and analysis of these videos by physicians can take several hours, which is tiring, so they may miss parts where abnormalities of the gastrointestinal tract are present, since these abnormalities often appear in only a few frames of the video sequence. Furthermore, the size and distribution of the anomalies make their identification with the naked eye very difficult. Thus, computer-assisted diagnosis and offline post-processing have been developed to facilitate the reading of WCE video sequences, as well as to alleviate the heavy burden on doctors.

Many approaches have been proposed for the automatic detection of GI tract diseases using handcrafted features in WCE images [1]. In [2,3], a multi-scale analysis based on the grey-level co-occurrence matrix (GLCM) is conducted for ulcer detection, and in the second work a local feature extractor is proposed for abnormality detection in WCE images. In this work, we mainly focus on approaches based on deep learning models.
In [4], Zou et al. used a deep convolutional neural network (DCNN) framework to classify digestive organs in WCE images. The authors in [5] proposed a novel WCE classification system based on a hybrid CNN with an extreme learning machine (ELM); instead of the conventional fully connected classifier of a DCNN classification system, a cascaded ELM was used as a strong classifier. Zhu et al. [6] introduced a DCNN model to extract generic features of WCE images and applied an SVM classifier to detect lesions. In [7,8], systems for small intestine motility characterization and bleeding detection, based on deep CNNs, were introduced. In [9], the authors proposed a novel deep feature learning method, named stacked sparse autoencoder with image manifold constraint (SSAEIM). In [10], the authors proposed to split the image into several patches and extract color features pertaining to each block using a CNN. A novel offline and online 3D deep learning integration framework leveraging the 3D fully convolutional network (3D-FCN) was presented in [11]. A study in [12] proposed a novel CNN-based method to automatically recognize polyps in small bowel WCE images. They utilized the AlexNet architecture, a classical CNN, to extract the features of WCE images and classify polyp images from normal ones. He et al. [13] proposed a novel deep hookworm detection framework (DHDF) for WCE images, which simultaneously models the visual appearances and tubular patterns of hookworms. In [14], the authors proposed a method for WCE video summarization based on a hybrid unsupervised method using long short-term memory (LSTM), variational autoencoder (VAE), pointer network (Ptr-Net), generative adversarial network (GAN), and de-redundancy mechanism (DM) techniques. In [15], the authors used a deep CNN and metric learning with a triplet loss function for polyp detection. Mohebbian et al.
[16] proposed an approach based on learning single-class models, each using samples from only one class, and ensembling all models for multiclass classification. In the first step, deep features are extracted from the preprocessed images using an autoencoder architecture. Then, these features are oversampled using the Synthetic Minority Over-sampling Technique and clustered using Ordering Points to Identify the Clustering Structure. To create a one-class classification model, Support Vector Data Descriptions are trained on each cluster with the help of Ant Colony Optimization, which is also used to tune the clustering parameters to improve the F1-score. This process is applied to each class, and the ensemble of final models is used for multiclass classification. Training a CNN model from scratch for WCE image classification was introduced by the researchers in [17]. Their method aims at multi-disease detection, i.e., normal mucosa, bile predominant, air bubbles, debris, inflamed mucosa, atypical vascularity, and bleeding. Coelho et al. [18] presented a method for red lesion detection using the U-Net architecture. An approach for intestinal hemorrhage detection was presented in [19]; the scheme is based on a CNN architecture applied after reducing the color palette using minimum variance quantization. A framework for best feature selection was introduced in [20], where the authors presented a fully automated system for stomach infection recognition based on deep learning feature fusion and selection. In this design, ulcer images are manually assigned and passed to a saliency-based method for ulcer detection. Later, a pre-trained deep learning model, VGG16, is employed and re-trained using transfer learning. Features of the re-trained model are extracted from two consecutive fully connected layers and fused by an array-based approach. Then, the best individuals are selected through the metaheuristic particle swarm optimization (PSO) approach along with a mean-value-based fitness function.
The selected individuals are finally recognized through a cubic SVM. Ellahyani et al. [21] proposed a method using an ELM; they extracted HOG features from images in the HSV colour space. In [20], a novel semi-supervised learning method with an Adaptive Aggregated Attention (AAA) module for automatic WCE image classification is proposed. Firstly, a novel deformation-field-based image preprocessing strategy is proposed to remove the black background and circular boundaries in WCE images. Then, a synergic network is proposed to learn discriminative image features, consisting of two branches. The first branch consists of an estimator of abnormal regions and a distiller of abnormal information.
It employs the proposed AAA module to extract global dependencies and incorporate context information to highlight the most meaningful regions, while the second branch focuses on these calculated attention regions for accurate and robust abnormality classification. Finally, the two branches are jointly optimized by minimizing the proposed discriminative angular (DA) loss and the Jensen-Shannon divergence (JS) loss with both labeled and unlabeled data. Hajabdollahi et al. [22] proposed a bifurcated structure with one branch performing classification and the other performing segmentation. Initially, separate network structures are trained for each abnormality, and then the primary parts of these networks are merged. A search in PubMed for all original publications on the subject of deep learning applications in WCE was conducted in [23]. A survey on deep learning models for WCE image analysis was carried out in [24]. In [25], a parametric rectified nonlinear unit activation function was presented. The rest of the paper is organized as follows. Section 2 details the proposed method for red lesion detection in WCE images. Experimental results are illustrated in Section 3. Finally, the paper is concluded in Section 4.

Proposed method
In this section, the method used for red lesion detection is presented. The model used in this study is the Dense-UNet architecture, proposed in [26] for image segmentation. Dense-UNet, which is based on the U-Net structure, employs dense concatenation to deepen the network architecture and achieve feature reuse. The model includes four expansion modules (each module consisting of four down-sampling layers) to extract features. Figure 1 depicts the model architecture. It can be seen as two symmetrical paths, i.e., an upsampling path and a downsampling path, which are concatenated at the end. Dense-UNet contains 10 dense_blocks: 5 dense_blocks in the dense upsampling path and 5 dense_blocks in the dense downsampling path. Each dense_block is composed of 4 densely connected layers having the same feature size [26].
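The channel arithmetic behind such dense concatenation can be sketched in a few lines: every layer in a dense_block receives the concatenation of all preceding feature maps, so the channel count grows linearly with a fixed growth rate. The growth rate of 32 and the input width of 64 below are illustrative assumptions, not values reported for the model in [26].

```python
# Sketch of the channel growth inside one dense_block with 4 densely
# connected layers. Each layer emits `growth_rate` new feature maps,
# which are concatenated onto everything produced so far.
# growth_rate=32 is an assumed, illustrative value.

def dense_block_channels(in_channels, num_layers=4, growth_rate=32):
    """Return the channel count seen at the input of each successive layer."""
    channels = [in_channels]
    for _ in range(num_layers):
        channels.append(channels[-1] + growth_rate)
    return channels

# Example: a block fed 64 channels, with 4 densely connected layers.
print(dense_block_channels(64))  # → [64, 96, 128, 160, 192]
```

This linear growth is why dense connectivity achieves feature reuse without an exponential blow-up in width: earlier feature maps are passed along by concatenation rather than recomputed.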

Experimental Results
In this section, the results obtained by the proposed approach are presented. Accuracy and loss are employed to assess the choices made for the Dense-UNet model. First, the dataset exploited in this work is introduced. Then, the results of red lesion detection using Dense-UNet are given. Finally, a comparison between Dense-UNet and U-Net is carried out.

Dataset
The dataset used in this work was first presented in [18]. It contains 3,295 images with red lesions and 3,295 annotated masks, each corresponding to an abnormal image. The size of the images is 320×320 pixels. The dataset was split into training and testing sets: 2,636 images were used for training and the remaining ones for testing. Fig. 2 shows some of the images of the dataset.
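The 2,636/659 partition above corresponds to an 80/20 split of the 3,295 frames. A minimal sketch of such an index-based split (keeping each image paired with its mask by splitting indices rather than files) might look as follows; the seed and the helper name are our own, not part of the original pipeline.

```python
import random

# Minimal sketch of an 80/20 train/test partition of the 3,295 frames.
# Splitting indices (not files) keeps each image aligned with its mask.

def split_dataset(num_samples, train_fraction=0.8, seed=0):
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(num_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_dataset(3295)
print(len(train_idx), len(test_idx))  # → 2636 659
```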

Implementation Details
The Dense-UNet network was trained from scratch on the red lesion dataset, which was split randomly into 80% for training and 20% for validation, to detect and segment red lesions. The training used binary cross-entropy as the cost function, in one cycle of 20 epochs with the Adam optimizer and a learning rate of 1e-4. The model was evaluated by comparing its predictions with the annotated masks, used as ground truth, based on the accuracy metric. The network was implemented in Python 3.6, and all experiments were performed on a machine provided by Google Colab with a GPU and 12 GB of RAM. Dense-UNet was implemented using Keras with TensorFlow as the backend.
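For reference, the binary cross-entropy cost used during training can be written out in pure Python as the per-pixel loss between a predicted lesion probability and the binary ground-truth mask, averaged over pixels. The actual training used the Keras implementation; this is only an illustration of the formula.

```python
import math

# Pure-Python sketch of binary cross-entropy over flattened mask pixels:
#   BCE = -(1/N) * sum_i [ t_i*log(p_i) + (1 - t_i)*log(1 - p_i) ]

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Confident, mostly correct predictions yield a small loss:
print(round(binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]), 4))  # → 0.1643
```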

Results
Before presenting the results obtained by the proposed method, the definition of the measure employed to assess the performance is given below.

Accuracy = (Number of correct predictions) / (Total number of samples)  (1)

As mentioned before, the proposed approach to detect red lesions in WCE images is carried out using Dense-UNet. This model achieved an accuracy of 96.10% and a loss of 0.1761 on the training set. On the validation set, the model reached a loss of 0.3118 and an accuracy of 92.79%. Figure 3 shows some test images with the segmented red lesion regions.
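Applied to segmentation, Eq. (1) is evaluated pixel-wise: a prediction is "correct" when the predicted binary label of a pixel matches the annotated mask. A minimal sketch (masks represented as nested lists of 0/1 for illustration):

```python
# Pixel-wise accuracy per Eq. (1): fraction of pixels whose predicted
# label matches the ground-truth mask.

def pixel_accuracy(pred_mask, true_mask):
    correct = sum(p == t
                  for row_p, row_t in zip(pred_mask, true_mask)
                  for p, t in zip(row_p, row_t))
    total = sum(len(row) for row in true_mask)
    return correct / total

pred  = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [0, 0, 0]]
print(pixel_accuracy(pred, truth))  # 5 of 6 pixels match → 0.8333...
```

Note that with small lesions most pixels are background, so pixel accuracy can be high even for coarse masks; this is one reason the visual results in Figure 3 complement the numeric score.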

Comparison with state-of-the-art methods
In order to assess the obtained results, we compared the proposed approach with a state-of-the-art method. For this purpose, we chose the method presented in [18], since its authors used the same dataset in their work. That method used the U-Net architecture for red lesion detection. Table 1 shows the results obtained for both approaches. As can be seen in the table, the proposed method outperformed the method in [18] by 0.22%.

Conclusion and perspectives
In this paper, we presented the preliminary results of red lesion detection in WCE images using the Dense-UNet architecture. We compared the obtained results with a literature approach on the same dataset, and the performance is satisfactory. Hence, we can conclude that the proposed scheme shows promising results in terms of accuracy and loss, achieving 0.1761 and 96.10% in terms of loss and accuracy, respectively. In future work, we plan to improve the robustness of the method by applying transfer learning. Moreover, fine-tuning various parameters should lead to further improvements.