Combination of Convolutional Neural Network and AdaBoost for Breast Cancer Diagnosis

Breast cancer is a cancer that develops from breast tissue. Early symptoms include a lump in the breast, a change in breast shape, or dimpling of the skin. This research explores the potential of ensemble learning, with a focus on the AdaBoost algorithm, to enhance the performance of Convolutional Neural Networks (CNN) in image classification tasks, particularly on breast cancer image datasets. The architectures in focus were VGG-16, ResNet50, and Inception V4, three prevalent CNN models with proven efficiency in image recognition tasks. Coupling these CNN models with AdaBoost led to notable performance improvements in individual tests. The study went further by constructing an ensemble model that combined all three CNN models. This ensemble, with AdaBoost, demonstrated impressive performance across various datasets, with precision and recall scores exceeding 0.94, an F1-Score of 0.96, and an overall accuracy of 0.95 to 0.99. The significant performance boost can be attributed to the richer feature space generated by the ensemble of multiple CNN models and the iterative refinement of predictions provided by the AdaBoost algorithm. Despite the ensemble model's complexity and increased computational demand, the results provide a compelling justification for its use. Further research could delve into optimizing such ensemble models, exploring other ensemble strategies, or testing the models on diverse datasets and tasks beyond image classification.


Introduction
Breast cancer is a diverse illness that consists of several entities with unique histological and biological characteristics, clinical presentations and behaviors, and therapeutic responses [1]. Breast cancer accounted for 29% of all new cancer cases and 14% of all cancer-related deaths among women worldwide from 2000 to 2012. In diagnosing breast cancer, mammography is the method of screening the human breast with low-energy X-rays. Due to recent technological advancements, instead of using X-ray film to evaluate breast tissue for breast cancer, digital mammography employs digital receptors and computers, allowing radiologists to adjust the data and obtain a better view of it [2]. However, the workload of radiologists grew significantly due to the collection of larger datasets and the difficulty of image interpretation. Therefore, deep learning technologies such as Convolutional Neural Networks (CNN) are being developed to aid radiologists and enhance the accuracy of screening mammography.
Numerous studies have been conducted using CNN architectures across many different fields, and several showcase the efficacy of ensemble learning and deep learning in diverse applications. Mohamed et al. [3] developed a deep learning-based system that combines the UNet network with a proposed deep learning model to automate and increase the accuracy of thermography systems. Meanwhile, the utilization of BreastNet for breast cancer diagnosis was proposed by Zuluaga-Gomez et al. [4]. CNN architectures have demonstrated the capability to achieve a high accuracy score of 99.10% [5]. However, CNNs can be sensitive to noise or irrelevant features in the training data. If the noise or irrelevant features are not properly accounted for or removed, the model can learn to incorporate them into its learned representations, leading to higher variance; this is commonly referred to as neural networks having large variance.
Variance in CNN models often results in overfitting. Overfitting occurs when a model memorizes the noise in the training data instead of learning the underlying patterns, leading to poor generalization to new data. Ensemble learning is a technique that reduces variance by combining predictions from multiple models trained on the same dataset [6]. By integrating the predictions, ensemble models can decrease variance and produce superior predictions compared to individual models. In classification tasks, ensemble models generate predictions using multiple models, and the final result is determined by combining these predictions. Ensemble learning compensates for the weaknesses of individual models by leveraging the strengths of multiple models, leading to more accurate and reliable classifiers. In this research, the ensemble learning algorithm AdaBoost is combined with three CNN models, namely VGG-16, ResNet50, and Inception V4, to reduce instances of overfitting. The CNN models are trained on the breast cancer datasets MIAS, CBIS-DDSM, and INbreast and then initialized with AdaBoost to investigate the effect and behavior of AdaBoost regarding its capability to reduce overfitting in CNN models.

Literature Review
There are a variety of applications of ensemble learning and deep learning methodologies across diverse fields. Das [7] utilized an Adaptive Activation Function-based U-Net for automated COVID-19 X-ray diagnosis, replacing the fully connected layer of the CNN with SVM, Autoencoder, and Naive Bayes classifiers, achieving better performance than other algorithms. Yazdizadeh et al. [8] developed ensemble learning with CNNs to classify transportation mode data, with a random forest meta-learner achieving the highest accuracy. Altameem et al. [9] used four deep CNN architectures to classify mammography images, with a fuzzy ensemble approach outperforming individual methods.

Methodology
The proposed research aims to assess the impact of using ensemble learning, specifically the AdaBoost algorithm, in constructing Convolutional Neural Network (CNN) architectures for diagnosing breast cancer mammography images. The CNN architectures used are VGG-16, ResNet50, and Inception V4. The procedure of the proposed research is shown in Figure 1.
In Table 1, the datasets used for the proposed research are listed. The data for building these models come from three different providers: the MIAS dataset from University College London's Medical Physics and Bio-engineering Unit [10], the INbreast dataset from the Portuguese Institute of Oncology [11], and the CBIS-DDSM dataset from Massachusetts General Hospital and the Mayo Clinic's Biomedical Imaging Resource [12].

VGG-16
The VGG-16 architecture comprises 16 layers, including 13 convolutional layers and three fully connected layers [13]. The convolutional layers use small 3×3 filters with a stride of 1, and the pooling layers use max pooling with 2×2 filters and a stride of 2. The fully connected layers have 4096 neurons each, followed by a final softmax layer for classification.
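The two layer types described above can be illustrated with a minimal single-channel NumPy sketch (a toy stand-in, not the full multi-channel VGG-16; the function names and averaging kernel are our own):

```python
import numpy as np

def conv3x3(x, kernel):
    """3x3 convolution with stride 1 and zero-padding 1 (preserves spatial size)."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def maxpool2x2(x):
    """2x2 max pooling with stride 2 (halves each spatial dimension)."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.rand(224, 224)   # single-channel stand-in for a 224x224 input
k = np.ones((3, 3)) / 9.0      # toy averaging kernel for illustration
feat = conv3x3(x, k)           # spatial size preserved: 224x224
pooled = maxpool2x2(feat)      # halved by pooling: 112x112
print(feat.shape, pooled.shape)
```

The padding of 1 is what lets the 3×3/stride-1 convolutions keep the spatial size fixed, so only the pooling layers shrink the feature maps.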

ResNet50
The ResNet50 architecture takes input images of size 224×224 pixels. The architecture comprises 50 layers, including convolutional layers, batch normalization layers, activation layers, and identity blocks [14]. Pretrained weights from the ImageNet dataset are used to initialize the ResNet50 model, as they can significantly speed up the training process and improve the performance of the network.
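The identity (shortcut) blocks mentioned above add the input directly to the transformed branch, which is the mechanism ResNet50 uses to ease gradient flow in deep stacks. A minimal dense-layer sketch of that idea (dimensions and names are illustrative, not the actual ResNet50 block):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal identity residual block: output = ReLU(F(x) + x)."""
    fx = relu(x @ w1) @ w2   # two-layer transformation F(x)
    return relu(fx + x)      # shortcut (identity) connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64))            # a 64-dim feature vector
w1 = rng.normal(size=(64, 64)) * 0.01
w2 = rng.normal(size=(64, 64)) * 0.01
out = residual_block(x, w1, w2)
print(out.shape)
```

Note that if the transformation F contributes nothing (zero weights), the block still passes ReLU(x) through unchanged, which is why very deep stacks of such blocks remain trainable.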

Inception V4
The Inception V4 architecture takes input images of size 299×299 pixels. The architecture comprises a stem module, multiple inception modules, and a final classification layer [15]. Layers are added to the model individually, starting with the input layer, using the appropriate activation functions and padding.
The stem module is designed to extract features from the input image, while the inception modules consist of multiple parallel convolutional and pooling layers.
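The parallel-branch structure of an inception module can be sketched as follows: each branch processes the same input with a different receptive field, and their outputs are concatenated along the channel axis. This toy version uses box filters as stand-ins for learned convolutions (shapes and names are illustrative only):

```python
import numpy as np

def branch(x, kernel_size, out_channels):
    """Stand-in for a convolutional branch: a box filter of the given size,
    with 'same' padding so the spatial size is preserved."""
    pad = kernel_size // 2
    padded = np.pad(x, pad)
    h, w = x.shape
    out = np.empty((out_channels, h, w))
    for c in range(out_channels):
        for i in range(h):
            for j in range(w):
                out[c, i, j] = padded[i:i + kernel_size, j:j + kernel_size].mean()
    return out

x = np.random.rand(16, 16)                         # toy single-channel input
b1 = branch(x, 1, 8)                               # 1x1-style branch
b3 = branch(x, 3, 8)                               # 3x3-style branch
b5 = branch(x, 5, 4)                               # 5x5-style branch
module_out = np.concatenate([b1, b3, b5], axis=0)  # channel-wise concatenation
print(module_out.shape)                            # (20, 16, 16)
```

Because every branch preserves the spatial size, the module's output simply stacks the branches' channels (8 + 8 + 4 = 20 here), letting later layers draw on features extracted at several scales at once.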

AdaBoost
AdaBoost produces a powerful classifier by combining weak classifiers that have been iteratively trained [16].
All training samples are given equal weights at the start of the algorithm. On each iteration, a weak classifier, here a decision stump, is trained on the weighted training data and its performance is evaluated. The weights of the misclassified samples are then increased and the weights of the correctly classified samples are decreased, so the subsequent weak classifier concentrates more on the difficult samples. This procedure continues until a stopping criterion is fulfilled, such as a maximum number of iterations or a minimal error rate. The initial sample weight is given by Equation 1:

w_i = 1/N, (1)

where N is the total number of data points. After all the weak classifiers have been trained, the final strong classifier is constructed by taking a weighted combination of the weak classifiers, where each classifier's weight is proportional to its accuracy, so more accurate classifiers contribute more. During classification, each weak classifier produces a prediction, and the final prediction is generated by combining all of the weak classifiers' predictions using their weights.
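The procedure above can be written out as a compact from-scratch sketch for binary labels in {-1, +1} (function names, the exhaustive stump search, and the toy data are our own; this is the standard discrete AdaBoost recipe, not the paper's exact implementation):

```python
import numpy as np

def train_stump(X, y, w):
    """Exhaustively find the decision stump (feature, threshold, polarity)
    with the lowest weighted error under sample weights w."""
    n, d = X.shape
    best = (np.inf, None)
    for f in range(d):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, (f, thr, pol))
    return best

def stump_predict(X, stump):
    f, thr, pol = stump
    return np.where(pol * (X[:, f] - thr) >= 0, 1, -1)

def adaboost(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # Equation 1: equal initial weights 1/N
    models = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = max(err, 1e-10)                    # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # accurate stumps get larger weight
        pred = stump_predict(X, stump)
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified samples
        w /= w.sum()                             # renormalize
        models.append((alpha, stump))
    return models

def predict(X, models):
    score = sum(a * stump_predict(X, s) for a, s in models)
    return np.sign(score)

# toy 1-D separable data
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
models = adaboost(X, y, rounds=5)
print((predict(X, models) == y).mean())
```

The weight update `exp(-alpha * y * pred)` grows weights exactly when the stump's prediction disagrees with the label, which is the corrective focus on hard samples described above.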

Results
During the training process of VGG-16, a notable and consistent increase in the accuracy score was observed with each epoch. This consistent improvement indicates that the model is effectively learning and capturing the underlying patterns in the data, as seen in Figure 2.

Fig. 2. Model training result of VGG-16
The ResNet50 model exhibited excellent training performance, as evidenced by the steady increase in accuracy with each epoch, as depicted in Figure 3. This consistent improvement in accuracy demonstrates the effectiveness of the ResNet50 architecture in learning and capturing intricate patterns within the dataset.

Fig. 3. Model training result of ResNet50
The observed increase in accuracy in both models with each epoch signifies that the learning process is efficient and converges toward an optimal solution. This steady improvement suggests that the chosen hyperparameters and architectures are well-suited for the task, enabling the models to capture the complex relationships within the data. The initial phase of training included meticulous hyper-parameter tuning to ensure optimal model performance, as shown in Table 2. A learning rate of 0.001 combined with the Adam optimizer demonstrated strong results, and a dropout rate of 0.5 was chosen because it yielded the best score in the experimental phase; each model was trained for ten epochs.
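To make the dropout setting from Table 2 concrete, the sketch below applies inverted dropout at the reported rate of 0.5 (the constants mirror Table 2's values; the function and variable names are our own, and this is an illustration rather than the paper's training code):

```python
import numpy as np

# Hyper-parameter values reported in Table 2 (names are our own).
LEARNING_RATE = 0.001   # step size for the Adam optimizer
DROPOUT_RATE = 0.5      # fraction of units zeroed by the dropout layer
EPOCHS = 10

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the
    survivors by 1/(1-rate) so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(42)
acts = np.ones(10_000)
dropped = dropout(acts, DROPOUT_RATE, rng)
print(round((dropped == 0).mean(), 2))   # roughly half the units are zeroed
```

At rate 0.5, each surviving unit is doubled, which keeps the layer's expected output stable between training (dropout on) and inference (dropout off).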

Results of Individual CNN Models
Table 3 outlines the performance accuracy of the three models. VGG-16 and ResNet50 exhibited equal performance accuracy of 0.90 on the MIAS and INbreast datasets and a slightly lower accuracy of 0.89 on the CBIS-DDSM dataset. In contrast, Inception V4 showed significantly lower accuracy across all datasets, scoring 0.46 on the MIAS dataset and 0.44 on both the CBIS-DDSM and INbreast datasets. Table 6 presents the confusion matrix of the ResNet50 model combined with the AdaBoost algorithm. For the actual benign class, 94% were correctly predicted as benign, while 3% were misclassified as malignant and another 3% as normal. When the actual class was malignant, the model had a slightly lower correct prediction rate, with 89% correctly identified, 5% incorrectly predicted as benign, and 6% misclassified as normal. The model achieved its highest accuracy in predicting the normal class, with 92% correctly identified, while 2% and 6% were misclassified as benign and malignant, respectively.

Results of Individual CNN models with AdaBoost
In building the ensemble model, the last layer of features extracted from each model was concatenated: the final layers are highly representative of the learned patterns and carry a wealth of information about the dataset, so they can be combined to form a larger feature space that effectively encapsulates the strengths of each model. Table 8 outlines the confusion matrix of the ensemble model, which incorporates VGG-16, ResNet50, and Inception V4, all combined with the AdaBoost algorithm. For the benign class, 96% were correctly predicted, with just 2% each misclassified as malignant or normal. The model shows even more precise predictions for the malignant class, correctly identifying 97% of cases and misclassifying only 2% as benign and 1% as normal. The model achieved excellent accuracy for the normal class, correctly predicting 97% of cases and erroneously identifying just 2% as benign and 1% as malignant.
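The feature-concatenation step described above can be sketched as follows. Random arrays stand in for the penultimate-layer features of each network, with dimensions chosen for illustration only (512, 2048, and 1536 are the pooled feature sizes commonly associated with VGG-16, ResNet50, and Inception V4, but the paper does not specify them):

```python
import numpy as np

# Hypothetical per-image feature vectors from the three backbones.
rng = np.random.default_rng(1)
n_images = 32
vgg_feats = rng.normal(size=(n_images, 512))         # stand-in for VGG-16 features
resnet_feats = rng.normal(size=(n_images, 2048))     # stand-in for ResNet50 features
inception_feats = rng.normal(size=(n_images, 1536))  # stand-in for Inception V4 features

# Concatenate per image into one larger feature space for the AdaBoost stage.
ensemble_feats = np.concatenate([vgg_feats, resnet_feats, inception_feats], axis=1)
print(ensemble_feats.shape)   # (32, 4096)
```

Each row of `ensemble_feats` then serves as a single training example for the boosted classifier, so AdaBoost can weight evidence coming from any of the three backbones.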

Discussion
In Table 5, the performance of each CNN model is shown, and conclusions can be drawn. VGG-16 decreased in performance when ensembled with AdaBoost, with a drop of 4% in accuracy. This may be caused by the nature of VGG-16, in which the model excels at simpler tasks but struggles with complex tasks and with predicting new, unseen data [17]. When initialized with AdaBoost, this could lead to a problem where training on a new subset of features is inefficient because of the model's limited capability to process complex data. Table 5 shows that the ResNet50 model with AdaBoost improved significantly. This is because ResNet50 excels in processing hierarchical features from images due to its unique design, which employs shortcut connections to mitigate the vanishing gradient problem in deep neural networks [18]. When initialized with AdaBoost, ResNet50's ability to handle complex data and AdaBoost's iterative process work synergistically, forming a robust model. This combination heightens sensitivity to complex patterns and challenging instances, improving performance [19]. Although the observed improvements may seem minor, they can have a significant impact when dealing with large datasets, underscoring the effectiveness of this method, especially in precision-critical fields like medical imaging. Inception V4, which suffered from overfitting, increased its accuracy when initialized with AdaBoost. From Table 5, it can be seen that the model had an increase of 6% in its accuracy. This means that AdaBoost is successful in reducing the overall variance in the training process of the model and reducing overfitting. The results show that the degree of improvement can vary depending on how well each model's characteristics align with the instances on which AdaBoost focuses.
The ensemble CNN model, harnessing the combined strength of the three CNN models and AdaBoost, achieved a high accuracy (near or above 0.94) across all datasets, attesting to the benefits of coupling robust CNN architectures with iterative learning. The superior performance of the ensemble model, which blends the features of VGG-16, ResNet50, and Inception V4, is due to several reasons. Mainly, it integrates the individual strengths of each architecture into a richer feature space [20], [21]. VGG-16 excels in capturing diverse features, while ResNet50 is designed to navigate complex patterns without performance degradation. Inception V4, with its varied kernel sizes and modules, is efficient at feature extraction across different scales. Consequently, the fusion of these models results in a robust and comprehensive feature set, enhancing the model's ability to handle complex tasks [20].
The ensemble model's exceptional performance can also be attributed to several factors. Firstly, the diversity of the base models curtails error variance, as other models' correct predictions may offset individual model errors, thus enhancing generalization and reducing overfitting [22]. Secondly, the AdaBoost algorithm adds another level of sophistication by focusing on previously misclassified instances, providing a corrective feedback loop that progressively increases the model's accuracy. The blend of distinct CNN models with AdaBoost's error-correcting ability results in improved overall accuracy.

Conclusion
This research aimed to enhance the performance of Convolutional Neural Networks (CNNs) in image classification tasks using ensemble learning. Three prominent CNN architectures, VGG-16, ResNet50, and Inception V4, were combined with the AdaBoost algorithm. The findings indicated significant performance improvements when these models were individually integrated with AdaBoost. A further combination of these models into an ensemble, complemented by AdaBoost, showed superior performance, achieving accuracy scores between 0.95 and 0.99 across various datasets. This improvement is attributed to the diverse feature space created by combining each model's strengths and AdaBoost's iterative learning. Despite the compelling results, the higher computational demand and complexity of such ensemble models must be acknowledged. Future work should consider more advanced data pre-processing, other ensemble algorithms and strategies, hyperparameter optimization, and more diverse datasets or tasks. Integrating other CNN architectures with AdaBoost can also be considered, as different architectures may yield a more compelling result. Lastly, expert annotations from radiologists to segment the tumor location in the breast could also be considered, as they may increase the accuracy of the model.
The dissemination and publication of this research were funded by the Department of Computer Science and Electronics, the Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Indonesia.

Fig. 1. The procedure of the proposed research.

Table 1. Research Dataset Information

Table 2. Hyper-parameters of the model

Table 3. Results Of The Individual CNN Model

The rows of the confusion matrix represent the actual class, and the columns represent the predicted class. For the 'Benign' class, the model predicted correctly with an accuracy of 0.89; however, it misclassified 'Benign' as 'Malignant' and 'Normal' at rates of 0.05 and 0.06, respectively. For the 'Malignant' class, the ResNet50 model achieved an even higher accuracy of 0.94, with misclassification rates of 0.01 for 'Benign' and 0.03 for 'Normal'. Finally, for the 'Normal' class, the model correctly predicted at a rate of 0.89 and misclassified as 'Benign' and 'Malignant' at rates of 0.07 and 0.04, respectively.

Table 4. Confusion Matrix Of The Individual CNN Model (VGG-16 With MIAS)

Table 5. Results Of Individual CNN Models With AdaBoost

Table 6. Confusion Matrix Of The CNN Model With AdaBoost (ResNet50 With MIAS)

Table 7 displays the performance of the Ensemble CNN-AdaBoost model across various datasets, where it achieved an impressive accuracy of 0.97.

Table 7. Results Of CNN-AdaBoost Ensemble

Table 8. Confusion Matrix Of CNN-AdaBoost Ensemble