A Fusion-Based Approach for Skin Cancer Detection Combining Clinical Images, Dermoscopic Images, and Metadata
Author
Siddiqui, Ansah Juned
Date
2025-10
Type
Thesis
Files
35.232-2025.62a Ansah Juned Siddiqui.pdf
Adobe PDF, 2.04 MB
- Embargoed until 2027-02-19
Description
A Master of Science thesis in Machine Learning by Ansah Juned Siddiqui entitled, “A Fusion-Based Approach for Skin Cancer Detection Combining Clinical Images, Dermoscopic Images, and Metadata”, submitted in October 2025. Thesis advisor is Dr. Salam Dhou and thesis co-advisor is Dr. Tamer Shanableh. Soft copy is available (Thesis, Completion Certificate, Approval Signatures, and AUS Archives Consent Form).
Abstract
Skin cancer classification is a critical task where artificial intelligence (AI) can enhance diagnostic accuracy for both binary and multi-class classification. This research proposes a multimodal AI-driven framework to classify skin cancer lesions using dermoscopic images, clinical images, and patient metadata. By leveraging the MRA-MIDAS (Multimodal Image Dataset for AI-based Skin Cancer) dataset, this thesis aims to build a comprehensive model that can classify patient cases as malignant or benign, and further into specific malignant and benign classes. The objective of this thesis is achieved by developing an effective fusion strategy that captures complementary information from multiple modalities to improve both binary and multi-class classification performance. A major challenge in this task is handling varying image resolutions, which can impact feature consistency across modalities. To address this issue, the study introduces a novel region-of-interest (ROI) extraction method using object detection and feature matching techniques for precise spatial alignment. Additionally, several multimodal fusion strategies (early, intermediate, and late fusion) are explored to determine the optimal approach for integrating tabular and image data. The experimental setup includes standalone image classification using Convolutional Neural Networks (CNNs), metadata-based classification using classical machine learning algorithms, and multiple fusion techniques to assess their impact on overall classification performance. Both binary and multi-class classification tasks are conducted and evaluated. For binary classification, the late-fusion multimodal approach combining images and metadata with EfficientNetB7 and weighted averaging achieved the best performance, reaching an accuracy of 79.19%.
For multi-class classification, the multimodal EfficientNet framework demonstrated strong results, with an accuracy of 90.3% for the malignant classifier (five classes), 71.91% for the benign classifier (six classes), and 70.5% for the unified classifier (11 classes). By systematically analyzing the effectiveness of various fusion approaches, this thesis showed that late fusion with weighted averaging was the most promising strategy, with EfficientNet-based models yielding the best overall performance. The malignant classifier achieved the highest accuracy, suggesting that malignant subtypes may possess more consistent visual features, making them easier to classify than cases in the initial binary classification task.
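The best-performing strategy described above, late fusion with weighted averaging, can be illustrated with a minimal sketch: each unimodal model (e.g., an image CNN and a metadata classifier) outputs per-class probabilities, and the fused prediction is their weighted average. The function name, the example probabilities, and the 0.7/0.3 weighting below are illustrative assumptions, not the thesis's actual values.

```python
import numpy as np

def weighted_late_fusion(image_probs, metadata_probs, w_image=0.7):
    """Late fusion by weighted averaging of per-class probabilities
    from an image model and a metadata model (weights are illustrative)."""
    fused = (w_image * np.asarray(image_probs)
             + (1.0 - w_image) * np.asarray(metadata_probs))
    # Renormalize so each row is a valid probability distribution
    return fused / fused.sum(axis=-1, keepdims=True)

# Hypothetical binary (benign vs. malignant) probabilities for two lesions
image_probs = np.array([[0.30, 0.70],   # image model leans malignant
                        [0.80, 0.20]])  # image model leans benign
meta_probs = np.array([[0.60, 0.40],    # metadata model disagrees on case 1
                       [0.70, 0.30]])

fused = weighted_late_fusion(image_probs, meta_probs, w_image=0.7)
labels = fused.argmax(axis=1)  # 0 = benign, 1 = malignant
```

With the image model weighted more heavily (0.7), the fused prediction for the first case follows the image model (malignant), even though the metadata model disagrees; the same scheme extends directly to the multi-class setting by using longer probability vectors.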
