PROTEGO: Improving Breast Cancer Diagnosis with Prototype-Contrastive Autoencoder and Conformal Prediction on the WDBC Dataset

Marselina Endah Hiswati; Mohammad Diqi

doi:10.52436/1.jutif.2025.6.5.5294

Authors

Marselina Endah Hiswati, Department of Informatics, Universitas Respati Yogyakarta, Indonesia
Mohammad Diqi, Department of Informatics, Universitas Respati Yogyakarta, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.5.5294

Keywords:

Breast Cancer Diagnosis, Conformal Prediction, Prototype-Contrastive Autoencoder, Representation Learning, Uncertainty Quantification

Abstract

Breast cancer remains one of the leading causes of mortality among women, making accurate and trustworthy early detection a critical challenge in healthcare. To address this, we propose PROTEGO, a Prototype-Contrastive Autoencoder with integrated Conformal Prediction, designed to achieve both high diagnostic accuracy and reliable uncertainty quantification. The framework combines dual-head autoencoding, supervised contrastive learning, prototype-based regularization, and conformal calibration to generate discriminative yet interpretable representations. Using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, PROTEGO was trained and evaluated through stratified data splits, with performance measured by AUROC, AUPRC, F1-score, Balanced Accuracy, Brier score, calibration error, and conformal coverage metrics. The results show that PROTEGO achieves highly competitive performance with an AUROC of 0.992 and an AUPRC of 0.995, while uniquely providing conformal coverage guarantees with an average set size close to one and more than 92% decisive predictions. Ablation studies confirm the complementary role of each component in enhancing both accuracy and calibration. These findings demonstrate that integrating prototype-guided representation learning with conformal prediction establishes a clinically meaningful diagnostic framework. PROTEGO highlights the importance of unifying precision and reliability in medical AI, offering a step toward more interpretable, safe, and clinically trustworthy systems for breast cancer detection.
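
The conformal metrics named in the abstract (coverage, average set size, and the share of decisive predictions) can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: the abstract does not specify the nonconformity score, so the code assumes the common choice of one minus the predicted probability of the true class, and the function names `conformal_threshold`, `prediction_sets`, and `summarize` are hypothetical.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set.

    Assumption: nonconformity score = 1 - p(true class).
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: admit every class.
        return 1.0
    return float(np.sort(scores)[k - 1])


def prediction_sets(probs, qhat):
    """Boolean matrix: entry (i, c) is True if class c is in the set for sample i."""
    return (1.0 - probs) <= qhat


def summarize(sets, labels):
    """Empirical coverage, average set size, and fraction of single-label sets."""
    n = len(labels)
    sizes = sets.sum(axis=1)
    return {
        "coverage": float(sets[np.arange(n), labels].mean()),
        "avg_set_size": float(sizes.mean()),
        "decisive_fraction": float((sizes == 1).mean()),
    }


if __name__ == "__main__":
    # Synthetic binary probabilities stand in for a trained classifier's output.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=400)
    p1 = np.clip(0.1 + 0.8 * labels + rng.normal(0.0, 0.15, size=400), 0.01, 0.99)
    probs = np.column_stack([1.0 - p1, p1])

    # First half calibrates the threshold, second half is evaluated.
    qhat = conformal_threshold(probs[:200], labels[:200], alpha=0.1)
    print(summarize(prediction_sets(probs[200:], qhat), labels[200:]))
```

Under this convention, a prediction counts as decisive when its set contains exactly one label; sets with two labels (or none) flag cases that would be deferred to a clinician.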

