Integrative Deep Learning Architecture for High-Accuracy Medical Image Segmentation: Combining U-Net, ResNet, and Transformers

Devi Zakiyatus Sholekhah; Dian Noviar

doi:10.51903/jtie.v4i1.288

Authors

Devi Zakiyatus Sholekhah, Universitas Islam Negeri Sunan Kalijaga
Dian Noviar, Universitas Islam Negeri (UIN) Sunan Kalijaga, ORCID: https://orcid.org/0009-0001-2154-4010

Keywords:

Medical Image Segmentation, Deep Learning, Hybrid Model

Abstract

Medical image segmentation plays a vital role in diagnosis and treatment planning by extracting clinically relevant information from imaging data. Conventional methods often struggle with variations in anatomical structure and image quality, which leads to suboptimal segmentation. Recent advances in Deep Learning, particularly Convolutional Neural Networks (CNNs) and Transformers, have improved segmentation accuracy. However, when used on their own, U-Net, ResNet, and Transformer models still face limitations in preserving spatial detail, extracting deep features, and modeling long-range dependencies. This study proposes a hybrid Deep Learning model that integrates U-Net, ResNet, and Transformer components to overcome these limitations and improve segmentation performance. The hybrid model was evaluated on several publicly available datasets, including BraTS, ISIC, and DRIVE, using the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) as performance metrics. Experimental results indicate that the hybrid model achieved a DSC of 0.92 and an IoU of 0.86, outperforming U-Net (DSC: 0.82, IoU: 0.75), ResNet (DSC: 0.85, IoU: 0.78), and Transformer (DSC: 0.88, IoU: 0.80). The model also required 55 ms of inference time per image, suggesting its potential for real-time applications. This study highlights the benefit of combining CNN-based and Transformer-based architectures to capture both local detail and global context, offering an effective and efficient approach to medical image segmentation.
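The abstract reports segmentation quality with DSC and IoU. As an illustration only, the sketch below computes both metrics for a pair of flattened binary masks using the standard definitions DSC = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. The function names, the `eps` smoothing term, and the convention that two empty masks score 1.0 are assumptions of this sketch; the study's own evaluation code is not described here.

```python
from typing import Sequence, Tuple


def dice_and_iou(
    pred: Sequence[int], target: Sequence[int], eps: float = 1e-7
) -> Tuple[float, float]:
    """Return (DSC, IoU) for two flattened binary masks of equal length.

    eps is a small smoothing term (an assumption of this sketch) that avoids
    division by zero; when both masks are empty, both scores become 1.0.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Count true positives, false positives and false negatives pixel by pixel.
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if t and not p)
    # DSC = 2|A∩B| / (|A| + |B|), where |A| + |B| = 2*tp + fp + fn.
    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)
    # IoU = |A∩B| / |A∪B|, where |A∪B| = tp + fp + fn.
    iou = (tp + eps) / (tp + fp + fn + eps)
    return dice, iou


def iou_from_dice(dice: float) -> float:
    """Convert a single-image DSC into the equivalent IoU: IoU = DSC / (2 - DSC)."""
    return dice / (2.0 - dice)


if __name__ == "__main__":
    # Toy 6-pixel masks: 2 true positives, 1 false positive, 1 false negative.
    prediction = [1, 1, 1, 0, 0, 0]
    ground_truth = [1, 1, 0, 1, 0, 0]
    d, j = dice_and_iou(prediction, ground_truth)
    print(f"DSC={d:.3f}  IoU={j:.3f}  IoU from DSC={iou_from_dice(d):.3f}")
```

For a single pair of masks the two metrics are tied by IoU = DSC / (2 - DSC), so a DSC of 0.92 corresponds to an IoU of about 0.85. If the reported figures are averages over many images, they need not satisfy this identity exactly.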
