Integrative Deep Learning Architecture for High-Accuracy Medical Image Segmentation: Combining U-Net, ResNet, and Transformers
DOI:
https://doi.org/10.51903/jtie.v4i1.288
Keywords:
Medical Image Segmentation, Deep Learning, Hybrid Model
Abstract
Medical image segmentation plays a vital role in diagnosis and treatment planning by extracting clinically relevant information from imaging data. Conventional methods often struggle with variations in anatomical structure and imaging quality, leading to suboptimal segmentation. Recent advancements in Deep Learning, particularly Convolutional Neural Networks (CNNs) and Transformers, have improved segmentation accuracy; however, individual models such as U-Net, ResNet, and Transformer still face limitations in preserving spatial details, extracting deep features, and modeling long-range dependencies. This study proposes a hybrid Deep Learning model that integrates U-Net, ResNet, and Transformer to overcome these challenges and enhance segmentation performance. The proposed hybrid model was evaluated on several publicly available datasets, including BraTS, ISIC, and DRIVE, using Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) as performance metrics. Experimental results indicate that the hybrid model achieved a DSC of 0.92 and an IoU of 0.86, outperforming U-Net (DSC: 0.82, IoU: 0.75), ResNet (DSC: 0.85, IoU: 0.78), and Transformer (DSC: 0.88, IoU: 0.80). Additionally, the model maintained an inference time of 55 ms per image, demonstrating its potential for real-time applications. This study highlights the benefits of combining CNN-based and Transformer-based architectures to capture both local details and global context, providing an effective and efficient solution for medical image segmentation.
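The two metrics named in the abstract can be illustrated with a short sketch. For a predicted mask P and a ground-truth mask G, DSC = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|. The function name `dice_and_iou`, the toy masks, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration only; the abstract does not describe the authors' evaluation code.

```python
import numpy as np


def dice_and_iou(pred, target):
    """Return (DSC, IoU) for two binary masks of the same shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = int(np.logical_and(pred, target).sum())
    pred_area = int(pred.sum())
    target_area = int(target.sum())
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as perfect agreement.
    if union == 0:
        return 1.0, 1.0

    dsc = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dsc, iou


# Toy 3x3 masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 1, 0],
                   [0, 0, 0],
                   [0, 1, 0]])

dsc, iou = dice_and_iou(pred, target)
print(f"DSC = {dsc:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks the two scores are tied by IoU = DSC / (2 - DSC), so DSC is never lower than IoU. Scores reported over a dataset are usually averages across images, so the averaged values need not satisfy that identity exactly.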
License
Copyright (c) 2025 Journal of Technology Informatics and Engineering

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

