A Transformer and LLSKA-Based U-Shaped Network for Medical Image Segmentation — T-LLSKA UNet
Lao Tei*, Deng Zengjie, Gui Hao, Li Guanxi and Peng Lei
November 26, 2025
Abstract
With the rapid advancement of artificial intelligence, and in particular the breakthroughs in deep learning, medical image segmentation has improved markedly in both accuracy and efficiency. These advances have fostered innovation in modern healthcare systems and play an important role in improving diagnostic precision and treatment efficiency. This paper addresses a common weakness of medical image segmentation decoders: they struggle to balance global contextual understanding with the representation of local detail. To overcome this limitation, improvements to the Large Separable Kernel Attention (LSKA) module are proposed. After analyzing why the parameter count of LSKA grows quadratically as the number of feature channels increases, two enhanced modules are designed: the Sparse Full-channel LSKA (LSKA-SF) and the Local-channel Large Separable Kernel Attention (LLSKA). LSKA-SF introduces sparsity through group convolution to reduce the parameter count, while LLSKA enhances feature representation through local cross-channel interactions and thereby mitigates the quadratic growth in parameters. Building on these modules, this study constructs a hybrid U-shaped segmentation network, Transformer-LLSKA UNet (T-LLSKA UNet), which combines a Transformer encoder with an LLSKA-based decoder to model both global context and local spatial detail efficiently. Experiments show that Transformer-LLSKA UNet achieves strong segmentation performance across multiple organs, including the aorta, liver, and spleen. It achieves an average Dice similarity coefficient of 83.43% and an HD95 (95th-percentile Hausdorff distance) of 15.15, indicating gains in both segmentation accuracy and boundary precision.
These results support the generalization ability and practical value of the proposed model and highlight its potential for intelligent medical image analysis and clinical decision support.
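The quadratic growth that motivates LSKA-SF and LLSKA can be illustrated by simply counting weights. The sketch below is not the authors' implementation. It assumes an LSKA block built as in Lau et al.'s LSKA design: two cascaded depth-wise 1-D convolutions of length 2d−1, two dilated depth-wise 1-D convolutions of length ⌈k/d⌉, and a final 1×1 channel-mixing convolution, with biases ignored. The "grouped" mode stands in for LSKA-SF by splitting the 1×1 convolution into `groups` groups; the "local" mode is a hypothetical stand-in for LLSKA in which each output channel mixes only `window` neighboring input channels. The function `block_params` and its parameters are illustrative names.

```python
import math


def block_params(channels, kernel_size, dilation, mixing="full", groups=4, window=3):
    """Count weights (biases ignored) of a hypothetical LSKA-style block.

    mixing="full":    dense 1x1 conv, C * C weights (original LSKA style)
    mixing="grouped": 1x1 group conv, C * C / groups weights (LSKA-SF-like assumption)
    mixing="local":   each output channel mixes `window` neighboring channels,
                      C * window weights (LLSKA-like assumption)
    """
    # Depth-wise 1x(2d-1) and (2d-1)x1 kernels: one filter per channel, linear in C.
    depthwise = 2 * (2 * dilation - 1) * channels
    # Dilated depth-wise 1x⌈k/d⌉ and ⌈k/d⌉x1 kernels: also linear in C.
    dilated = 2 * math.ceil(kernel_size / dilation) * channels
    if mixing == "full":
        channel_mix = channels * channels  # quadratic in C
    elif mixing == "grouped":
        if channels % groups:
            raise ValueError("channels must be divisible by groups")
        channel_mix = channels * (channels // groups)  # still quadratic, divided by groups
    elif mixing == "local":
        channel_mix = channels * window  # linear in C
    else:
        raise ValueError(f"unknown mixing mode: {mixing}")
    return depthwise + dilated + channel_mix


if __name__ == "__main__":
    # Arbitrary example settings: kernel size 23, dilation 3.
    for c in (64, 128, 256, 512):
        counts = {m: block_params(c, kernel_size=23, dilation=3, mixing=m)
                  for m in ("full", "grouped", "local")}
        print(c, counts)
```

With C = 64, k = 23 and d = 3 this gives 5,760 (full), 2,688 (grouped) and 1,856 (local) weights. Doubling C to 128 quadruples the dense channel-mixing term (4,096 to 16,384), while grouping only divides that quadratic term by a constant and the local variant grows linearly, which is the trend the abstract describes.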
Keywords: LSKA; LLSKA; Transformer; UNet
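The two metrics reported in the abstract can be made concrete with a small 2-D sketch. It is an illustration under stated assumptions, not the evaluation code used in this study. Dice is computed as 2|P∩G|/(|P|+|G|). HD95 is computed here by pooling the directed Euclidean distances between the two masks' boundary pixels in both directions and taking the 95th percentile with linear interpolation; this is one common convention, and other tools instead take the larger of the two directed 95th percentiles. Distances are in pixels; multi-organ benchmarks usually compute both metrics per organ on 3-D volumes using the physical voxel spacing and then average them. The helper names `dice`, `boundary`, `directed_distances`, `percentile` and `hd95` are hypothetical.

```python
import math


def dice(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks given as lists of 0/1 rows."""
    inter = sum(p * g for rp, rg in zip(pred, gt) for p, g in zip(rp, rg))
    total = sum(map(sum, pred)) + sum(map(sum, gt))
    # Convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def boundary(mask):
    """Foreground pixels with at least one 4-neighbor that is background or off-image."""
    rows, cols = len(mask), len(mask[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    points.append((r, c))
                    break
    return points


def directed_distances(src, dst):
    """Euclidean distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def percentile(values, q):
    """Percentile with linear interpolation (the same rule as NumPy's default)."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def hd95(pred, gt):
    """95th percentile of the pooled symmetric boundary distances, in pixels."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp or not bg:
        raise ValueError("HD95 is undefined when a mask is empty")
    return percentile(directed_distances(bp, bg) + directed_distances(bg, bp), 95)


if __name__ == "__main__":
    pred = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
    gt = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    print(dice(pred, gt), hd95(pred, gt))  # 0.5 1.0
```

In the usage example the prediction is the ground truth shifted by one column: half of the foreground overlaps (Dice 0.5), and every boundary pixel lies within one pixel of the other boundary (HD95 1.0).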