A Novel 3D Unsupervised Domain Adaptation Framework for Cross-Modality Medical Image Segmentation
We consider the problem of volumetric (3D) unsupervised domain adaptation (UDA) in cross-modality medical image segmentation, aiming to perform segmentation on an unannotated target domain (e.g., MRI) with the help of a labeled source domain (e.g., CT). Previous UDA methods in medical image analysis usually suffer from two challenges: 1) they process and analyze data at the 2D level only, thus missing semantic information from the depth dimension; 2) a one-to-one mapping is adopted during the style-transfer process, leading to insufficient alignment in the target domain. Different from existing methods, we conduct a first-of-its-kind investigation of multi-style image translation for complete image alignment to alleviate the domain shift problem, and we also introduce 3D segmentation into domain adaptation tasks to maintain semantic consistency along the depth dimension. In particular, we develop an unsupervised domain adaptation framework incorporating a novel quartet self-attention module to efficiently enhance relationships between widely separated features in spatial regions at a higher dimension, leading to a substantial improvement in segmentation accuracy in the unlabeled target domain. On two challenging cross-modality tasks, namely brain structure segmentation and multi-organ abdominal segmentation, our model outperforms current state-of-the-art methods by a significant margin, demonstrating its potential as a benchmark resource for the biomedical and health informatics research community.
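The abstract does not define the quartet self-attention module, so the sketch below is only one plausible PyTorch reading: attention is applied separately along the depth, height, and width axes of the 3D feature volume, with a channel-mixing branch as a fourth path. All class names (`AxisAttention`, `QuartetAttention3D`) and design choices here are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AxisAttention(nn.Module):
    """Self-attention along one spatial axis of a (B, C, D, H, W) volume.

    The other two spatial axes are folded into the batch, which keeps the
    attention cost far below full 3D self-attention.
    """

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        b, c, d, h, w = x.shape
        # Channels last, then move the chosen spatial axis (1=D, 2=H, 3=W)
        # to the sequence position and flatten the rest into the batch.
        x_ = x.permute(0, 2, 3, 4, 1).movedim(axis, 3)     # (B, A1, A2, L, C)
        shp = x_.shape
        seq = x_.reshape(-1, shp[3], c)                    # (B*A1*A2, L, C)
        q = self.norm(seq)
        out, _ = self.attn(q, q, q, need_weights=False)
        out = (seq + out).reshape(shp).movedim(3, axis)    # undo the reshuffle
        return out.permute(0, 4, 1, 2, 3)                  # back to (B, C, D, H, W)


class QuartetAttention3D(nn.Module):
    """Hypothetical 'quartet' attention: attend along depth, height, and width,
    then mix channels, so distant voxels interact in every direction."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.axis_attn = AxisAttention(channels, heads)
        self.channel_mix = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=1), nn.GELU(),
            nn.Conv3d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for axis in (1, 2, 3):              # depth, height, width
            x = self.axis_attn(x, axis)
        return x + self.channel_mix(x)      # fourth path: channel-wise mixing
```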
MATR: Multimodal Medical Image Fusion via Multiscale Adaptive Transformer
Owing to the limitations of imaging sensors, it is challenging to obtain a medical image that simultaneously contains functional metabolic information and structural tissue details. Multimodal medical image fusion, an effective way to merge the complementary information in different modalities, has become a significant technique to facilitate clinical diagnosis and surgical navigation. With powerful feature representation ability, deep learning (DL)-based methods have improved such fusion results but still have not achieved satisfactory performance. Specifically, existing DL-based methods generally depend on convolutional operations, which can well extract local patterns but have limited capability in preserving global context information. To compensate for this defect and achieve accurate fusion, we propose a novel unsupervised method to fuse multimodal medical images via a multiscale adaptive Transformer, termed MATR. In the proposed method, instead of directly employing vanilla convolution, we introduce an adaptive convolution for adaptively modulating the convolutional kernel based on the global complementary context. To further model long-range dependencies, an adaptive Transformer is employed to enhance the global semantic extraction capability. Our network architecture is designed in a multiscale fashion so that useful multimodal information can be adequately acquired from the perspective of different scales. Moreover, an objective function composed of a structural loss and a region mutual information loss is devised to construct constraints for information preservation at both the structural level and the feature level. Extensive experiments on a mainstream database demonstrate that the proposed method outperforms other representative and state-of-the-art methods in terms of both visual quality and quantitative evaluation. We also extend the proposed method to address other biomedical image fusion issues, and the pleasing fusion results illustrate that MATR has good generalization capability. The code of the proposed method is available at https://github.com/tthinking/MATR.
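The abstract states that the adaptive convolution modulates its kernel from the global complementary context but gives no formula. The following is a minimal sketch assuming a squeeze-and-excitation-style gate computed from global average pooling and applied to a shared kernel per sample; `AdaptiveConv2d` and its internals are hypothetical, not MATR's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveConv2d(nn.Module):
    """Context-adaptive convolution: a global descriptor of the input
    re-weights the shared kernel before the convolution is applied."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Squeeze-and-excitation-style gate from global average pooling.
        self.gate = nn.Sequential(
            nn.Linear(in_ch, max(in_ch // 2, 1)), nn.ReLU(inplace=True),
            nn.Linear(max(in_ch // 2, 1), in_ch), nn.Sigmoid(),
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        ctx = self.gate(x.mean(dim=(2, 3)))                  # (B, C_in), in [0, 1]
        # Per-sample modulation of the kernel's input channels.
        w = self.weight.unsqueeze(0) * ctx.view(b, 1, c, 1, 1)
        # Grouped-conv trick: fold the batch into groups so every sample
        # is convolved with its own modulated kernel.
        out = F.conv2d(x.reshape(1, b * c, *x.shape[2:]),
                       w.reshape(-1, c, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.reshape(b, -1, *out.shape[2:]) + self.bias.view(1, -1, 1, 1)
```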
Hybrid cross-modality fusion network for medical image segmentation with contrastive learning
Medical image segmentation has been widely adopted in artificial intelligence-based clinical applications. The integration of medical texts into image segmentation models has significantly improved the segmentation performance. It is therefore crucial to design an effective fusion strategy to integrate the paired image and text features. Existing multi-modal medical image segmentation methods fuse the paired image and text features through a non-local attention mechanism, which lacks local interaction. Besides, they lack a mechanism to enhance the relevance of the paired features and keep the discriminability of unpaired features in the training process, which limits the segmentation performance. To solve the above problems, we propose a hybrid cross-modality fusion network (HCFNet) based on contrastive learning for medical image segmentation. The key designs of our proposed method are a multi-stage cross-modality contrastive loss and a hybrid cross-modality feature decoder. The multi-stage cross-modality contrastive loss is utilized to enhance the discriminability of the paired features and separate the unpaired features. Furthermore, the hybrid cross-modality feature decoder conducts local and non-local cross-modality feature interaction through a local cross-modality fusion module and a non-local cross-modality fusion module, respectively. Experimental results show that our method achieves state-of-the-art results on two public medical image segmentation datasets.
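The multi-stage cross-modality contrastive loss is described only as pulling paired image/text features together and separating unpaired ones. A standard way to realize that behavior is a symmetric InfoNCE loss, sketched below for a single stage; the function name, the pooled-feature inputs, and the temperature are assumptions rather than HCFNet's exact formulation.

```python
import torch
import torch.nn.functional as F


def cross_modality_contrastive_loss(img_feats: torch.Tensor,
                                    txt_feats: torch.Tensor,
                                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired (B, D) image/text features:
    paired rows are pulled together, unpaired rows are pushed apart.
    A multi-stage version would sum this loss over decoder stages."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature              # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Image-to-text and text-to-image directions, averaged.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```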
Cross-Modality Interaction Network for Medical Image Fusion
Multi-modal medical image fusion maximizes the complementary information from diverse modality images by integrating source images. The fused medical image could offer enhanced richness and improved accuracy compared to the source images. Unfortunately, the existing deep learning-based medical image fusion methods generally rely on convolutional operations, which may not effectively capture global information such as spatial relationships or shape features within and across image modalities. To address this problem, we propose a unified AI-Generated Content (AIGC)-based medical image fusion method, termed Cross-Modality Interaction Network (CMINet). The CMINet integrates a recursive transformer with an interactive Convolutional Neural Network. Specifically, the recursive transformer is designed to capture extended spatial and temporal dependencies within modalities, while the interactive CNN aims to extract and merge local features across modalities. Benefiting from cross-modality interaction learning, the proposed method can generate fused images with rich structural and functional information. Additionally, the architecture of the recursive network is structured to reduce parameter count, which could be beneficial for deployment on resource-constrained devices. Comprehensive experiments on multi-modal medical images (MRI and CT, MRI and PET, and MRI and SPECT) demonstrate that the proposed method outperforms the state-of-the-art fusion methods both subjectively and objectively.
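One common way to obtain a "recursive transformer" with a reduced parameter count is to reuse a single encoder layer for several steps; the sketch below assumes that reading and is not CMINet's exact architecture, and all hyperparameters are placeholders.

```python
import torch
import torch.nn as nn


class RecursiveTransformer(nn.Module):
    """A single transformer encoder layer applied recursively, so effective
    depth grows while the parameter count stays that of one layer."""

    def __init__(self, dim: int = 256, heads: int = 8, steps: int = 4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True, norm_first=True)
        self.steps = steps

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) flattened patch features from one modality.
        for _ in range(self.steps):      # the same weights are reused each step
            tokens = self.layer(tokens)
        return tokens
```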
A cascaded framework with cross-modality transfer learning for whole heart segmentation
Automatic and accurate segmentation of the whole heart structure from 3D cardiac images plays an important role in helping physicians diagnose and treat cardiovascular disease. However, manual labeling of heart images is time-consuming and laborious, so existing CT or MRI data cannot be used efficiently to train deep learning networks, which decreases the accuracy of whole heart segmentation. Meanwhile, multi-modality data contains multi-level information about cardiac images due to the different imaging mechanisms, which is beneficial for improving segmentation accuracy. Therefore, this paper proposes a cascaded framework with cross-modality transfer learning for whole heart segmentation (CM-TranCaF), which consists of three key modules: a modality transfer network (MTN), a U-shaped multi-attention network (MAUNet), and a spatial configuration network (SCN). In the MTN, MRI images are translated from the MRI domain to the CT domain to increase the data volume, adopting the idea of adversarial training. MAUNet is designed based on UNet, with attention gates (AGs) integrated into the skip connections to reduce the weight of background pixels. Moreover, to alleviate boundary blur, a position attention block (PAB) is integrated into the bottom layer to aggregate similar features. Finally, the SCN is used to fine-tune the segmentation results by utilizing the anatomical relationships between different cardiac substructures. Evaluated on the dataset of the MM-WHS challenge, CM-TranCaF achieves a Dice score of 91.1% on the testing set. Extensive experimental results demonstrate the effectiveness of the proposed method compared to other state-of-the-art methods.
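The attention gates in MAUNet follow the Attention U-Net idea of letting the decoder signal suppress background activations on the skip connection. A minimal 3D sketch of that standard additive gate is given below; the class name, channel arguments, and the assumption that the gating signal is already upsampled are mine, not details from the paper.

```python
import torch
import torch.nn as nn


class AttentionGate3D(nn.Module):
    """Attention gate on a U-Net skip connection: the decoder (gating) signal
    suppresses background activations in the encoder features before they
    are concatenated, following the additive Attention U-Net formulation."""

    def __init__(self, enc_ch: int, dec_ch: int, inter_ch: int):
        super().__init__()
        self.theta = nn.Conv3d(enc_ch, inter_ch, kernel_size=1)
        self.phi = nn.Conv3d(dec_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv3d(inter_ch, 1, kernel_size=1)

    def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        # enc: skip features (B, enc_ch, D, H, W); dec: gating signal already
        # upsampled to the same spatial size.
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(enc) + self.phi(dec))))
        return enc * attn                 # background voxels are down-weighted
```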
Diff-IF: Multi-modality image fusion via diffusion model with fusion knowledge prior
Wuhan University
Brain tumor segmentation based on the dual-path network of multi-modal MRI images
Because gliomas grow infiltratively, the tumor boundary is usually fused with surrounding brain tissue, which makes it difficult to accurately segment the brain tumor structure from single-modal images. Multi-modal images complement each other with respect to the tumor's inherent heterogeneity and external boundary, providing complementary features and outlines, and they retain the structural characteristics of brain diseases from multiple angles. However, the particularities of multi-modal medical image sampling, such as uneven data density and densely structured vascular tumor mitosis, can leave glioma images with atypically blurred boundaries and more noise. To solve this problem, this paper proposes a dual-path network based on multi-modal feature fusion (MFF-DNet). First, the proposed network uses different kernel multiplexing methods to combine a large receptive field with non-linear mapping features, which effectively enhances the coherence of information flow. Then, the overlapping frequency and the vanishing gradient phenomenon are reduced by residual and dense connections, which alleviate the mutual interference between multi-modal channels. Finally, a dual-path model based on the DenseNet network and the feature pyramid network (FPN) is established to fuse low-level, middle-level, and high-level features, which increases the diversity of glioma non-linear structural features and improves segmentation precision. Extensive ablation experiments show the effectiveness of the proposed model: the precision for the whole tumor and the tumor core reaches 0.92 and 0.90, respectively.
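For the FPN side of the dual-path design, fusing low-, middle-, and high-level features is typically done with lateral 1x1 convolutions and a top-down pathway; the sketch below shows that standard construction only, with channel sizes as assumptions rather than the paper's configuration.

```python
import torch.nn as nn
import torch.nn.functional as F


class FPNFusion(nn.Module):
    """Top-down FPN fusion of low-, middle-, and high-level feature maps,
    as used in the second path of a DenseNet + FPN dual-path design."""

    def __init__(self, in_channels=(128, 256, 512), out_ch: int = 128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: [low, mid, high] maps with decreasing spatial resolution.
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):        # high -> low
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]
```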
FPL+: Filtered Pseudo Label-Based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation
Adapting a medical image segmentation model to a new domain is important for improving its cross-domain transferability, and due to the expensive annotation process, Unsupervised Domain Adaptation (UDA) is appealing where only unlabeled images are needed for the adaptation. Existing UDA methods are mainly based on image or feature alignment with adversarial training for regularization, and they are limited by insufficient supervision in the target domain. In this paper, we propose an enhanced Filtered Pseudo Label (FPL+)-based UDA method for 3D medical image segmentation. It first uses cross-domain data augmentation to translate labeled images in the source domain to a dual-domain training set consisting of a pseudo source-domain set and a pseudo target-domain set. To leverage the dual-domain augmented images to train a pseudo label generator, domain-specific batch normalization layers are used to deal with the domain shift while learning the domain-invariant structure features, generating high-quality pseudo labels for target-domain images. We then combine labeled source-domain images and target-domain images with pseudo labels to train a final segmentor, where image-level weighting based on uncertainty estimation and pixel-level weighting based on dual-domain consensus are proposed to mitigate the adverse effect of noisy pseudo labels. Experiments on three public multi-modal datasets for Vestibular Schwannoma, brain tumor and whole heart segmentation show that our method surpassed ten state-of-the-art UDA methods, and it even achieved better results than fully supervised learning in the target domain in some cases.
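The abstract leaves the exact image-level and pixel-level weighting of FPL+ unspecified. The sketch below is a simplified stand-in that weights a pseudo-label cross-entropy by agreement between two domain-specific predictions (pixel level) and by an entropy-based confidence of their average (image level); the function name, inputs, and the exp(-entropy) weight are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F


def weighted_pseudo_label_loss(logits: torch.Tensor,
                               probs_a: torch.Tensor,
                               probs_b: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on pseudo labels for a (B, C, D, H, W) prediction,
    down-weighted where two domain-specific teachers disagree (pixel level)
    and where their averaged prediction is uncertain (image level)."""
    # Pixel-level consensus mask: 1 where both pseudo-label maps agree.
    label_a = probs_a.argmax(dim=1)
    label_b = probs_b.argmax(dim=1)
    pixel_w = (label_a == label_b).float()                    # (B, D, H, W)
    # Image-level weight from the mean entropy of the averaged prediction.
    avg = 0.5 * (probs_a + probs_b)
    entropy = -(avg * avg.clamp_min(1e-8).log()).sum(dim=1)   # (B, D, H, W)
    image_w = torch.exp(-entropy.mean(dim=(1, 2, 3)))         # (B,)
    ce = F.cross_entropy(logits, label_a, reduction="none")   # (B, D, H, W)
    return (image_w.view(-1, 1, 1, 1) * pixel_w * ce).mean()
```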
