Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation
Nanjing University of Aeronautics and Astronautics; State Key Laboratory of Integrated Services Networks, Xidian University
Prompt-and-Transfer: Dynamic Class-Aware Enhancement for Few-Shot Segmentation
Prompting Multi-Modal Image Segmentation with Semantic Grouping
Multi-modal image segmentation is one of the core issues in computer vision. The main challenge lies in integrating common information between modalities while retaining specific patterns for each modality. Existing methods typically perform full fine-tuning on RGB-based pre-trained parameters to inherit the powerful representation of the foundation model. Although effective, such a paradigm is not optimal due to weak transferability and scarce downstream data. Inspired by the recent success of prompt learning in language models, we propose the Grouping Prompt Tuning Framework (GoPT), which introduces explicit semantic grouping to learn modal-related prompts, adapting the frozen pre-trained foundation model to various downstream multi-modal segmentation tasks. Specifically, a class-aware uni-modal prompter is designed to balance intra- and inter-modal semantic propagation by grouping modality-specific class tokens, thereby improving the adaptability of spatial information. Furthermore, an alignment-induced cross-modal prompter is introduced to aggregate class-aware representations and share prompt parameters among different modalities to assist in modeling common statistics. Extensive experiments show the superiority of our GoPT, which achieves SOTA performance on various downstream multi-modal image segmentation tasks by training only < 1% of the model parameters.
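To get a feel for why prompt tuning can stay under 1% of the parameters, here is a rough parameter count. It is only a back-of-the-envelope sketch and not GoPT's actual architecture: the ViT-style backbone shape (12 layers, width 768), the 16 prompt tokens per layer, and all function names are assumptions made for illustration.

```python
def transformer_backbone_params(num_layers: int, dim: int) -> int:
    """Approximate weight count of a ViT-style encoder.

    Assumption: biases, layer norms, and embeddings are ignored.
    """
    attention = 4 * dim * dim  # query, key, value, and output projections
    mlp = 8 * dim * dim        # two linear layers with a 4x hidden expansion
    return num_layers * (attention + mlp)


def prompt_params(num_layers: int, dim: int, prompts_per_layer: int) -> int:
    """Weights held by learnable prompt tokens inserted at every layer (hypothetical layout)."""
    return num_layers * prompts_per_layer * dim


def trainable_fraction(num_layers: int, dim: int, prompts_per_layer: int) -> float:
    """Share of all weights that are trained when the backbone stays frozen."""
    frozen = transformer_backbone_params(num_layers, dim)
    trainable = prompt_params(num_layers, dim, prompts_per_layer)
    return trainable / (frozen + trainable)


if __name__ == "__main__":
    frac = trainable_fraction(num_layers=12, dim=768, prompts_per_layer=16)
    print(f"trainable share: {frac:.4%}")
```

Under these assumed sizes the prompts account for well under 1% of the weights. A real prompter also has projection layers of its own, so the true share is larger than this estimate.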
Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
University of Science and Technology of China; Shanghai AI Laboratory
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
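Research background: Traditional methods can only segment the categories present in the training set and cannot recognize unknown scenes that do not appear in it. Both two-stage and single-stage approaches also have shortcomings. Two-stage frameworks are computationally inefficient and do not make full use of contextual information. Single-stage frameworks have a different weakness: with low-resolution inputs, the backbone becomes insensitive to spatial information. Adding an extra network to supply spatial information raises the computational cost, and so does increasing the number of segmentation categories.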
High Quality Segmentation for Ultra High-resolution Images
**Abstract:** Segmenting 4K or 6K ultra high-resolution images requires extra computation consideration in image segmentation. Common strategies, such as down-sampling, patch cropping, and cascade models, cannot adequately balance accuracy and computation cost. Motivated by the fact that humans distinguish among objects continuously from coarse to precise levels, we propose the Continuous Refinement Model (CRM) for the ultra high-resolution segmentation refinement task. CRM continuously aligns the feature map with the refinement target and aggregates features to reconstruct these image details. Besides, our CRM shows significant generalization ability in filling the resolution gap between low-resolution training images and ultra high-resolution testing ones. We present quantitative performance evaluation and visualization to show that our proposed method is fast and effective on image segmentation refinement. Code is available at https://github.com/dvlab-research/Entity/tree/main/CRM.
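The accuracy versus cost tension behind down-sampling and patch cropping can be made concrete by counting how many pixels each strategy feeds to the network. This is an illustrative sketch only and is not taken from the CRM code: the 2160×3840 frame, the 512-pixel patch, the 384-pixel stride, the 0.25 scale, and the helper names are all assumed values.

```python
import math


def sliding_window_count(height: int, width: int, patch: int, stride: int) -> int:
    """Number of patch x patch windows needed to cover the image at the given stride."""
    rows = math.ceil(max(height - patch, 0) / stride) + 1
    cols = math.ceil(max(width - patch, 0) / stride) + 1
    return rows * cols


def pixels_processed(height, width, patch=None, stride=None, scale=1.0):
    """Pixels fed to the network: whole image (optionally rescaled) or overlapping patches."""
    if patch is None:
        return int(height * scale) * int(width * scale)
    return sliding_window_count(height, width, patch, stride) * patch * patch


if __name__ == "__main__":
    h, w = 2160, 3840  # assumed 4K frame
    print("full resolution:", pixels_processed(h, w))
    print("down-sampled x0.25:", pixels_processed(h, w, scale=0.25))
    print("512 patches, stride 384:", pixels_processed(h, w, patch=512, stride=384))
```

Down-sampling cuts the pixel count sharply but throws away the fine detail that refinement needs. Overlapping patches keep the detail but process more pixels than the full image, and each patch sees only local context.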