DSMF-Net Dual Semantic Metric Learning Fusion Network for Few-Shot Aerial Image Semantic Segmentation
Semantic segmentation of aerial images is crucial yet resource-intensive. Inspired by the human ability to learn rapidly, few-shot semantic segmentation offers a promising solution by utilizing limited labeled data for efficient model training and generalization. However, the intrinsic complexities of aerial images, compounded by scarce samples, often result in inadequate feature representation and semantic ambiguity, detracting from the model’s performance. In this article, we propose to tackle these challenging problems via dual semantic metric learning and multisemantic feature fusion and introduce a novel few-shot segmentation network (DSMF-Net). On the one hand, we consider the inherent semantic gap between the features of graph and grid structures and the metric learning of few-shot segmentation. To exploit multiscale global semantic context, we construct scale-aware graph prototypes from different stages of the feature layers based on graph convolutional networks (GCNs), while also incorporating prior-guided metric learning to further enhance context at the high-level convolution features. On the other hand, we design a pyramid-based fusion and condensation mechanism to adaptively merge and couple the multisemantic information from support and query images. The indication and fusion of different semantic features can effectively emphasize the representation and coupling abilities of the network. We have conducted extensive experiments over the challenging iSAID-5i and DLRSD benchmarks. The experiments have demonstrated our network’s effectiveness and efficiency, yielding on-par performance with the state-of-the-art methods.
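To make the idea of prototype-based metric learning more concrete, here is a minimal sketch of the generic few-shot segmentation baseline: class prototypes are built from support features by masked average pooling, and query pixels are matched to them by cosine similarity. This is not the DSMF-Net architecture (it has no graph prototypes, prior guidance, or pyramid fusion). The function names `masked_average_pool`, `cosine`, and `segment_query`, and the toy feature values, are assumptions made purely for illustration.

```python
import math


def masked_average_pool(features, mask):
    """Average the feature vectors at pixels where mask == 1.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    Returns a C-dimensional prototype vector.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, selected in zip(feature_row, mask_row):
            if selected:
                count += 1
                for c in range(channels):
                    total[c] += vector[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [value / count for value in total]


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def segment_query(query_features, fg_proto, bg_proto):
    """Label a query pixel 1 if it is closer to the foreground prototype."""
    return [
        [1 if cosine(v, fg_proto) > cosine(v, bg_proto) else 0 for v in row]
        for row in query_features
    ]


# Toy 2 x 2 support image with 2-channel features (hypothetical values).
support_features = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1], [0, 0]]
background_mask = [[1 - m for m in row] for row in support_mask]

fg_proto = masked_average_pool(support_features, support_mask)
bg_proto = masked_average_pool(support_features, background_mask)

query_features = [[[0.8, 0.2], [0.2, 0.8]], [[0.1, 0.9], [1.0, 0.0]]]
prediction = segment_query(query_features, fg_proto, bg_proto)
print(prediction)
```

In a real pipeline the features would come from a pretrained backbone and the similarity maps would typically feed a learned decoder rather than a hard comparison.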
Kill Two Birds with One Stone Domain Generalization for Semantic Segmentation via Network Pruning
::: tip
Stronger, Fewer, & Superior Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation (DGSS)
https://github.com/w1oves/Rein.git
LGAD Local and Global Attention Distillation for Efficient Semantic Segmentation
Shaoxing University、Central South University
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Weakly Supervised Semantic Segmentation (WSSS) relies on Class Activation Maps (CAMs) to extract spatial information from image-level labels. With the success of Vision Transformer (ViT), the migration of ViT is actively conducted in WSSS. This work proposes a novel WSSS framework with Class Token Infusion (CTI). By infusing the class tokens from images, we guide class tokens to possess class-specific distinct characteristics and global-local consistency. For this, we devise two kinds of token infusion: 1) Intra-image Class Token Infusion (I-CTI) and 2) Cross-image Class Token Infusion (C-CTI). In I-CTI, we infuse the class tokens from the same but differently augmented images and thus make CAMs consistent among various deformations (i.e., view, color). In C-CTI, by infusing the class tokens from the other images and imposing the resulting CAMs to be similar, it learns class-specific distinct characteristics. Besides the CTI, we bring the background (BG) concept into ViT with the BG token to reduce the false positive activation of CAMs. We demonstrate the effectiveness of our method on PASCAL VOC 2012 and MS COCO 2014 datasets, achieving state-of-the-art results in weakly supervised semantic segmentation. The code is available at https://github.com/yoon307/CTI.
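For readers unfamiliar with CAMs, the sketch below shows the classic CNN-style formulation: a class's activation map is the weighted sum of the final feature maps, using that class's classifier weights. This is the generic CAM idea only, not the ViT class-token mechanism proposed in CTI. The function name `class_activation_map` and the toy values are assumptions for illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class, rectified and peak-scaled.

    feature_maps: list of C maps, each an H x W nested list.
    class_weights: list of C classifier weights for the target class.
    Returns an H x W map with values in [0, 1].
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # Keep only positive evidence, then scale so the peak equals 1.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Two 2 x 2 feature maps and the weights of one class (hypothetical values).
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.0]
cam = class_activation_map(feature_maps, class_weights)
print(cam)
```

Thresholding such a map gives the coarse pseudo-masks that WSSS methods refine. A consistency objective like the one I-CTI describes would compare maps of this kind across augmented views.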
USE Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Bosch Research North America、Bosch Center for Artificial Intelligence (BCAI)
LLMFormer Large Language Model for Open-Vocabulary Semantic Segmentation
Hunan University、Monash University
CorrMatch Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Nankai University、NKIARI, Shenzhen Futian、SICE, UESTC