Masked-attention Mask Transformer for Universal Image Segmentation

2 December 2021

Papers citing "Masked-attention Mask Transformer for Universal Image Segmentation"

50 / 1,661 papers shown

Title
Stitch: Training-Free Position Control in Multimodal Diffusion Transformers Jessica Bader Mateusz Pach Maria A. Bravo Serge Belongie Zeynep Akata 124 1 0 30 Sep 2025
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model Bangwei Guo Yunhe Gao Meng Ye Difei Gu Yang Zhou L. Axel Dimitris N. Metaxas VLM 122 0 0 29 Sep 2025
IRIS: Intrinsic Reward Image Synthesis Yihang Chen Yuanhao Ban Yunqi Hong Cho-Jui Hsieh 65 0 0 29 Sep 2025
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D Mohamad Amin Mirzaei Pantea Amoie Ali Ekhterachian Matin Mirzababaei Babak Khalaj 3DPC 108 0 0 29 Sep 2025
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation Cong Chen Ziyuan Huang Cheng Zou Huanyi Zheng Kaixiang Ji Jiajia Liu Jingdong Chen Hao Chen Chunhua Shen 122 2 0 28 Sep 2025
Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding Xixi Jiang Chen Yang Dong Zhang Pingcheng Dong Xin Yang Kwang-Ting Cheng 68 0 0 28 Sep 2025
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP Na Min An Inha Kang Minhyun Lee Hyunjung Shim VLM 129 0 0 27 Sep 2025
CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones Wenyi Gong Mieszko Lis 107 0 0 26 Sep 2025
UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data Yujian Yuan Changjie Wu Xinyuan Chang S. Wang Hang Zhang Shiyi Liang Shuang Zeng Mu Xu Ning Guo 124 2 0 26 Sep 2025
Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation Jinbae Seo Hyeongjun Kwon Kwonyoung Kim Jiyoung Lee Kwanghoon Sohn VOS 176 0 0 26 Sep 2025
Boosting LiDAR-Based Localization with Semantic Insight: Camera Projection versus Direct LiDAR Segmentation Sven Ochs Philip Schorner M. Zofka Johann Marius Zöllner 60 0 0 24 Sep 2025
Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning Xun Li Rodrigo Santa Cruz Mingze Xi Hu Zhang Madhawa Perera ... Brandon J. Matthews Feng Xu Matt Adcock Dadong Wang Jiajun Liu 112 0 0 24 Sep 2025
MLF-4DRCNet: Multi-Level Fusion with 4D Radar and Camera for 3D Object Detection in Autonomous Driving Yuzhi Wu Li Xiao Jun Liu Guangfeng Jiang X. Xia 104 0 0 23 Sep 2025
Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation Yunzhe Shen Kai Peng Leiye Liu Wei Ji Jingjing Li Miao Zhang Yongri Piao Huchuan Lu VOS 168 0 0 23 Sep 2025
Surgical Video Understanding with Label Interpolation Garam Kim Tae Kyeong Jeong Juyoun Park 60 0 0 23 Sep 2025
Visual Instruction Pretraining for Domain-Specific Foundation Models Yuxuan Li Y. Zhang Wenhao Tang Yimian Dai Ming-Ming Cheng Xiang Li Jian Yang LRM 239 3 0 22 Sep 2025
Region-Aware Deformable Convolutions Abolfazl Saheban Maleki Maryam Imani 118 0 0 18 Sep 2025
Improving Generalized Visual Grounding with Instance-aware Joint Learning IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025 Ming Dai Wenxuan Cheng Jiang-Jiang Liu Lingfeng Yang Zhenhua Feng Wankou Yang Jingdong Wang ObjD ISeg 207 3 0 17 Sep 2025
Re-purposing SAM into Efficient Visual Projectors for MLLM-Based Referring Image Segmentation Xiaobo Yang Xiaojin Gong VLM 95 0 0 17 Sep 2025
Mitigating Query Selection Bias in Referring Video Object Segmentation Dingwei Zhang Dong Zhang Jinhui Tang 93 0 0 17 Sep 2025
White Aggregation and Restoration for Few-shot 3D Point Cloud Semantic Segmentation Jiyun Im Subeen Lee Miso Lee Jae-Pil Heo 3DPC 180 0 0 17 Sep 2025
Masked Feature Modeling Enhances Adaptive Segmentation Wenlve Zhou Zhiheng Zhou Tiantao Xian Yikui Zhai Weibin Wu Biyun Ma 96 0 0 17 Sep 2025
Road Obstacle Video Segmentation Shyam Nandan Rai Shyamgopal Karthik Mariana-Iuliana Georgescu Barbara Caputo Carlo Masone Zeynep Akata VOS 177 0 0 16 Sep 2025
NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts Botao He A. Shahidzadeh Yu Chen Jiayi Wu Tianrui Guan ... Howie Choset Dinesh Manocha Glen Chou Cornelia Fermüller Yiannis Aloimonos MoE 219 0 0 16 Sep 2025
CLAIRE: A Dual Encoder Network with RIFT Loss and Phi-3 Small Language Model Based Interpretability for Cross-Modality Synthetic Aperture Radar and Optical Land Cover Segmentation Debopom Sutradhar Arefin Ittesafun Abian M. R Reem E. Mohamed Sheikh Izzal Azid Sami Azam 88 0 0 15 Sep 2025
Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing Bingyu Li Haocheng Dong Da Zhang Zhiyuan Zhao Junyu Gao Xuelong Li 121 3 0 15 Sep 2025
Microsurgical Instrument Segmentation for Robot-Assisted Surgery Tae Kyeong Jeong Garam Kim Juyoun Park 64 0 0 15 Sep 2025
RailSafeNet: Visual Scene Understanding for Tram Safety Portuguese Conference on Artificial Intelligence (EPIA), 2025 Ondřej Valach Ivan Gruber 60 0 0 15 Sep 2025
Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing Laura Bragagnolo Matteo Terreran Leonardo Barcellona Stefano Ghidoni 3DH 96 0 0 12 Sep 2025
Multimodal SAM-adapter for Semantic Segmentation IEEE Access (IEEE Access), 2025 Iacopo Curti Pierluigi Zama Ramirez Alioscia Petrelli Luigi Di Stefano 114 1 0 12 Sep 2025
I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation Jordan Sassoon Michal Szczepanski Martyna Poreba MQ VLM 119 0 0 12 Sep 2025
DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception Tim Broedermann Christos Sakaridis Luigi Piccinelli Wim Abbeloos Luc Van Gool MDE 260 1 0 11 Sep 2025
Prompt-Driven Image Analysis with Multimodal Generative AI: Detection, Segmentation, Inpainting, and Interpretation Kaleem Ahmad MLLM 74 0 0 10 Sep 2025
UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning Huy Le Nhat Chung Tung Kieu Jingkang Yang Ngan Le VOS OCL 329 1 0 07 Sep 2025
A biologically inspired separable learning vision model for real-time traffic object perception in Dark Expert Systems with Applications (ESWA), 2025 Hulin Li Qiliang Ren Jun Li Hanbing Wei Zheng Liu Linfang Fan ViT VLM 96 1 0 05 Sep 2025
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model Hongyang Wei Baixin Xu Hongbo Liu Cyrus Wu J. Liu ... Ying He Yang Liu Xuchen Song Eric Li Y. Zhou 153 10 0 04 Sep 2025
A Multidimensional AI-powered Framework for Analyzing Tourist Perception in Historic Urban Quarters: A Case Study in Shanghai Kaizhen Tan Yufan Wu Yuxuan Liu Haoran Zeng 40 0 0 04 Sep 2025
InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System Xianbao Hou Yonghao He Zeyd Boukhers John See Hu Su Wei Sui Cong Yang DiffM VLM 85 0 0 03 Sep 2025
SOPSeg: Prompt-based Small Object Instance Segmentation in Remote Sensing Imagery Chenhao Wang Yingrui Ji Yu Meng Yunjian Zhang Yao Zhu 157 1 0 03 Sep 2025
Unsupervised Instance Segmentation with Superpixels Pattern Recognition (Pattern Recogn.), 2025 Cuong Manh Hoang SSeg 158 1 0 03 Sep 2025
MedDINOv3: How to adapt vision foundation models for medical image segmentation? Yuheng Li Yizhou Wu Yuxiang Lai Mingzhe Hu Xiaofeng Yang MedIm 242 6 0 02 Sep 2025
Can General-Purpose Omnimodels Compete with Specialists? A Case Study in Medical Image Segmentation Yizhe Zhang Qiang Chen Tao Zhou 99 0 0 31 Aug 2025
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection Bin Yang Yulin Zhang Hong-Yu Zhou Sibei Yang 128 0 0 31 Aug 2025
DGL-RSIS: Decoupling Global Spatial Context and Local Class Semantics for Training-Free Remote Sensing Image Segmentation Boyi Li Ce Zhang Richard M. Timmerman Wenxuan Bao 60 0 0 30 Aug 2025
Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement Ruitao Wu Yifan Zhao Jia Li CLL 140 1 0 30 Aug 2025
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance Luozhijie Jin Zijie Qiu J. Liu Zijie Diao Lifeng Qiao Ning Ding Alex Lamb Xipeng Qiu AI4CE 94 2 0 28 Aug 2025
Image Quality Assessment for Machines: Paradigm, Large-scale Database, and Models Xiaoqi Wang Yun Zhang Weisi Lin 120 0 0 27 Aug 2025
AutoQ-VIS: Improving Unsupervised Video Instance Segmentation via Automatic Quality Assessment Kaixuan Lu Mehmet Onurcan Kaya Dim P. Papadopoulos 56 0 0 27 Aug 2025
FreeVPS: Repurposing Training-Free SAM2 for Generalizable Video Polyp Segmentation Qiang Hu Ying Zhou Gepeng Ji Nick Barnes Qiang Li Zhiwei Wang 112 0 0 27 Aug 2025
IELDG: Suppressing Domain-Specific Noise with Inverse Evolution Layers for Domain Generalized Semantic Segmentation Qizhe Fan Chaoyu Liu Zhonghua Qiao Xiaoqin Shen 96 0 0 27 Aug 2025