2112.01527
Cited By
v1
v2
v3 (latest)
Masked-attention Mask Transformer for Universal Image Segmentation
2 December 2021
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
Papers citing
"Masked-attention Mask Transformer for Universal Image Segmentation"
50 / 1,648 papers shown
Title
Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset
C. Wang
Hua Li
Chongyi Li
Huazhong Liu
Xiongxin Tang
Sam Kwong
76
0
0
20 Oct 2025
Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs
Sebastian Mocanu
E. Slusanschi
Marius Leordeanu
81
0
0
18 Oct 2025
Aria Gen 2 Pilot Dataset
Chen Kong
James Fort
Aria Kang
Jonathan Wittmer
Simon Green
...
Xiaqing Pan
Jakob Julian Engel
C. Ren
Mingfei Yan
Richard Newcombe
64
0
0
17 Oct 2025
Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology
Xinrui Huang
Fan Xiao
Dongming He
Anqi Gao
Dandan Li
Xiaofan Zhang
Shaoting Zhang
Xudong Wang
MedIm
LM&MA
161
0
0
16 Oct 2025
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Mingxuan Liu
Honglin He
Elisa Ricci
Wayne Wu
Bolei Zhou
VGen
108
0
0
16 Oct 2025
Multi-modal video data-pipelines for machine learning with minimal human supervision
Mihai Cristian Pîrvu
Marius Leordeanu
VGen
84
0
0
16 Oct 2025
MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos
Gabriel Fiastre
Antoine Yang
Cordelia Schmid
VOS
337
0
0
16 Oct 2025
MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning
Mattia Segu
Marta Tintore Gazulla
Yongqin Xian
Luc Van Gool
Federico Tombari
54
0
0
16 Oct 2025
EuroMineNet: A Multitemporal Sentinel-2 Benchmark for Spatiotemporal Mining Footprint Analysis in the European Union (2015-2024)
W. Yu
Vincent Nwazelibe
Xianping Ma
Xiaokang Zhang
R. Gloaguen
Xiao Xiang Zhu
Pedram Ghamisi
52
0
0
16 Oct 2025
UniVector: Unified Vector Extraction via Instance-Geometry Interaction
Yinglong Yan
Jun Yue
Shaobo Xia
Hanmeng Sun
Tianxu Ying
Chengcheng Wu
Sifan Lan
Min He
Pedram Ghamisi
Leyuan Fang
84
0
0
15 Oct 2025
UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Kevin Li
Manuel Brack
Sudeep Katakol
Hareesh Ravi
Ajinkya Kale
104
2
0
14 Oct 2025
MSCloudCAM: Multi-Scale Context Adaptation with Convolutional Cross-Attention for Multispectral Cloud Segmentation
Md Abdullah Al Mazid
Liangdong Deng
N. Rishe
124
0
0
12 Oct 2025
A Machine Learning Perspective on Automated Driving Corner Cases
Sebastian Schmidt
Julius Körner
Stephan Günnemann
136
0
0
12 Oct 2025
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu
Yufei Yin
Chenchen Jing
M. Zhu
Hao Chen
Yuling Xi
Bo Feng
Hao Wang
Shiyu Li
Chunhua Shen
VLM
74
0
0
12 Oct 2025
Complementary and Contrastive Learning for Audio-Visual Segmentation
IEEE transactions on multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Pingping Zhang
Huchuan Lu
VOS
166
2
0
11 Oct 2025
Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning
Pîrvu Mihai-Cristian
Leordeanu Marius
122
1
0
11 Oct 2025
Explainable Human-in-the-Loop Segmentation via Critic Feedback Signals
Pouya Shaeri
Ryan T. Woo
Yasaman Mohammadpour
Ariane Middel
72
0
0
11 Oct 2025
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Weikai Huang
Jieyu Zhang
Taoyang Jia
Chenhao Zheng
Ziqi Gao
J. S. Park
Winson Han
Ranjay Krishna
129
0
0
10 Oct 2025
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Yongding Tao
Tian Wang
Yihong Dong
Huanyu Liu
Kechi Zhang
Xiaolong Hu
Ge Li
88
0
0
10 Oct 2025
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
Shian Du
Menghan Xia
Chang-rui Liu
Quande Liu
Xintao Wang
Pengfei Wan
Xiangyang Ji
VGen
SupR
204
0
0
09 Oct 2025
LTCA: Long-range Temporal Context Attention for Referring Video Object Segmentation
C. Yan
Jingyun Wang
Guoliang Kang
VOS
149
1
0
09 Oct 2025
AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views
Yijie Gao
Houqiang Zhong
Tianchi Zhu
Zhengxue Cheng
Qiang Hu
Li Song
3DV
99
0
0
09 Oct 2025
Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion
Jie Luo
Yuxuan Jiang
Xin Jin
M. Liu
Yihui Fan
72
0
0
08 Oct 2025
Locality-Sensitive Hashing-Based Efficient Point Transformer for Charged Particle Reconstruction
Shitij Govil
Jack P. Rodgers
Yuan-Tang Chou
Siqi Miao
Amit Saha
...
G. Dezoort
Mia Liu
Javier Duarte
Pan Li
Shih-Chieh Hsu
3DV
86
0
0
08 Oct 2025
Data Factory with Minimal Human Effort Using VLMs
Jiaojiao Ye
Jiaxing Zhong
Qian Xie
Yuzhou Zhou
Niki Trigoni
Andrew Markham
DiffM
VLM
160
0
0
07 Oct 2025
Human Action Recognition from Point Clouds over Time
James Dickens
3DPC
3DH
220
0
0
07 Oct 2025
From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance
Ardalan Aryashad
Parsa Razmara
Amin Mahjoub
Seyedarmin Azizi
Mahdi Salmani
Arad Firouzkouhi
VLM
81
0
0
04 Oct 2025
UGround: Towards Unified Visual Grounding with Unrolled Transformers
Rui Qian
Xin Yin
Chuanhang Deng
Zhiyuan Peng
Jian Xiong
Wei Zhai
Dejing Dou
95
0
0
04 Oct 2025
What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework
Hongze Wang
Boyang Sun
Jiaxu Xing
Fan Yang
Marco Hutter
Dhruv Shah
Davide Scaramuzza
Marc Pollefeys
56
0
0
02 Oct 2025
ClustViT: Clustering-based Token Merging for Semantic Segmentation
Fabio Montello
Ronja Güldenring
Lazaros Nalpantidis
VLM
60
0
0
02 Oct 2025
FRIEREN: Federated Learning with Vision-Language Regularization for Segmentation
Ding-Ruei Shen
FedML
VLM
118
0
0
02 Oct 2025
Holistic Order Prediction in Natural Scenes
Pierre Musacchio
Hyunmin Lee
Jaesik Park
3DV
223
0
0
02 Oct 2025
Robust Context-Aware Object Recognition
Klara Janouskova
Cristian Gavrus
Jiří Matas
112
0
0
01 Oct 2025
Semantic Visual Simultaneous Localization and Mapping: A Survey on State of the Art, Challenges, and Future Directions
Thanh Nguyen Canh
Haolan Zhang
Xiem HoangVan
N. Chong
117
0
0
01 Oct 2025
KeySG: Hierarchical Keyframe-Based 3D Scene Graphs
Abdelrhman Werby
Dennis Rotondi
Fabio Scaparro
Kai O. Arras
3DV
74
0
0
01 Oct 2025
SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
Sangmin Lee
Woongjib Choi
Jihyun Kim
Hong-Goo Kang
72
0
0
01 Oct 2025
IMAGEdit: Let Any Subject Transform
Fei Shen
Weihao Xu
Rui Yan
Dong Zhang
Xiangbo Shu
Jinhui Tang
VGen
88
0
0
01 Oct 2025
Stitch: Training-Free Position Control in Multimodal Diffusion Transformers
Jessica Bader
Mateusz Pach
Maria A. Bravo
Serge Belongie
Zeynep Akata
96
1
0
30 Sep 2025
IRIS: Intrinsic Reward Image Synthesis
Yihang Chen
Yuanhao Ban
Yunqi Hong
Cho-Jui Hsieh
57
0
0
29 Sep 2025
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
Mohamad Amin Mirzaei
Pantea Amoie
Ali Ekhterachian
Matin Mirzababaei
Babak Khalaj
3DPC
104
0
0
29 Sep 2025
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
Bangwei Guo
Yunhe Gao
Meng Ye
Difei Gu
Yang Zhou
L. Axel
Dimitris N. Metaxas
VLM
102
0
0
29 Sep 2025
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
Cong Chen
Ziyuan Huang
Cheng Zou
Huanyi Zheng
Kaixiang Ji
Jiajia Liu
Jingdong Chen
Hao Chen
Chunhua Shen
106
2
0
28 Sep 2025
Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
Xixi Jiang
Chen Yang
Dong Zhang
Pingcheng Dong
Xin Yang
Kwang-Ting Cheng
64
0
0
28 Sep 2025
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
Na Min An
Inha Kang
Minhyun Lee
Hyunjung Shim
VLM
97
0
0
27 Sep 2025
CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones
Wenyi Gong
Mieszko Lis
95
0
0
26 Sep 2025
Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation
Jinbae Seo
Hyeongjun Kwon
Kwonyoung Kim
Jiyoung Lee
Kwanghoon Sohn
VOS
156
0
0
26 Sep 2025
UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data
Yujian Yuan
Changjie Wu
Xinyuan Chang
S. Wang
Hang Zhang
Shiyi Liang
Shuang Zeng
Mu Xu
Ning Guo
112
1
0
26 Sep 2025
Boosting LiDAR-Based Localization with Semantic Insight: Camera Projection versus Direct LiDAR Segmentation
Sven Ochs
Philip Schorner
M. Zofka
Johann Marius Zöllner
52
0
0
24 Sep 2025
Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning
Xun Li
Rodrigo Santa Cruz
Mingze Xi
Hu Zhang
Madhawa Perera
...
Brandon J. Matthews
Feng Xu
Matt Adcock
Dadong Wang
Jiajun Liu
100
0
0
24 Sep 2025
Surgical Video Understanding with Label Interpolation
Garam Kim
Tae Kyeong Jeong
Juyoun Park
60
0
0
23 Sep 2025