2401.10229
Cited By
OMG-Seg: Is One Model Good Enough For All Segmentation?
18 January 2024
Xiangtai Li
Haobo Yuan
Wei Li
Henghui Ding
Size Wu
Wenwei Zhang
Yining Li
Kai Chen
Chen Change Loy
VLM
MLLM
ViT
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
Github (1294★)
Papers citing
"OMG-Seg: Is One Model Good Enough For All Segmentation?"
37 / 37 papers shown
Title
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
Silin Cheng
Kai Han
MLLM
VPVLM
VLM
214
0
0
27 Nov 2025
Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Anthony Opipari
Aravindhan K. Krishnan
Shreekant Gayaka
Min Sun
Cheng-Hao Kuo
Arnie Sen
Odest Chadwicke Jenkins
VOS
3DGS
297
0
0
27 Oct 2025
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu
Yufei Yin
Chenchen Jing
M. Zhu
Hao Chen
Yuling Xi
Bo Feng
Hao Wang
Shiyu Li
Chunhua Shen
VLM
98
0
0
12 Oct 2025
Complementary and Contrastive Learning for Audio-Visual Segmentation
IEEE Transactions on Multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Pingping Zhang
Huchuan Lu
VOS
194
3
0
11 Oct 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
407
9
0
29 Sep 2025
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Quanzhu Niu
Dengxian Gong
Shihao Chen
Tao Zhang
Yikang Zhou
Haobo Yuan
Lu Qi
Xiangtai Li
Shilin Xu
VOS
244
0
0
21 Sep 2025
Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning
Mengyuan Liu
Xinshun Wang
Zhongbin Fang
Deheng Ye
Xia Li
Tao Tang
Songtao Wu
Xiangtai Li
Ming-Hsuan Yang
3DH
165
0
0
14 Aug 2025
MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
Henghui Ding
Kaining Ying
Chang-rui Liu
Shuting He
Xudong Jiang
Yu-Gang Jiang
Juil Sock
Song Bai
VOS
303
20
0
07 Aug 2025
X-SAM: From Segment Anything to Any Segmentation
Hao Wang
Limeng Qiao
Zequn Jie
Zhijian Huang
Chengjian Feng
Qingfang Zheng
Lin Ma
X. Lan
Xiaodan Liang
VLM
117
5
0
06 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
346
10
0
01 Aug 2025
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li
Biao Yang
Qiang Liu
Shuo Zhang
Zhiyin Ma
Liang Yin
Linger Deng
Yabo Sun
Yuliang Liu
Xiang Bai
412
0
0
08 Jul 2025
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects
Guohuan Xie
Syed Ariff Syed Hesham
Wenya Guo
Bing Li
Ming-Ming Cheng
Guolei Sun
Yun-Hai Liu
154
1
0
16 Jun 2025
Vision Generalist Model: A Survey
International Journal of Computer Vision (IJCV), 2025
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
277
0
0
11 Jun 2025
GLD-Road: A global-local decoding road network extraction model for remote sensing images
Ligao Deng
Yupeng Deng
Yu Meng
Jingbo Chen
Zhihao Xi
Diyou Liu
Qifeng Chu
195
1
0
11 Jun 2025
Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025
Yuqian Fu
Runze Wang
Yanwei Fu
Danda Pani Paudel
Luc Van Gool
122
3
0
06 Jun 2025
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Zhenglin Huang
Tianxiao Li
Xiangtai Li
Haiquan Wen
Yiwei He
...
Hao Fei
Xi Yang
Xiaowei Huang
Bei Peng
Guangliang Cheng
634
6
0
24 May 2025
TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting
Zhicong Wu
Hongbin Xu
Gang Xu
Ping Nie
Zhixin Yan
Jinkai Zheng
Liangqiong Qu
Ming Li
Liqiang Nie
3DGS
278
4
0
13 Apr 2025
Your ViT is Secretly an Image Segmentation Model
Computer Vision and Pattern Recognition (CVPR), 2025
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
309
21
0
24 Mar 2025
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Computer Vision and Pattern Recognition (CVPR), 2025
Shengqiong Wu
Hao Fei
Jingkang Yang
Xiaochen Li
Juncheng Li
Hao Zhang
Tat-Seng Chua
273
4
0
19 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
297
8
0
17 Mar 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
...
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
550
68
0
07 Jan 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Neural Information Processing Systems (NeurIPS), 2024
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
443
71
0
31 Dec 2024
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Enis Simsar
Thomas Hofmann
F. Tombari
Pinar Yanardag
MoMe
313
7
0
12 Dec 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai
Yong-Jin Liu
Yifei Han
Haoji Zhang
Yansong Tang
VLM
588
19
0
24 Nov 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
457
16
0
18 Jul 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
285
119
0
27 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM
MLLM
311
19
0
25 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
International Conference on Learning Representations (ICLR), 2024
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
502
57
0
07 Jun 2024
Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
Mohamad Al Mdfaa
Raghad Salameh
Geesara Kulathunga
Sergey Zagoruyko
Gonzalo Ferrer
274
3
0
03 May 2024
DGMamba: Domain Generalization via Generalized State Space Model
Shaocong Long
Qianyu Zhou
Hefei Ling
Xuequan Lu
Chenhao Ying
Yuan Luo
Lizhuang Ma
Shuicheng Yan
303
17
0
11 Apr 2024
Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding
Liulei Li
Wenguan Wang
Yi Yang
172
21
0
25 Mar 2024
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan
VS Vibashan
Rama Chellappa
Vishal M. Patel
ViT
430
36
0
19 Mar 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
European Conference on Computer Vision (ECCV), 2024
Xiaojie Li
Jianlong Wu
Hefei Ling
Yue Yu
Guohao Li
Min Zhang
SSL
255
8
0
18 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
339
12
0
14 Mar 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Computer Vision and Pattern Recognition (CVPR), 2024
Yiran Song
Qianyu Zhou
Hefei Ling
Deng-Ping Fan
Xuequan Lu
Lizhuang Ma
VLM
434
20
0
04 Jan 2024
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjD
VLM
487
64
0
18 Jul 2023
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yue Han
Jiangning Zhang
Zhucun Xue
Chao Xu
Xintian Shen
Yabiao Wang
Chengjie Wang
Yong Liu
Xiangtai Li
329
22
0
03 Jan 2023