2406.16860
Cited By
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
24 June 2024
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
Sai Charitha Akula
Jihan Yang
Shusheng Yang
Adithya Iyer
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
HuggingFace (61 upvotes)
Papers citing "Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"
50 / 413 papers shown
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Haochen Wang
Yuhao Wang
Tao Zhang
Yikang Zhou
Yanwei Li
...
Anran Wang
Yunhai Tong
Z. Wang
X. Li
Zhaoxiang Zhang
VLM
226
0
0
21 Oct 2025
FineVision: Open Data Is All You Need
Luis Wiedmann
Orr Zohar
Amir Mahla
Xiaohan Wang
Rui Li
Thibaud Frere
Leandro von Werra
Aritra Roy Gosthipaty
Andrés Marafioti
VLM
196
13
0
20 Oct 2025
SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning
Xiaojun Guo
Runyu Zhou
Yifei Wang
Qi Zhang
Chenheng Zhang
...
Xiaohan Wang
Jiajun Chai
Guojun Yin
Wei Lin
Y. Wang
LRM
VLM
159
2
0
18 Oct 2025
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
Jiaying Zhu
Yurui Zhu
Xin Lu
Wenrui Yan
Dong Li
Kunlin Liu
Xueyang Fu
Zheng-Jun Zha
MQ
VLM
254
0
0
18 Oct 2025
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee
Byung-Kwan Lee
Jianshu Zhang
Yechan Hwang
ByungSoo Ko
...
Xuankun Rong
Eojin Joo
Seung-Ho Han
Bowon Ko
Ho-Jin Choi
LRM
138
4
0
18 Oct 2025
RL makes MLLMs see better than SFT
Junha Song
Sangdoo Yun
Dongyoon Han
Jaegul Choo
Byeongho Heo
OffRL
193
0
0
18 Oct 2025
Vision-Centric Activation and Coordination for Multimodal Large Language Models
Yunnan Wang
Fan Lu
Kecheng Zheng
Ziyuan Huang
Ziqiang Li
Wenjun Zeng
Xin Jin
MLLM
366
0
0
16 Oct 2025
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen
Wenxuan Zhang
Jun-Cheng Chen
Mohamed Elhoseiny
VLM
LRM
114
5
0
15 Oct 2025
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
Xinmiao Huang
Qisong He
Zhenglin Huang
Boxuan Wang
Zhuoyun Li
Guangliang Cheng
Yi Dong
Xiaowei Huang
CoGe
281
0
0
15 Oct 2025
Scope: Selective Cross-modal Orchestration of Visual Perception Experts
Tianyu Zhang
Suyuchen Wang
Chao Wang
Juan A. Rodriguez
Ahmed Masry
Xiangru Jian
Yoshua Bengio
Perouz Taslakian
MoE
279
0
0
14 Oct 2025
Point Prompting: Counterfactual Tracking with Video Diffusion Models
Ayush Shrivastava
Sanyam Mehta
Daniel Geng
Andrew Owens
DiffM
VGen
129
1
0
13 Oct 2025
Scaling Language-Centric Omnimodal Representation Learning
Chenghao Xiao
Hou Pong Chan
Hao Zhang
Weiwen Xu
Mahani Aljunied
Yu Rong
140
0
0
13 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro
AIFin
AI4TS
LRM
AI4CE
250
5
0
13 Oct 2025
Data or Language Supervision: What Makes CLIP Better than DINO?
Yiming Liu
Y. Zhang
Dhruba Ghosh
Ludwig Schmidt
Serena Yeung-Levy
VLM
126
1
0
13 Oct 2025
Task-Aware Resolution Optimization for Visual Large Language Models
Weiqing Luo
Zhen Tan
Y. Li
Xinyu Zhao
Kwonjoon Lee
Behzad Dariush
Tianlong Chen
82
0
0
10 Oct 2025
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Yifan Li
Z. Chen
Z. F. Wu
Kun Zhou
Ruipu Luo
Can Zhang
Z. He
Yufei Zhan
Wayne Xin Zhao
Minghui Qiu
LRM
VLM
146
1
0
10 Oct 2025
Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception
Nikos Theodoridis
Tim Brophy
Reenu Mohandas
Ganesh Sistu
Fiachra Collins
Anthony G. Scanlan
Ciarán Eising
VLM
LRM
140
1
0
09 Oct 2025
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Hongxing Li
Dingming Li
Zixuan Wang
Yuchen Yan
Hang Wu
Wenqi Zhang
Yongliang Shen
Weiming Lu
Jun Xiao
Yueting Zhuang
LRM
VLM
261
7
0
09 Oct 2025
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Kang Liao
Size Wu
Zhonghua Wu
Linyi Jin
Chao Wang
Y. Wang
Fei Wang
Wei Li
Chen Change Loy
MLLM
VGen
183
2
0
09 Oct 2025
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Yi Han
Cheng Chi
Enshen Zhou
Shanyu Rong
Jingkun An
Pengwei Wang
Zhongyuan Wang
Lu Sheng
Shanghang Zhang
LRM
239
9
0
08 Oct 2025
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh Damodaran
Paul D. Rowe
AAML
135
9
0
07 Oct 2025
Visual Representations inside the Language Model
Benlin Liu
Amita Kamath
Madeleine Grunde-McLaughlin
Winson Han
Ranjay Krishna
151
2
0
06 Oct 2025
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
Umberto Cappellazzo
Minsu Kim
Pingchuan Ma
Honglie Chen
Xubo Liu
Stavros Petridis
Maja Pantic
MoE
155
0
0
05 Oct 2025
InstructPLM-mu: 1-Hour Fine-Tuning of ESM2 Beats ESM3 in Protein Mutation Predictions
Junde Xu
Yapin Shi
Lijun Lang
Taoyong Cui
Z. Zhang
Guangyong Chen
Jiezhong Qiu
Pheng-Ann Heng
180
0
0
03 Oct 2025
OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
John Nguyen
Marton Havasi
Tariq Berrada
Luke Zettlemoyer
Ricky T. Q. Chen
207
4
0
03 Oct 2025
RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation
Hang Wu
Yujun Cai
Haonan Ge
H. Chen
Ming-Hsuan Yang
Yiwei Wang
CoGe
175
1
0
02 Oct 2025
Mitigating Modal Imbalance in Multimodal Reasoning
Chen Henry Wu
Neil Kale
Aditi Raghunathan
LRM
146
1
0
02 Oct 2025
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Wei-Yao Wang
Kazuya Tateishi
Qiyu Wu
Shusuke Takahashi
Yuki Mitsufuji
VLM
146
0
0
01 Oct 2025
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Junlin Han
Shengbang Tong
David Fan
Yufan Ren
Koustuv Sinha
Juil Sock
Filippos Kokkinos
LRM
VLM
204
7
0
30 Sep 2025
Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
Artur Barros
C. Caetano
João Macedo
J. A. dos Santos
Sandra Avila
112
0
0
30 Sep 2025
Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding
Bingkui Tong
Jiaer Xia
Kaiyang Zhou
MLLM
181
1
0
29 Sep 2025
Vision Function Layer in Multimodal LLMs
Cheng Shi
Yizhou Yu
Sibei Yang
129
3
0
29 Sep 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
Xiang An
Yin Xie
Kaicheng Yang
Wenkang Zhang
X. Zhao
...
Ziyong Feng
Ziwei Liu
Bo Li
Jiankang Deng
MLLM
VLM
SyDa
344
43
0
28 Sep 2025
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
Lin Long
Changdae Oh
Seongheon Park
Yixuan Li
VLM
MLLM
187
1
1
27 Sep 2025
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Divyam Madaan
Varshan Muhunthan
Kyunghyun Cho
S. Chopra
123
1
0
27 Sep 2025
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Jonas Belouadi
T. Boubekeur
Adrien Kaiser
109
0
0
26 Sep 2025
The Photographer Eye: Teaching Multimodal Large Language Models to Understand Image Aesthetics like Photographers
Computer Vision and Pattern Recognition (CVPR), 2025
Daiqing Qi
Handong Zhao
Jing Shi
Simon Jenni
Yifei Fan
Franck Dernoncourt
Scott D. Cohen
Sheng Li
VLM
239
1
0
23 Sep 2025
History-Aware Visuomotor Policy Learning via Point Tracking
Jingjing Chen
Hongjie Fang
Chenxi Wang
Shiquan Wang
Cewu Lu
165
2
0
21 Sep 2025
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Yuheng Shi
Xiaohuan Pei
Minjing Dong
Chang Xu
ObjD
269
0
0
21 Sep 2025
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Yanghao Li
Rui Qian
Bowen Pan
Haotian Zhang
Haoshuo Huang
...
Zhengdong Zhang
Chen Chen
Yang Zhao
Ruoming Pang
Zhifeng Chen
MLLM
205
4
0
19 Sep 2025
Decoupled Proxy Alignment: Mitigating Language Prior Conflict for Multimodal Alignment in MLLM
Chenkun Tan
Pengyu Wang
Shaojun Zhou
Botian Jiang
Zhaowei Li
Dong Zhang
Xinghao Wang
Yaqian Zhou
Xipeng Qiu
133
0
0
18 Sep 2025
Re-purposing SAM into Efficient Visual Projectors for MLLM-Based Referring Image Segmentation
Xiaobo Yang
Xiaojin Gong
VLM
119
0
0
17 Sep 2025
SAIL-VL2 Technical Report
Weijie Yin
Yongjie Ye
Fangxun Shu
Yue Liao
Zijian Kang
...
Han Wang
Wenzhuo Liu
Xiao Liang
Shuicheng Yan
Chao Feng
LRM
VLM
297
4
0
17 Sep 2025
ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
Ali Salamatian
Amirhossein Abaskohi
Wan-Cyuan Fan
Mir Rayat Imtiaz Hossain
Leonid Sigal
Giuseppe Carenini
103
1
0
16 Sep 2025
Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
Yan Chen
Long Li
Teng Xi
Long Zeng
Jingdong Wang
OffRL
ReLM
LRM
VLM
200
6
0
16 Sep 2025
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
Jing Hao
Yuxuan Fan
Yanpeng Sun
Kaixin Guo
Lizhuo Lin
Jinrong Yang
Qi Yong H. Ai
Lun M. Wong
Hao Tang
Kuo Feng Hung
LM&MA
172
5
0
11 Sep 2025
Measuring Epistemic Humility in Multimodal Large Language Models
Bingkui Tong
Jiaer Xia
Sifeng Shang
Kaiyang Zhou
HILM
143
2
0
11 Sep 2025
RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation
Z. Zhang
Chenghao Yue
Haobo Xu
Minwen Liao
Xianglin Qi
Huan-ang Gao
Ziwei Wang
Hang Zhao
148
1
0
10 Sep 2025
Point Linguist Model: Segment Any Object via Bridged Large 3D-Language Model
Zhuoxu Huang
Mingqi Gao
Jungong Han
145
1
0
09 Sep 2025
Visual Representation Alignment for Multimodal Large Language Models
Heeji Yoon
Jaewoo Jung
J. Kim
Hyungyu Choi
Heeseong Shin
...
Jisang Han
Donghyun Kim
Chanho Eom
Sunghwan Hong
Seungryong Kim
125
11
0
09 Sep 2025