Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2406.16860
Cited By
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
24 June 2024
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
Sai Charitha Akula
Jihan Yang
Shusheng Yang
Adithya Iyer
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (61 upvotes)
Papers citing
"Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"
50 / 413 papers shown
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Eduard Allakhverdov
Elizaveta Goncharova
Andrey Kuznetsov
212
1
0
20 Mar 2025
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaodong Sun
Xiaojie Xu
Wenbo Li
Jingjing Ren
Tian Ye
Songhua Liu
Pengcheng Chen
Lei Zhu
Xinchao Wang
DiffM
300
21
0
19 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
529
7
0
19 Mar 2025
Where do Large Vision-Language Models Look at when Answering Questions?
X. Xing
Chia-Wen Kuo
Li Fuxin
Yulei Niu
Fan Chen
Ming Li
Ying Wu
Longyin Wen
Sijie Zhu
LRM
284
6
0
18 Mar 2025
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Ding Wang
Botian Shi
368
12
0
17 Mar 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hai-Long Sun
Zhun Sun
Houwen Peng
Han-Jia Ye
LRM
296
15
0
17 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Yue Yang
Afshin Dehghan
Peter Grasch
290
34
0
17 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
393
206
0
17 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
230
1
0
16 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
Margret Keuper
Hideki Tanaka
Masao Utiyama
Mary Dabre
Steffen Eger
Simone Paolo Ponzetto
623
2
0
14 Mar 2025
Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guidedd Visual Selection
Bangzheng Li
Haiwei Yang
Wenxuan Zhou
Nan Xu
Ben Zhou
Sheng Zhang
Hoifung Poon
Mengzhao Chen
MLLM
VLM
328
2
0
14 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Eric Fosler-Lussier
Chenglong Wang
316
2
0
13 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGS
VLM
425
17
0
13 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Junlong Li
Xiang Yue
Bo Li
Ping Nie
Dayou Du
Lei Ma
LRM
492
16
0
13 Mar 2025
Generative Frame Sampler for Long Video Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Linli Yao
Haoning Wu
Kun Ouyang
Yujiao Shi
Caiming Xiong
Bei Chen
Xu Sun
Junnan Li
VLM
VGen
282
16
0
12 Mar 2025
Revisiting semi-supervised learning in the era of foundation models
Ping Zhang
Zheda Mai
Quang-Huy Nguyen
Wei-Lun Chao
567
3
0
12 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
932
12
0
11 Mar 2025
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang
Quan Cui
Bingchen Zhao
Cheng Yang
MLLM
SyDa
462
5
0
11 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Computer Vision and Pattern Recognition (CVPR), 2025
Huanyi Zheng
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Yongxu Liu
M. Yang
Chunhua Shen
MLLM
VLM
288
10
0
11 Mar 2025
Should VLMs be Pre-trained with Image Data?
International Conference on Learning Representations (ICLR), 2025
Sedrick Scott Keh
Jean Mercat
S. Gadre
Kushal Arora
Igor Vasiljevic
...
Shuran Song
Russ Tedrake
Thomas Kollar
Ludwig Schmidt
Achal Dave
VLM
240
0
0
10 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
International Conference on Learning Representations (ICLR), 2025
Qinghao Ye
Xianhan Zeng
Fu Li
Chong Li
Haoqi Fan
CoGe
261
15
0
10 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
Xiaoyu Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
484
12
0
10 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2025
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
1.0K
7
0
10 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
300
31
0
08 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Computer Vision and Pattern Recognition (CVPR), 2025
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Jianfeng Dong
Hui Su
Jinlan Fu
Xiaoyu Shen
235
13
0
08 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Jiaming Song
403
0
0
08 Mar 2025
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Hengguang Zhou
Xirui Li
Ruochen Wang
Minhao Cheng
Tianyi Zhou
Cho-Jui Hsieh
OffRL
LRM
ReLM
395
127
0
07 Mar 2025
SHAPE : Self-Improved Visual Preference Alignment by Iteratively Generating Holistic Winner
Kejia Chen
Jiawen Zhang
Jiacong Hu
Jiazhen Yang
Jian Lou
Zunlei Feng
Weilong Dai
329
1
0
06 Mar 2025
See What You Are Told: Visual Attention Sink in Large Multimodal Models
International Conference on Learning Representations (ICLR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
360
48
0
05 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
354
5
0
04 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Computer Vision and Pattern Recognition (CVPR), 2025
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
338
31
0
04 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Yunbo Wang
VLM
604
4
0
04 Mar 2025
Advancing vision-language models in front-end development via data synthesis
Tong Ge
Yashu Liu
Jieping Ye
Tianyi Li
Chao Wang
209
3
0
03 Mar 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen
Tongyao Zhu
Ruochen Zhou
Jinghan Zhang
Siyang Gao
Juan Carlos Niebles
Mor Geva
Junxian He
Jiajun Wu
Pengfei Yu
LRM
447
35
0
03 Mar 2025
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar
Enzhuo Zhang
Zhenshi Li
Feng-Xue Gu
Yanglangxing He
Pengfeng Xiao
Xueliang Zhang
294
8
0
02 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Computer Vision and Pattern Recognition (CVPR), 2025
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
484
85
0
28 Feb 2025
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Conference on Machine Translation (WMT), 2025
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
890
8
0
27 Feb 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Wanrong Zhu
MoE
470
3
0
27 Feb 2025
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
Junwei Liao
Haipang Wu
Ji Liu
André Freitas
Qifan Wang
AuLLM
581
7
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
588
12
0
26 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guoquan Zheng
Haodong Duan
Hua Yang
Kai Chen
380
16
0
25 Feb 2025
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
International Conference on Learning Representations (ICLR), 2024
Xi Jiang
Jian Li
Hanqiu Deng
Wenshu Fan
Bin-Bin Gao
Yifeng Zhou
Jialin Li
Chengjie Wang
Feng Zheng
409
0
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
583
8
0
21 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xiang Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP
VLM
350
23
0
20 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yue Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
563
24
0
20 Feb 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shengguang Wu
Fan-Yun Sun
Kaiyue Wen
Nick Haber
VLM
401
8
0
19 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
235
1
0
19 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
445
15
0
18 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Computer Vision and Pattern Recognition (CVPR), 2025
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
355
93
0
18 Feb 2025
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
Jinhe Bi
Yifan Wang
Danqi Yan
Aniri
Wenke Huang
...
Xun Xiao
Hinrich Schuetze
Volker Tresp
Yunpu Ma
Yunpu Ma
VLM
517
32
0
17 Feb 2025
Previous
1
2
3
4
5
6
7
8
9
Next