ResearchTrend.AI
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

24 June 2024
Shengbang Tong, Ellis L Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie
3DV, MLLM
arXiv:2406.16860 (abs · PDF · HTML) · HuggingFace (61 upvotes)

Papers citing "Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"

50 / 413 papers shown
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Olga Russakovsky
VLM, CoGe · 396 · 4 · 0 · 24 Dec 2025

Multimodal Reinforcement Learning with Agentic Verifier for AI Agents
Reuben Tan, Baolin Peng, Zhengyuan Yang, Hao Cheng, Oier Mees, ..., Xiaodong Liu, Lijuan Wang, Marc Pollefeys, Yong Jae Lee, Jianfeng Gao
OffRL, LRM · 189 · 1 · 0 · 03 Dec 2025

Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
Shojiro Yamabe, Futa Waseda, Daiki Shiono, Tsubasa Takahashi
DiffM, MLLM, VLM · 232 · 0 · 0 · 03 Dec 2025

Jina-VLM: Small Multilingual Vision Language Model
Andreas Koukounas, Georgios Mastrapas, Florian Hönicke, Sedigheh Eslami, Guillaume Roncari, Scott Martens, Han Xiao
MLLM · 351 · 0 · 0 · 03 Dec 2025

PAI-Bench: A Comprehensive Benchmark For Physical AI
Fengzhe Zhou, Jiannan Huang, Jialuo Li, Deva Ramanan, Humphrey Shi
VGen · 156 · 0 · 0 · 01 Dec 2025

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, Philip Torr
87 · 0 · 0 · 27 Nov 2025

Geometrically-Constrained Agent for Spatial Reasoning
Zeren Chen, Xiaoya Lu, Zhijie Zheng, Pengrui Li, Lehan He, Yijin Zhou, Jing Shao, Bohan Zhuang, Lu Sheng
LRM · 103 · 0 · 0 · 27 Nov 2025

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
Tianyi Xiong, Yi Ge, Ming Li, Zuolong Zhang, Pranav Kulkarni, ..., Yanshuo Chen, X. Wang, Renrui Zhang, Wenhu Chen, Heng Huang
188 · 0 · 0 · 26 Nov 2025

EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
Ze Feng, Sen Yang, Boqiang Duan, Wankou Yang, Jingdong Wang
VLM · 171 · 0 · 0 · 26 Nov 2025

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Peiran Xu, Sudong Wang, Yao Zhu, Jianing Li, Yunjian Zhang
LRM · 342 · 1 · 0 · 26 Nov 2025

DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models
Futian Wang, Chaoliu Weng, Xiao Wang, Zhen Chen, Zhicheng Zhao, Jin Tang
68 · 0 · 0 · 26 Nov 2025

VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
Yuyi Li, Daoyuan Chen, Zhen Wang, Yutong Lu, Yaliang Li
142 · 0 · 0 · 25 Nov 2025

Thinking in 360°: Humanoid Visual Search in the Wild
Heyang Yu, Yinan Han, Xiangyu Zhang, B. Yin, Bowen Chang, ..., Jing Zhang, Marco Pavone, Chen Feng, Saining Xie, Yiming Li
VGen · 334 · 1 · 0 · 25 Nov 2025

AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs
Kuniaki Saito, Risa Shinoda, Shohei Tanaka, Tosho Hirasawa, Fumio Okura, Yoshitaka Ushiku
CoGe, VLM · 304 · 0 · 0 · 25 Nov 2025

Text-Guided Semantic Image Encoder
Raghuveer Thirukovalluru, Xiaochuang Han, Bhuwan Dhingra, Emily Dinan, Maha Elbayad
VLM · 156 · 0 · 0 · 25 Nov 2025

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang, D. Zhang, Tianyi Bai, Shitong Shao, Jiebo Luo, Jiaheng Wei
VLM · 142 · 1 · 0 · 24 Nov 2025

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Yiming Qin, Bomin Wei, Jiaxin Ge, Konstantinos Kallidromitis, Stephanie Fu, Trevor Darrell, Xudong Wang
LRM, VLM · 251 · 1 · 0 · 24 Nov 2025

Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models
Jonathan Lee, Xingrui Wang, Jiawei Peng, Luoxin Ye, Zehan Zheng, ..., Wufei Ma, S. Chen, Yu-Cheng Chou, Prakhar Kaushik, Alan Yuille
LRM · 89 · 1 · 0 · 24 Nov 2025

SO-Bench: A Structural Output Evaluation of Multimodal LLMs
Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, ..., Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan
89 · 0 · 0 · 23 Nov 2025

When Better Teachers Don't Make Better Students: Revisiting Knowledge Distillation for CLIP Models in VQA
Pume Tuchinda, Parinthapat Pengpun, Romrawin Chumpu, Sarana Nutanong, Peerat Limkonchotiwat
VLM · 124 · 0 · 0 · 22 Nov 2025

ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better
Y. Zhang, Ming Lu, J. Pan, Tao Huang, Kuan Cheng, Qi She, Shanghang Zhang
LRM · 197 · 0 · 0 · 21 Nov 2025

IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
Y. Li, Lichi Li, Anh Dao, Xinyu Zhou, Yicheng Qiao, ..., Daeun Lee, Z. Chen, Zhen Tan, Mohit Bansal, Yu Kong
163 · 0 · 0 · 21 Nov 2025

Attention Guided Alignment in Efficient Vision-Language Models
Shweta Mahajan, Hoang Le, Hyojin Park, Farzad Farhadzadeh, Munawar Hayat, Fatih Porikli
VLM · 141 · 0 · 0 · 21 Nov 2025

BOP-ASK: Object-Interaction Reasoning for Vision-Language Models
V. Bhat, Sungsu Kim, Valts Blukis, Greg Heinrich, Prashanth Krishnamurthy, Ramesh Karri, Stan Birchfield, Farshad Khorrami, Jonathan Tremblay
VLM · 239 · 1 · 0 · 20 Nov 2025

Can We Predict the Next Question? A Collaborative Filtering Approach to Modeling User Behavior
Bokang Fu, Jiahao Wang, Xiaojing Liu, Y. Liu
197 · 0 · 0 · 17 Nov 2025

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li, Xinyu Chen, Shenyuan Jiang, Haoyuan Shi, Zhenyu Liu, ..., Zhenran Xu, Yicheng Ma, Meishan Zhang, Baotian Hu, Min Zhang
MLLM, MoE, OSLM, VLM · 615 · 1 · 0 · 16 Nov 2025

Simple Vision-Language Math Reasoning via Rendered Text
Matvey Skripkin, Elizaveta Goncharova, Andrey Kuznetsov
ReLM, LRM, VLM · 352 · 0 · 0 · 12 Nov 2025

Multimodal LLMs Do Not Compose Skills Optimally Across Modalities
Paula Ontalvilla, Aitor Ormazabal, Gorka Azkune
129 · 0 · 0 · 11 Nov 2025

SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
Hunar Batra, Haoqin Tu, Hardy Chen, Yuanze Lin, Cihang Xie, Ronald Clark
OffRL, ReLM, LRM · 374 · 0 · 0 · 10 Nov 2025

Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View
Jianyu Qi, Ding Zou, Wenrui Yan, Rui Ma, Jiaxu Li, Zhijie Zheng, Zhiguo Yang, Rongchang Zhao
LRM · 249 · 0 · 0 · 10 Nov 2025

Visual Spatial Tuning
Rui Yang, Ziyu Zhu, Yanwei Li, Jingjia Huang, Shen Yan, ..., Xiangtai Li, S. Li, Wenqian Wang, Yi Lin, Hengshuang Zhao
VLM · 345 · 6 · 0 · 07 Nov 2025

iFlyBot-VLM Technical Report
Xin Nie, Zhiyuan Cheng, Yuan Zhang, Chao Ji, Jiajia Wu, Yuhan Zhang, Jia Pan
LM&Ro · 331 · 0 · 0 · 07 Nov 2025

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
David Acuna, Chao-Han Huck Yang, Yuntian Deng, Jaehun Jung, Ximing Lu, Prithviraj Ammanabrolu, Hyunwoo J. Kim, Yuan-Hong Liao, Yejin Choi
ReLM, OffRL, LRM · 343 · 1 · 0 · 07 Nov 2025

Cambrian-S: Towards Spatial Supersensing in Video
Shusheng Yang, J. Yang, Pinzhi Huang, Ellis L Brown, Zihao Yang, ..., Daohan Lu, Rob Fergus, Yann LeCun, Li Fei-Fei, Saining Xie
178 · 17 · 0 · 06 Nov 2025

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Ellis L Brown, Jihan Yang, Shusheng Yang, Rob Fergus, Saining Xie
VLM · 230 · 5 · 0 · 06 Nov 2025

IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Ali Faraz, Akash, Shaharukh Khan, Raja Kolla, Akshat Patidar, Suranjan Goswami, Abhinav Ravi, Chandra Khatri, Shubham Agarwal
VLM · 164 · 0 · 0 · 06 Nov 2025

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding
Ellis L Brown, Arijit Ray, Ranjay Krishna, Ross B. Girshick, Rob Fergus, Saining Xie
358 · 6 · 0 · 06 Nov 2025

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models
Xiaoyu Zhan, Wenxuan Huang, Hao Sun, Xinyu Fu, Changfeng Ma, ..., Wenlong Zhang, Wanli Ouyang, Yuanqi Li, Jie Guo, Yanwen Guo
LRM · 112 · 1 · 0 · 03 Nov 2025

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Ming Li, Jike Zhong, Shitian Zhao, H. Zhang, Shaoheng Lin, Yuxiang Lai, Chen Wei, Konstantinos Psounis, Kaipeng Zhang
EGVM, LRM, VLM · 462 · 3 · 0 · 03 Nov 2025

Masked Diffusion Captioning for Visual Feature Learning
Chao Feng, Zihao Wei, Andrew Owens
DiffM · 251 · 0 · 0 · 30 Oct 2025

NaviTrace: Evaluating Embodied Navigation of Vision-Language Models
Tim Windecker, Manthan Patel, Moritz Reuss, Richard Schwarzkopf, Cesar Cadena, Rudolf Lioutikov, Marco Hutter, Jonas Frey
LM&Ro · 396 · 2 · 0 · 30 Oct 2025

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
Fenfen Lin, Y. Liu, Haiyu Xu, Chen Yue, Zheqi He, Mingxuan Zhao, Miguel Hu Chen, Jiakang Liu, JG Yao, Xi Yang
VLM · 142 · 0 · 0 · 30 Oct 2025

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks
Xu Zheng, Zihao Dongfang, Lutao Jiang, Boyuan Zheng, Yulong Guo, ..., L. Zhang, Danda Pani Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu
LRM, VLM · 721 · 4 · 0 · 29 Oct 2025

SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability
Peiyang Xu, Minzhou Pan, Z. Chen, Shuang Yang, Chaowei Xiao, B. Li
EGVM · 256 · 1 · 0 · 28 Oct 2025

UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
Anqi Li, Z. T. Wang, Jiazhao Zhang, Minghan Li, Y. Qi, Zhibo Chen, Zhizheng Zhang, He Wang
147 · 0 · 0 · 27 Oct 2025

MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
Xin Jin, Siyuan Li, Siyong Jian, Kai Yu, Huan Wang
143 · 0 · 0 · 27 Oct 2025

Dexbotic: Open-Source Vision-Language-Action Toolbox
Bin Xie, Erjin Zhou, Fan Jia, Hao Shi, Haoqiang Fan, ..., Zhao Wu, Ziheng Zhang, Ziming Liu, Ziwei Yan, Z. Zhang
LM&Ro, VLM · 192 · 2 · 0 · 27 Oct 2025

EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence
Ding Zou, F. Wang, Mengyu Ge, Siyuan Fan, Zongbing Zhang, ..., Xianxian Xi, Y. Zhang, Wenyuan Li, Zhengguang Gao, Yurui Zhu
LM&Ro · 159 · 0 · 0 · 23 Oct 2025

Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao, Zhiyun Lu, Xuankai Chang, Yongqiang Wang, Violet Z. Yao, Albin Madapally Jose, Fartash Faghri, Josh Gardner, Chung-Cheng Chiu
140 · 0 · 0 · 22 Oct 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Haochen Wang, Yuhao Wang, Tao Zhang, Yikang Zhou, Yanwei Li, ..., Anran Wang, Yunhai Tong, Z. Wang, X. Li, Zhaoxiang Zhang
VLM · 219 · 0 · 0 · 21 Oct 2025