ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.16860
  4. Cited By
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

24 June 2024
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
Sai Charitha Akula
Jihan Yang
Shusheng Yang
Adithya Iyer
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
    3DVMLLM
ArXiv (abs)PDFHTMLHuggingFace (61 upvotes)

Papers citing "Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"

50 / 413 papers shown
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Tengjiao Wang
Bin Wang
Wentao Zhang
MLLM
466
3
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLMVLM
599
806
1
14 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
495
0
0
14 Apr 2025
MIEB: Massive Image Embedding Benchmark
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
Kenneth Enevoldsen
Niklas Muennighoff
VLM
491
6
0
14 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
374
3
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
229
20
0
14 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
Xuelong Li
Zilong Huang
Yuchen Ren
Weixian Lei
XueQing Deng
Shihao Chen
Shilin Xu
Jiashi Feng
MLLMLRM
319
18
0
14 Apr 2025
AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
Aruna Gauba
Irene Pi
Yunze Man
Ziqi Pang
Vikram S. Adve
Yu-Xiong Wang
874
1
0
14 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
961
0
0
11 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODDReLMLRMVLM
591
73
0
10 Apr 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
Yijun Liang
Ming Li
Chenrui Fan
Ziyue Li
Dang Nguyen
Kwesi Cobbina
Shweta Bhardwaj
Jiuhai Chen
Fuxiao Liu
Tianyi Zhou
VLMCoGe
378
12
0
10 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLMVLM
302
1
0
10 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLMOffRLLRM
318
57
0
10 Apr 2025
How Can Objects Help Video-Language Understanding?
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
355
2
0
10 Apr 2025
Kimi-VL Technical Report
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Longxiang Zhang
Zhe Chen
Zijia Zhao
Ziwei Chen
Zongyu Lin
MLLMVLMMoE
976
143
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
385
2
0
10 Apr 2025
OmniCaptioner: One Captioner to Rule Them All
OmniCaptioner: One Captioner to Rule Them All
Yiting Lu
Jiakang Yuan
Zhen Li
Jike Zhong
Qi Qin
...
Wenlong Zhang
Zhibo Chen
Peng Gao
Bo Zhang
Shiyang Feng
MLLM
447
11
0
09 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local PerceptionComputer Vision and Pattern Recognition (CVPR), 2025
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
205
0
0
09 Apr 2025
Perception in Reflection
Perception in Reflection
Yana Wei
Liang Zhao
Kangheng Lin
En Yu
Yuang Peng
...
Jianjian Sun
Haoran Wei
Zheng Ge
Xiangyu Zhang
Vishal M. Patel
334
7
0
09 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
463
117
0
07 Apr 2025
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang
Mozhgan Nasr Azadani
Sean Sedwards
Krzysztof Czarnecki
MoEMLLM
273
2
0
07 Apr 2025
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
Chaoyi Wang
Baoqing Li
Xinhan Di
MLLMLRM
251
0
0
07 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
259
8
0
07 Apr 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
Slow-Fast Architecture for Video Multi-Modal Large Language Models
Min Shi
Shihao Wang
Chieh-Yun Chen
Jitesh Jain
Kai Wang
Junjun Xiong
Guilin Liu
Zhiding Yu
Humphrey Shi
228
7
0
02 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
Aligned Better, Listen Better for Audio-Visual Large Language ModelsInternational Conference on Learning Representations (ICLR), 2025
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLMAuLLM
324
8
0
02 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
366
31
0
02 Apr 2025
SPF-Portrait: Towards Pure Text-to-Portrait Customization with Semantic Pollution-Free Fine-Tuning
SPF-Portrait: Towards Pure Text-to-Portrait Customization with Semantic Pollution-Free Fine-Tuning
Xiaole Xian
Zhichao Liao
Qingyu Li
Wenyu Qin
Pengfei Wan
Weicheng Xie
Long Zeng
Linlin Shen
Pingfa Feng
DiffM
489
0
0
01 Apr 2025
Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes
Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes
Anna Shopova
Cristoph Lippert
Leslee J. Shaw
Eugenia Alleva
294
4
0
31 Mar 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?Computer Vision and Pattern Recognition (CVPR), 2025
Fengxiang Wang
Hongru Wang
Mingshuo Chen
Haiyan Zhao
Yulin Wang
...
L. Lan
Wenjing Yang
Jing Zhang
Zhiyuan Liu
Maosong Sun
324
24
0
31 Mar 2025
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang
Hao Zou
Bochen Wang
Ye Xi
Zhen Xie
Hao Wang
VLM
380
0
0
31 Mar 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
612
2
0
29 Mar 2025
Learning to Instruct for Visual Instruction Tuning
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Yujiao Shi
Yanfeng Wang
VLM
421
3
0
28 Mar 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
463
1
0
28 Mar 2025
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li
Meng Tian
Zhenyu Lin
Jiangtong Zhu
Dechang Zhu
Haiqiang Liu
Zining Wang
Yueyi Zhang
Zhiwei Xiong
Xinhai Zhao
CoGeVLM
366
11
0
27 Mar 2025
StarFlow: Generating Structured Workflow Outputs From Sketch Images
StarFlow: Generating Structured Workflow Outputs From Sketch Images
Patrice Bechard
Chao Wang
Amirhossein Abaskohi
Juan A. Rodriguez
Christopher Pal
David Vazquez
Spandana Gella
Sai Rajeswar
Perouz Taslakian
274
0
0
27 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Manwen Liao
VLM
627
4
0
26 Mar 2025
Vision as LoRA
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
376
12
0
26 Mar 2025
Unified Multimodal Discrete Diffusion
Unified Multimodal Discrete Diffusion
Alexander Swerdlow
Mihir Prabhudesai
Siddharth Gandhi
Deepak Pathak
Katerina Fragkiadaki
DiffM
337
23
0
26 Mar 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
344
6
0
25 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Scaling Vision Pre-Training to 4K ResolutionComputer Vision and Pattern Recognition (CVPR), 2025
Baifeng Shi
Boyi Li
Han Cai
Yaojie Lu
Sifei Liu
...
Jan Kautz
Enze Xie
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
905
12
0
25 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
429
15
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
486
5
0
24 Mar 2025
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang
Yanjiang Liu
Xianpei Han
Yaojie Lu
Hongyu Lin
Jia Zheng
Jia Zheng
Le Sun
Le Sun
Yingfei Sun
314
1
0
23 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
VLM
316
53
0
23 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
275
1
0
22 Mar 2025
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
Jianing Qi
Jiawei Liu
Hao Tang
Zhigang Zhu
348
5
0
21 Mar 2025
PVChat: Personalized Video Chat with One-Shot Learning
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi
Weilong Yan
Gang Xu
Yumeng Li
Yongqian Li
Hao Sun
Fei Richard Yu
Ming Li
Si Yong Yeo
374
2
0
21 Mar 2025
M3: 3D-Spatial MultiModal Memory
M3: 3D-Spatial MultiModal MemoryInternational Conference on Learning Representations (ICLR), 2025
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
261
2
0
20 Mar 2025
A Vision Centric Remote Sensing Benchmark
A Vision Centric Remote Sensing Benchmark
Abduljaleel Adejumo
Faegheh Yeganli
Clifford Broni-bediako
Aoran Xiao
Xiangwei Zhu
Mennatullah Siam
397
0
0
20 Mar 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language ModelsComputer Vision and Pattern Recognition (CVPR), 2025
Zhihang Liu
Chen-Wei Xie
Nianzu Yang
Liming Zhao
Longxiang Tang
Yun Zheng
Chuanbin Liu
Hongtao Xie
VLM
236
14
0
20 Mar 2025
Previous
123456789
Next