Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

24 June 2024
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
Sai Charitha Akula
Jihan Yang
Shusheng Yang
Adithya Iyer
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV, MLLM
ArXiv (abs) | PDF | HTML | HuggingFace (61 upvotes)

Papers citing "Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"

50 / 413 papers shown
Interpretable and Testable Vision Features via Sparse Autoencoders
Samuel Stevens
Wei-Lun Chao
T. Berger-Wolf
Yu-Chuan Su
VLM
403
17
0
10 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
704
2
0
10 Feb 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
791
3
0
06 Feb 2025
D-Attn: Decomposed Attention for Large Vision-and-Language Models
Chia-Wen Kuo
Sijie Zhu
Fan Chen
Xiaohui Shen
Longyin Wen
VLM
533
1
0
04 Feb 2025
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Shenghao Fu
Q. Yang
Qijie Mo
Junkai Yan
Xihan Wei
Jingke Meng
Xiaohua Xie
Wei-Shi Zheng
MLLM, ObjD, VLM
453
33
0
31 Jan 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
693
419
0
28 Jan 2025
BiFold: Bimanual Cloth Folding with Language Guidance
IEEE International Conference on Robotics and Automation (ICRA), 2025
Oriol Barbany
Adrià Colomé
Carme Torras
341
5
0
27 Jan 2025
TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos
Korawat Charoenpitaks
Van-Quang Nguyen
Masanori Suganuma
Kentaro Arai
Seiji Totsuka
Hiroshi Ino
Takayuki Okatani
VLM
147
2
0
10 Jan 2025
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
Run Luo
Ting-En Lin
Ning Yang
Yuchuan Wu
Xiong Liu
...
Lei Zhang
Emmanouil Benetos
Xiaobo Xia
Hamid Alinejad-Rokny
Fei Huang
VLM, AuLLM
697
0
0
08 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
...
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
612
68
0
07 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
465
33
0
06 Jan 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Computer Vision and Pattern Recognition (CVPR), 2025
Yuhui Zhang
Yuchang Su
Yiming Liu
Xiaohan Wang
James Burgess
...
Josiah Aklilu
Alejandro Lozano
Anjiang Wei
Ludwig Schmidt
Serena Yeung-Levy
420
21
0
06 Jan 2025
Demystifying CLIP Data
International Conference on Learning Representations (ICLR), 2023
Hu Xu
Saining Xie
Xiaoqing Ellen Tan
Po-Yao (Bernie) Huang
Russell Howes
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM, CLIP
593
206
0
31 Dec 2024
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
International Conference on Computational Linguistics (COLING), 2024
Shijie Zhou
Ruiyi Zhang
Jiuxiang Gu
Changyou Chen
VLM
290
2
0
20 Dec 2024
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Computer Vision and Pattern Recognition (CVPR), 2024
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP, VLM
356
41
0
20 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Xingtai Lv
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM, VLM
366
3
0
18 Dec 2024
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Orr Zohar
Xiaohan Wang
Yann Dubois
Nikhil Mehta
Tong Xiao
...
Xiaofang Wang
F. Xu
Ning Zhang
Serena Yeung-Levy
Xide Xia
VLM
424
0
0
13 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2024
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Juil Sock
VLM, ObjD
1.2K
3
0
12 Dec 2024
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhiwen Chen
Francesco Pinto
Minzhou Pan
Bo Li
347
19
0
09 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Ding Wang
Tao Chen
Bo Zhang
Xiangyu Yue
618
9
0
08 Dec 2024
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Senqiao Yang
Yukang Chen
Zhuotao Tian
Chengyao Wang
Jingyao Li
Bei Yu
Jiaya Jia
VLM
293
107
0
05 Dec 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Computer Vision and Pattern Recognition (CVPR), 2024
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
329
15
0
05 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Computer Vision and Pattern Recognition (CVPR), 2024
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM, CLIP
315
21
0
04 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
396
6
0
02 Dec 2024
AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
Yutong Zhou
Masahiro Ryo
408
5
0
30 Nov 2024
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng
Shaohan Huang
Ziyu Zhu
Xintong Zhang
Wayne Xin Zhao
Zhongzhi Luan
Bo Dai
Zhenliang Zhang
VLM
494
5
0
29 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM, LRM
562
22
0
27 Nov 2024
NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?
Jiaxuan Li
Junwen Mo
MinhDuc Vo
Akihiro Sugimoto
Hideki Nakayama
338
1
0
26 Nov 2024
What's in the Image? A Deep-Dive into the Vision of Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Omri Kaduri
Shai Bagon
Tali Dekel
VLM, CoGe
214
24
0
26 Nov 2024
Efficient Multi-modal Large Language Models via Visual Token Grouping
Minbin Huang
Runhui Huang
Han Shi
Yimeng Chen
Chuanyang Zheng
Xiangguo Sun
Xin Jiang
Zhiyu Li
Hong Cheng
VLM
364
6
0
26 Nov 2024
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Computer Vision and Pattern Recognition (CVPR), 2024
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLM, CoGe
539
63
0
26 Nov 2024
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou
Yuxin Chen
Haokun Lin
Shuyu Yang
Li Zhu
Chen Ma
Mingyu Ding
Ying Shan
ObjD
557
4
0
26 Nov 2024
Factorized Visual Tokenization and Generation
Zechen Bai
Jianxiong Gao
Ziteng Gao
Pichao Wang
Zheng Zhang
Tong He
Mike Zheng Shou
295
6
0
25 Nov 2024
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Xuweiyi Chen
Markus Marks
Zezhou Cheng
484
4
0
25 Nov 2024
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Computer Vision and Pattern Recognition (CVPR), 2024
Hang Hua
Qing Liu
Lingzhi Zhang
Jing Shi
Zhifei Zhang
Yilin Wang
Jianming Zhang
Jiebo Luo
CoGe, VLM
332
17
0
23 Nov 2024
Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts
Honglin Li
Yuting Gao
Chenglu Zhu
Jingdong Chen
M. Yang
Lin Yang
MLLM
481
0
0
21 Nov 2024
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao
Bin Zhu
Yue Yu
Chong-Wah Ngo
Yu-Gang Jiang
VLM, OffRL
444
0
0
19 Nov 2024
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Lu
Yinghao Chen
Cheng Chen
Hui Tan
Boheng Chen
...
Aojun Zhou
Yafei Wen
Xiaoxin Chen
Shuai Ren
Jiaming Song
210
20
0
16 Nov 2024
MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models
Jianhong Tu
Zhuohao Ni
Nicholas Crispino
Zihao Yu
Michael Bendersky
...
Ruoxi Jia
Xin Liu
Lingjuan Lyu
Dawn Song
Chenguang Wang
VLM, MLLM
335
0
0
15 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
105
9
0
07 Nov 2024
Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs
Chengxin Hu
Hao Li
Yihe Yuan
Jing Li
Ivor Tsang
429
6
0
07 Nov 2024
KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension
Neural Information Processing Systems (NeurIPS), 2024
Jie Yang
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chen Qian
Ruimao Zhang
MLLM
338
7
0
04 Nov 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
International Conference on Learning Representations (ICLR), 2024
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
432
8
0
25 Oct 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa, VLM, MLLM
448
54
0
24 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Bin Wang
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
337
136
0
22 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
404
88
0
21 Oct 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Han Huang
Yuqi Huo
Zijia Zhao
Haoyu Lu
Shu Wu
Bin Wang
Qiang Liu
Weipeng Chen
VLM
194
2
0
21 Oct 2024
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Y. Cai
Jiangning Zhang
Haoyang He
Xinwei He
Ao Tong
Zhenye Gan
Chengjie Wang
Zhucun Xue
Yong-Jin Liu
X. Bai
VLM
440
22
0
21 Oct 2024
SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation
Junda Wang
Yujan Ting
Eric Z. Chen
Hieu Tran
Hong-ye Yu
Weijing Huang
Terrence Chen
VLM, LM&MA
298
1
0
19 Oct 2024
E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model
Zihang Jiang
Qingsong Yao
Rongsheng Wang
Zhiyang He
Xiaodong Tao
Weifu Lv
Shuoling Zhou
VLM, MedIm
165
11
0
18 Oct 2024