Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

24 June 2024

Sanghyun Woo

Manoj Middepogu

Sai Charitha Akula

ArXiv (abs) | PDF | HTML | HuggingFace (61 upvotes)

Papers citing "Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"

50 / 413 papers shown

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

...

Zhaoxiang Zhang

226

0

0

21 Oct 2025

FineVision: Open Data Is All You Need

Leandro von Werra

Aritra Roy Gosthipaty

Andrés Marafioti

196

13

0

20 Oct 2025

SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning

...

159

2

0

18 Oct 2025

VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs

254

0

0

18 Oct 2025

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

...

138

4

0

18 Oct 2025

RL makes MLLMs see better than SFT

193

0

0

18 Oct 2025

Vision-Centric Activation and Coordination for Multimodal Large Language Models

366

0

0

16 Oct 2025

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

Mohamed Elhoseiny

114

5

0

15 Oct 2025

Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models

Guangliang Cheng

281

0

0

15 Oct 2025

Scope: Selective Cross-modal Orchestration of Visual Perception Experts

Juan A. Rodriguez

Perouz Taslakian

279

0

0

14 Oct 2025

Point Prompting: Counterfactual Tracking with Video Diffusion Models

Ayush Shrivastava

129

1

0

13 Oct 2025

Scaling Language-Centric Omnimodal Representation Learning

Mahani Aljunied

140

0

0

13 Oct 2025

A Survey on Agentic Multimodal Large Language Models

...

LM&Ro AIFin AI4TS LRM AI4CE

250

5

0

13 Oct 2025

Data or Language Supervision: What Makes CLIP Better than DINO?

Serena Yeung-Levy

126

1

0

13 Oct 2025

Task-Aware Resolution Optimization for Visual Large Language Models

82

0

0

10 Oct 2025

Unleashing Perception-Time Scaling to Multimodal Reasoning Models

146

1

0

10 Oct 2025

Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception

Nikos Theodoridis

Fiachra Collins

Anthony G. Scanlan

140

1

0

09 Oct 2025

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models

261

7

0

09 Oct 2025

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Chen Change Loy

183

2

0

09 Oct 2025

TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics

Shanghang Zhang

239

9

0

08 Oct 2025

Automated Repeatable Adversary Threat Emulation with Effects Language (EL)

Suresh Damodaran

135

9

0

07 Oct 2025

Visual Representations inside the Language Model

Madeleine Grunde-McLaughlin

151

2

0

06 Oct 2025

MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition

Umberto Cappellazzo

Stavros Petridis

155

0

0

05 Oct 2025

InstructPLM-mu: 1-Hour Fine-Tuning of ESM2 Beats ESM3 in Protein Mutation Predictions

180

0

0

03 Oct 2025

OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows

Luke Zettlemoyer

Ricky T. Q. Chen

207

4

0

03 Oct 2025

RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation

Ming-Hsuan Yang

175

1

0

02 Oct 2025

Mitigating Modal Imbalance in Multimodal Reasoning

Aditi Raghunathan

146

1

0

02 Oct 2025

VIRTUE: Visual-Interactive Text-Image Universal Embedder

Kazuya Tateishi

Shusuke Takahashi

146

0

0

01 Oct 2025

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training

Filippos Kokkinos

204

7

0

30 Sep 2025

Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification

J. A. dos Santos

112

0

0

30 Sep 2025

Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding

181

1

0

29 Sep 2025

Vision Function Layer in Multimodal LLMs

129

3

0

29 Sep 2025

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

...

Jiankang Deng

344

43

0

28 Sep 2025

Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding

187

1

1

27 Sep 2025

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional

Varshan Muhunthan

123

1

0

27 Sep 2025

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

109

0

0

26 Sep 2025

The Photographer Eye: Teaching Multimodal Large Language Models to Understand Image Aesthetics like Photographers

Computer Vision and Pattern Recognition (CVPR), 2025

Franck Dernoncourt

239

1

0

23 Sep 2025

History-Aware Visuomotor Policy Learning via Point Tracking

165

2

0

21 Sep 2025

Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception

269

0

0

21 Sep 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

...

Zhengdong Zhang

205

4

0

19 Sep 2025

Decoupled Proxy Alignment: Mitigating Language Prior Conflict for Multimodal Alignment in MLLM

133

0

0

18 Sep 2025

Re-purposing SAM into Efficient Visual Projectors for MLLM-Based Referring Image Segmentation

119

0

0

17 Sep 2025

SAIL-VL2 Technical Report

...

297

4

0

17 Sep 2025

ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement

Amirhossein Abaskohi

Mir Rayat Imtiaz Hossain

Giuseppe Carenini

103

1

0

16 Sep 2025

Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models

OffRL ReLM LRM VLM

200

6

0

16 Sep 2025

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

172

5

0

11 Sep 2025

Measuring Epistemic Humility in Multimodal Large Language Models

143

2

0

11 Sep 2025

RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation

148

1

0

10 Sep 2025

Point Linguist Model: Segment Any Object via Bridged Large 3D-Language Model

145

1

0

09 Sep 2025

Visual Representation Alignment for Multimodal Large Language Models

...

125

11

0

09 Sep 2025
