Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

24 June 2024

Sanghyun Woo

Papers citing "Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs"

50 / 413 papers shown

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

466

14 Apr 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

...

599

806

14 Apr 2025

Multimodal Long Video Modeling Based on Temporal Dynamic Context

495

14 Apr 2025

MIEB: Massive Image Embedding Benchmark

491

14 Apr 2025

TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

374

14 Apr 2025

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

229

14 Apr 2025

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

319

14 Apr 2025

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

874

14 Apr 2025

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

Cheng-Yu Hsieh

Pavan Kumar Anasosalu Vasu

961

11 Apr 2025

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

591

10 Apr 2025

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

378

10 Apr 2025

Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models

302

10 Apr 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

...

318

10 Apr 2025

How Can Objects Help Video-Language Understanding?

355

10 Apr 2025

Kimi-VL Technical Report

...

976

143

10 Apr 2025

Data Metabolism: An Efficient Data Design Schema For Vision Language Model

385

10 Apr 2025

OmniCaptioner: One Captioner to Rule Them All

...

447

09 Apr 2025

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Computer Vision and Pattern Recognition (CVPR), 2025

205

09 Apr 2025

Perception in Reflection

...

334

09 Apr 2025

SmolVLM: Redefining small and efficient multimodal models

...

463

117

07 Apr 2025

LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts

273

07 Apr 2025

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance

251

07 Apr 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

...

259

07 Apr 2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

228

02 Apr 2025

Aligned Better, Listen Better for Audio-Visual Large Language Models

International Conference on Learning Representations (ICLR), 2025

324

02 Apr 2025

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

...

366

02 Apr 2025

SPF-Portrait: Towards Pure Text-to-Portrait Customization with Semantic Pollution-Free Fine-Tuning

489

01 Apr 2025

Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes

294

31 Mar 2025

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Computer Vision and Pattern Recognition (CVPR), 2025

...

324

31 Mar 2025

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference

380

31 Mar 2025

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

...

612

29 Mar 2025

Learning to Instruct for Visual Instruction Tuning

421

28 Mar 2025

Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization

Aitor Gonzalez-Agirre

Javier Hernando

Marta Villegas

VLM

463

28 Mar 2025

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving

366

27 Mar 2025

StarFlow: Generating Structured Workflow Outputs From Sketch Images

274

27 Mar 2025

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

627

26 Mar 2025

376

26 Mar 2025

Unified Multimodal Discrete Diffusion

337

26 Mar 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings

...

344

25 Mar 2025

Scaling Vision Pre-Training to 4K Resolution

Computer Vision and Pattern Recognition (CVPR), 2025

...

905

25 Mar 2025

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

429

24 Mar 2025

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

486

24 Mar 2025

Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models

314

23 Mar 2025

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning

316

23 Mar 2025

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding

...

275

22 Mar 2025

Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models

348

21 Mar 2025

PVChat: Personalized Video Chat with One-Shot Learning

374

21 Mar 2025

M3: 3D-Spatial MultiModal Memory

International Conference on Learning Representations (ICLR), 2025

261

20 Mar 2025

A Vision Centric Remote Sensing Benchmark

Abduljaleel Adejumo

Faegheh Yeganli

Clifford Broni-Bediako

Aoran Xiao

Xiangwei Zhu

Mennatullah Siam

397

20 Mar 2025

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

236

20 Mar 2025