Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM, ReLM, LRM
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote)

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning
Dongmin Park
Zhaofang Qian
Guangxing Han
Ser-Nam Lim
MLLM
261
1
0
15 Mar 2024
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Rocktim Jyoti Das
Simeon Emilov Hristov
Jinyan Su
Dimitar Iliyanov Dimitrov
Ivan Koychev
Preslav Nakov
CoGe, ELM
260
43
0
15 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
524
246
0
14 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
340
101
0
14 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
European Conference on Computer Vision (ECCV), 2024
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM, MLLM
225
14
0
14 Mar 2024
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
Neural Information Processing Systems (NeurIPS), 2024
Cheng Chen
Sitong Su
Xu Luo
Hengtao Shen
Lianli Gao
Jingkuan Song
CLL
202
32
0
13 Mar 2024
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xingyu Lu
He Cao
Zijing Liu
Shengyuan Bai
Leqing Chen
Xingtai Lv
Hai-Tao Zheng
Yu-Feng Li
HILM
294
14
0
13 Mar 2024
Multi-modal Auto-regressive Modeling via Visual Words
ACM Multimedia (MM), 2024
Tianshuo Peng
Zuchao Li
Lefei Zhang
Hai Zhao
Ping Wang
Bo Du
OffRL
156
1
0
12 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM, VLM
342
333
0
11 Mar 2024
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Minjie Zhu
Yichen Zhu
Xin Liu
Ning Liu
Zhiyuan Xu
Yaxin Peng
Chaomin Shen
Zhicai Ou
Feifei Feng
Jian Tang
VLM
305
27
0
10 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
463
647
0
08 Mar 2024
Chain of Thought Explanation for Dialogue State Tracking
Lin Xu
Ningxin Peng
Daquan Zhou
See-Kiong Ng
Jinlan Fu
LRM
220
3
0
07 Mar 2024
Embodied Understanding of Driving Scenarios
European Conference on Computer Vision (ECCV), 2024
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
255
53
0
07 Mar 2024
Adaptive Task Balancing for Visual Instruction Tuning via Inter-Task Contribution and Intra-Task Difficulty
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Xiangxiang Chu
Zhiwu Lu
341
4
0
07 Mar 2024
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
Deepanway Ghosal
Vernon Toh Yan Han
Chia Yew Ken
Soujanya Poria
ReLM, LRM
331
20
0
06 Mar 2024
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo
Weihao Ye
Yuxin Zhang
Xiawu Zheng
Xiaoshuai Sun
Rongrong Ji
VLM
237
98
0
05 Mar 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Haogeng Liu
Quanzeng You
Xiaotian Han
Yiqi Wang
Bohan Zhai
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
MLLM
149
11
0
03 Mar 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
358
96
0
01 Mar 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
318
86
0
29 Feb 2024
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
Kate Sanders
Nathaniel Weir
Benjamin Van Durme
LRM
263
14
0
29 Feb 2024
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
Weijieying Ren
Xinlong Li
Lei Wang
Tianxiang Zhao
Wei Qin
CLL, KELM
342
57
0
29 Feb 2024
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph
Xukun Liu
Zhiyuan Peng
Xiaoyuan Yi
Xing Xie
Lirong Xiang
Yuchen Liu
Dongkuan Xu
CLL, LLMAG
175
45
0
29 Feb 2024
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
Wentao Zhang
Lingxuan Zhao
Haochong Xia
Shuo Sun
Jiaze Sun
...
Yilei Zhao
Xinyu Cai
Longtao Zheng
Xinrun Wang
Rui Hu
AIFin
472
113
0
28 Feb 2024
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
Xiao Liu
Zirui Wu
Xueqing Wu
Pan Lu
Kai-Wei Chang
Yansong Feng
ELM, LRM
342
62
0
27 Feb 2024
Measuring Vision-Language STEM Skills of Neural Models
Jianhao Shen
Ye Yuan
Srbuhi Mirzoyan
Ming Zhang
Chenguang Wang
VLM
430
13
0
27 Feb 2024
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Omkar Thawakar
Ashmal Vayani
Salman Khan
Hisham Cholakal
Rao M. Anwer
Michael Felsberg
Timothy Baldwin
Eric P. Xing
Fahad Shahbaz Khan
240
47
0
26 Feb 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
268
3
0
25 Feb 2024
Multimodal Instruction Tuning with Conditional Mixture of LoRA
Ying Shen
Zhiyang Xu
Qifan Wang
Yu Cheng
Wenpeng Yin
Lifu Huang
206
30
0
24 Feb 2024
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
Yi Zong
Xipeng Qiu
ELM, VLM
151
13
0
24 Feb 2024
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Lu Ye
Ze Tao
Yong Huang
Yang Li
310
62
0
23 Feb 2024
CommVQA: Situating Visual Question Answering in Communicative Contexts
N. Naik
Christopher Potts
Elisa Kreiss
CoGe
84
1
0
22 Feb 2024
Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
Zefeng Wang
Zhen Han
Shuo Chen
Fan Xue
Zifeng Ding
Xun Xiao
Volker Tresp
Juil Sock
Jindong Gu
LRM
279
4
0
22 Feb 2024
Towards Robust Instruction Tuning on Multimodal Large Language Models
Wei Han
Hui Chen
Soujanya Poria
MLLM
294
2
0
22 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
440
16
0
22 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM, AIMat
407
690
0
21 Feb 2024
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
Xueliang Zhao
Xinting Huang
Tingchen Fu
Qintong Li
Shansan Gong
Lemao Liu
Wei Bi
Lingpeng Kong
LRM
291
4
0
21 Feb 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Yunxin Li
Xinyu Chen
Baotian Hu
Haoyuan Shi
Min Zhang
184
7
0
21 Feb 2024
FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
Xiao Li
Bolin Zhu
Kaiwen Shi
Sichen Liu
Yin Zhu
Yiwei Liu
Gong Cheng
AIMat
611
1
0
20 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Vasu Sharma
VLM
542
68
0
20 Feb 2024
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
Ruibo Chen
Yihan Wu
Lichang Chen
Guodong Liu
Qi He
Tianyi Xiong
Chenxi Liu
Junfeng Guo
Heng-Chiao Huang
VLM
196
36
0
19 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM, VLM
359
123
0
19 Feb 2024
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann
Naman D. Singh
Francesco Croce
Matthias Hein
VLM, AAML
389
86
0
19 Feb 2024
High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
Michela Lorandi
Anya Belz
168
7
0
19 Feb 2024
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu
Zhongyi Sun
Zexi Li
Zhenyuan Zhang
Ke Yan
Shouhong Ding
Kun Kuang
Chao Wu
CLL, KELM, MoMe
222
45
0
19 Feb 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
184
10
0
19 Feb 2024
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Guiming Hardy Chen
Shunian Chen
Ruifei Zhang
Junying Chen
Xiangbo Wu
Zhiyi Zhang
Zhihong Chen
Jianquan Li
Xiang Wan
Benyou Wang
VLM, SyDa
388
184
0
18 Feb 2024
Efficient Multimodal Learning from Data-centric Perspective
Muyang He
Yexin Liu
Boya Wu
Jianhao Yuan
Yueze Wang
Tiejun Huang
Bo Zhao
MLLM
273
121
0
18 Feb 2024
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLM, MLLM
271
165
0
18 Feb 2024
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering
Haoyu Wang
Ruirui Li
Haoming Jiang
Jinjin Tian
Zhengyang Wang
Chen Luo
Xianfeng Tang
Monica Cheng
Tuo Zhao
Jing Gao
RALM, KELM
232
36
0
16 Feb 2024
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Shangyu Xing
Fei Zhao
Zhen Wu
Tuo An
Weihao Chen
Chunhui Li
Jianbing Zhang
Xinyu Dai
MLLM, MU
284
13
0
15 Feb 2024
Page 21 of 26