Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022

20 September 2022

Oyvind Tafjord

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Ahmed Hassan Awadallah

...

Yue Zhang

611

1,908

22 Apr 2024

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Sijie Zhao

Ying Shan

432

243

22 Apr 2024

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

Kaixin Ma

364

21 Apr 2024

FakeBench: Uncover the Achilles' Heels of Fake Images with Large Multimodal Models

Yixuan Li

Xuelin Liu

Xiaoyang Wang

Shiqi Wang

Weisi Lin

314

20 Apr 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

301

19 Apr 2024

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning

Bin Zhu

Na Zhao

Yu-Gang Jiang

LRM

347

19 Apr 2024

MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

246

18 Apr 2024

Missed Connections: Lateral Thinking Puzzles for Large Language Models

330

17 Apr 2024

Self-Supervised Visual Preference Alignment

Ke Zhu

Liang Zhao

Zheng Ge

Xiangyu Zhang

215

16 Apr 2024

AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception

Weisi Lin

288

15 Apr 2024

On Speculative Decoding for Multimodal Large Language Models

173

13 Apr 2024

Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval

Juraj Vladika

Florian Matthes

RALM

229

12 Apr 2024

DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation

173

11 Apr 2024

MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting

145

11 Apr 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Neural Information Processing Systems (NeurIPS), 2024

...

Dahua Lin

276

160

09 Apr 2024

OmniFusion Technical Report

193

09 Apr 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

...

Peng Zhou

317

136

08 Apr 2024

CoReS: Orchestrating the Dance of Reasoning and Segmentation

359

08 Apr 2024

Navigating the Landscape of Hint Generation Research: From the Past to the Future

Anubhav Jangra

Jamshid Mozafari

Adam Jatowt

Smaranda Muresan

271

06 Apr 2024

Measuring Social Norms of Large Language Models

385

03 Apr 2024

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

258

03 Apr 2024

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Computer Vision and Pattern Recognition (CVPR), 2024

Liang-Chieh Chen

415

02 Apr 2024

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

459

01 Apr 2024

Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs

Jing Gao

323

01 Apr 2024

How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset

Akash Ghosh

Venkata Sahith Bathini

224

30 Mar 2024

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

181

29 Mar 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

...

Yu Qiao

Dahua Lin

Feng Zhao

VLM

420

560

29 Mar 2024

Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering

193

28 Mar 2024

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

412

325

27 Mar 2024

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning

Yiwu Zhong

Zi-Yuan Hu

Michael R. Lyu

Liwei Wang

235

27 Mar 2024

Assessment of Multimodal Large Language Models in Alignment with Human Values

Yu Qiao

230

26 Mar 2024

DreamLIP: Language-Image Pre-training with Long Captions

313

25 Mar 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Han Xiao

349

218

25 Mar 2024

Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art

Neeloy Chakraborty

Melkior Ornik

Katherine Driggs-Campbell

LRM

451

25 Mar 2024

Enhancing Video Transformers for Action Understanding with VLM-aided Training

225

24 Mar 2024

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Yan Yan

595

213

22 Mar 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

Qiong Wu

191

22 Mar 2024

A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning

ACM Multimedia (MM), 2024

210

22 Mar 2024

ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting

163

21 Mar 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning

Qi Wu

241

108

20 Mar 2024

Improved Baselines for Data-efficient Perceptual Augmentation of LLMs

317

20 Mar 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

...

239

20 Mar 2024

PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

232

20 Mar 2024

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Jie Zhou

251

19 Mar 2024

Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Baolin Li

Yankai Jiang

V. Gadepally

Devesh Tiwari

244

19 Mar 2024

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

International Conference on Learning Representations (ICLR), 2024

Yongshuo Zong

Ondrej Bohdal

Timothy M. Hospedales

350

19 Mar 2024

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

European Conference on Computer Vision (ECCV), 2024

Zhiyuan Liu

Maosong Sun

Gao Huang

VLM MLLM

395

170

18 Mar 2024

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Ran Xu

288

17 Mar 2024

ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization

261

17 Mar 2024

CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations

317

17 Mar 2024