Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022

20 September 2022

Oyvind Tafjord

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown

To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Junyan Lin

Haoran Chen

Dawei Zhu

Xiaoyu Shen

143

09 Oct 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA

Hanrong Ye

Haotian Zhang

...

Yinfei Yang

297

09 Oct 2024

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

International Conference on Learning Representations (ICLR), 2024

317

09 Oct 2024

Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See

Chenliang Xu

257

08 Oct 2024

ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt

287

08 Oct 2024

Intriguing Properties of Large Language and Vision Models

292

07 Oct 2024

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

Himanshu Gupta

Shreyas Verma

Ujjwala Anantheswaran

Swaroop Mishra

263

06 Oct 2024

Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs

Javier Marin

LRM

534

196

06 Oct 2024

Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

594

05 Oct 2024

An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation

Ahmed Abdulaal

Nina Montaña-Brown

Daniel Coelho De Castro

MedIm

293

04 Oct 2024

Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

217

04 Oct 2024

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Zhuowen Tu

266

04 Oct 2024

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

International Conference on Learning Representations (ICLR), 2024

Christopher D. Manning

3DV

654

102

04 Oct 2024

Unlocking Structured Thinking in Language Models with Cognitive Prompting

Oliver Kramer

Jill Baumann

ReLM LRM

298

03 Oct 2024

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Zhengfeng Lai

Vasileios Saveris

Chen Chen

Hong-You Chen

Haotian Zhang

...

Wenze Hu

Zhe Gan

Peter Grasch

Meng Cao

Yinfei Yang

VLM

177

03 Oct 2024

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

International Conference on Learning Representations (ICLR), 2024

Jiayi Ye

Zixiang Xu

Yue Huang

Dongping Chen

...

Xiangliang Zhang

368

207

03 Oct 2024

NL-Eye: Abductive NLI for Images

International Conference on Learning Representations (ICLR), 2024

308

03 Oct 2024

Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities

Paul Jäger

Mennatallah El-Assady

190

02 Oct 2024

Question-guided Knowledge Graph Re-scoring and Injection for Knowledge Graph Question Answering

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Yu Zhang

Kehai Chen

Xuefeng Bai

Zhao Kang

Quanjiang Guo

Min Zhang

301

02 Oct 2024

DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models

Yuxuan Zhang

Ruizhe Li

MoMe

502

02 Oct 2024

EMMA: Efficient Visual Alignment in Multi-Modal LLMs

Sara Ghazanfari

Alexandre Araujo

Prashanth Krishnamurthy

Siddharth Garg

Farshad Khorrami

VLM

313

02 Oct 2024

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

International Conference on Learning Representations (ICLR), 2024

Dieter Fox

Ajay Mandlekar

Yijie Guo

VLM LRM

273

01 Oct 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Haotian Zhang

Mingfei Gao

...

Zirui Wang

Yinfei Yang

307

30 Sep 2024

World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

305

30 Sep 2024

T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition

Neural Information Processing Systems (NeurIPS), 2024

189

29 Sep 2024

See then Tell: Enhancing Key Information Extraction with Vision Grounding

253

29 Sep 2024

Emu3: Next-Token Prediction is All You Need

Xinlong Wang

Xiaosong Zhang

Zhengxiong Luo

Quan-Sen Sun

Yufeng Cui

...

Xi Yang

Jingjing Liu

Yonghua Lin

Tiejun Huang

Zhongyuan Wang

MLLM

292

495

27 Sep 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

...

Juncheng Li

Hao Jiang

Haoyuan Li

Yueting Zhuang

MLLM ALM

115

27 Sep 2024

MIO: A Foundation Model on Multimodal Tokens

...

473

26 Sep 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Computer Vision and Pattern Recognition (CVPR), 2024

Kai Chen

Zhili Liu

...

Jun Yao

447

26 Sep 2024

Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM

184

25 Sep 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Computer Vision and Pattern Recognition (CVPR), 2024

Matt Deitke

Christopher Clark

Sangho Lee

Rohun Tripathi

Yue Yang

...

Noah A. Smith

Hannaneh Hajishirzi

Ross Girshick

Ali Farhadi

Aniruddha Kembhavi

OSLM VLM

470

25 Sep 2024

Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models

International Conference on Computational Linguistics (COLING), 2024

Patrick Amadeus Irawan

Genta Indra Winata

Samuel Cahyawijaya

Ayu Purwarianti

272

23 Sep 2024

Phantom of Latent for Large Language and Vision Models

Yong Man Ro

285

23 Sep 2024

Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization

Minyi Zhao

Jiyuan Zhang

Shuigeng Zhou

325

22 Sep 2024

Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Tanmoy Chakraborty

236

21 Sep 2024

Enhancing Advanced Visual Reasoning Ability of Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Chaoyi Zhang

Weidong Cai

262

21 Sep 2024

AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

316

20 Sep 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Renrui Zhang

...

Guanglu Song

Peng Gao

Yu Liu

Chunyuan Li

Hongsheng Li

MLLM

294

19 Sep 2024

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images

Neural Information Processing Systems (NeurIPS), 2024

Zhecan Wang

Junzhang Liu

Chia-Wei Tang

Hani Alomari

Anushka Sivakumar

...

Haoxuan You

A. Ishmam

Kai-Wei Chang

Shih-Fu Chang

Chris Thomas

CoGe VLM

531

19 Sep 2024

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Xiaotian Han

Yiren Jian

Xuefeng Hu

Haogeng Liu

Yiqi Wang

...

Yuang Ai

Huaibo Huang

Ran He

Zhenheng Yang

Quanzeng You

LRM AI4CE

206

19 Sep 2024

From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models

428

19 Sep 2024

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

International Conference on Learning Representations (ICLR), 2024

677

238

18 Sep 2024

NVLM: Open Frontier-Class Multimodal LLMs

Wenliang Dai

Zihan Liu

308

114

17 Sep 2024

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2024

330

16 Sep 2024

Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding

145

13 Sep 2024

What Makes a Maze Look Like a Maze?

International Conference on Learning Representations (ICLR), 2024

487

12 Sep 2024

Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding

Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2024

Yuchen Yang

Jian Chen

150

10 Sep 2024

POINTS: Improving Your Vision-language Model with Affordable Strategies

261

07 Sep 2024

An overview of domain-specific foundation model: key technologies, applications and challenges

Science China Information Sciences (Sci. China Inf. Sci.), 2024

498

06 Sep 2024