Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022

20 September 2022

Oyvind Tafjord

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown

DoRA: Weight-Decomposed Low-Rank Adaptation

770

676

14 Feb 2024

Higher Layers Need More LoRA Experts

Ruibo Liu

Jie Yang

208

13 Feb 2024

VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

297

12 Feb 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

...

Yu Qiao

523

139

08 Feb 2024

SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

282

06 Feb 2024

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

...

Chunhua Shen

238

149

06 Feb 2024

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

International Conference on Machine Learning (ICML), 2024

Kun Xu

...

262

05 Feb 2024

MULTI: Multimodal Understanding Leaderboard with Text and Images

...

374

05 Feb 2024

Copyright Protection in Generative AI: A Technical Perspective

...

337

04 Feb 2024

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

Timothy M. Hospedales

VLM MLLM

290

111

03 Feb 2024

Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

Jianing Li

Xi Nan

Ming Lu

Li Du

Shanghang Zhang

148

31 Jan 2024

SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models

226

31 Jan 2024

MouSi: Poly-Visual-Expert Vision-Language Models

...

Xipeng Qiu

Xuanjing Huang

Zuxuan Wu

Yunchun Jiang

VLM

159

30 Jan 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

...

Conghui He

Xingcheng Zhang

Yu Qiao

Dahua Lin

Yuan Liu

VLM MLLM

370

344

29 Jan 2024

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Bin Lin

...

439

269

29 Jan 2024

Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

292

29 Jan 2024

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

International Joint Conference on Artificial Intelligence (IJCAI), 2024

297

25 Jan 2024

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Hongliang He

Wenlin Yao

Kaixin Ma

Wenhao Yu

Dong Yu

501

239

25 Jan 2024

Demystifying Chains, Trees, and Graphs of Thoughts

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

...

1.0K

25 Jan 2024

MM-LLMs: Recent Advances in MultiModal Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

512

335

24 Jan 2024

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

AAAI Conference on Artificial Intelligence (AAAI), 2024

262

24 Jan 2024

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

AAAI Conference on Artificial Intelligence (AAAI), 2024

Godawari Sudhakar Rao

LRM

179

23 Jan 2024

Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

...

255

18 Jan 2024

Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024

441

15 Jan 2024

GroundingGPT:Language Enhanced Multi-modal Grounding Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

621

11 Jan 2024

REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Andrew Gritsevskiy

128

11 Jan 2024

AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Ningyu Zhang

Huajun Chen

350

10 Jan 2024

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jianbo Yuan

Hongxia Yang

315

146

10 Jan 2024

CaMML: Context-Aware Multimodal Learner for Large Models

276

06 Jan 2024

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model

729

145

04 Jan 2024

GPT-4V(ision) is a Generalist Web Agent, if Grounded

International Conference on Machine Learning (ICML), 2024

Huan Sun

385

407

03 Jan 2024

GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2024

534

03 Jan 2024

Video Understanding with Large Language Models: A Survey

...

717

170

29 Dec 2023

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

282

274

28 Dec 2023

MIVC: Multiple Instance Visual Component for Visual-Language Models

201

28 Dec 2023

MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices

...

Chunhua Shen

312

28 Dec 2023

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

198

27 Dec 2023

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

...

Dahua Lin

396

21 Dec 2023

Generative Multimodal Models are In-Context Learners

...

Tiejun Huang

371

419

20 Dec 2023

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou

Zhili Liu

Kai Chen

Lanqing Hong

Hang Xu

429

102

19 Dec 2023

A Survey of Reasoning with Foundation Models

Zhengying Liu

...

Xipeng Qiu

Qun Liu

582

17 Dec 2023

Decoding Concerns: Multi-label Classification of Vaccine Sentiments in Social Media

Somsubhra De

Shaurya Vats

189

17 Dec 2023

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

European Conference on Computer Vision (ECCV), 2023

Jinjin Gu

400

14 Dec 2023

Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2023

193

14 Dec 2023

VILA: On Pre-training for Visual Language Models

Computer Vision and Pattern Recognition (CVPR), 2023

Song Han

641

681

12 Dec 2023

Honeybee: Locality-enhanced Projector for Multimodal LLM

401

197

11 Dec 2023

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator

460

11 Dec 2023

AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

820

121

10 Dec 2023

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Computer Vision and Pattern Recognition (CVPR), 2023

191

09 Dec 2023

GlitchBench: Can large multimodal models detect video game glitches?

Computer Vision and Pattern Recognition (CVPR), 2023

Mohammad Reza Taesiri

328

08 Dec 2023