Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM, ReLM, LRM
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote)

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning
Dongmin Park
Zhaofang Qian
Guangxing Han
Ser-Nam Lim
MLLM
261
1
0
15 Mar 2024
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Rocktim Jyoti Das
Simeon Emilov Hristov
Jinyan Su
Dimitar Iliyanov Dimitrov
Ivan Koychev
Preslav Nakov
CoGe, ELM
260
43
0
15 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
524
246
0
14 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
340
101
0
14 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
European Conference on Computer Vision (ECCV), 2024
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM, MLLM
225
14
0
14 Mar 2024
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
Neural Information Processing Systems (NeurIPS), 2024
Cheng Chen
Sitong Su
Xu Luo
Hengtao Shen
Lianli Gao
Jingkuan Song
CLL
202
32
0
13 Mar 2024
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xingyu Lu
He Cao
Zijing Liu
Shengyuan Bai
Leqing Chen
Xingtai Lv
Hai-Tao Zheng
Yu-Feng Li
HILM
294
14
0
13 Mar 2024
Multi-modal Auto-regressive Modeling via Visual Words
ACM Multimedia (MM), 2024
Tianshuo Peng
Zuchao Li
Lefei Zhang
Hai Zhao
Ping Wang
Bo Du
OffRL
156
1
0
12 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM, VLM
342
333
0
11 Mar 2024
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Minjie Zhu
Yichen Zhu
Xin Liu
Ning Liu
Zhiyuan Xu
Yaxin Peng
Chaomin Shen
Zhicai Ou
Feifei Feng
Jian Tang
VLM
305
27
0
10 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
463
647
0
08 Mar 2024
Chain of Thought Explanation for Dialogue State Tracking
Lin Xu
Ningxin Peng
Daquan Zhou
See-Kiong Ng
Jinlan Fu
LRM
220
3
0
07 Mar 2024
Embodied Understanding of Driving Scenarios
European Conference on Computer Vision (ECCV), 2024
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
255
53
0
07 Mar 2024
Adaptive Task Balancing for Visual Instruction Tuning via Inter-Task Contribution and Intra-Task Difficulty
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Xiangxiang Chu
Zhiwu Lu
341
4
0
07 Mar 2024
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
Deepanway Ghosal
Vernon Toh Yan Han
Chia Yew Ken
Soujanya Poria
ReLM, LRM
331
20
0
06 Mar 2024
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo
Weihao Ye
Yuxin Zhang
Xiawu Zheng
Xiaoshuai Sun
Rongrong Ji
VLM
237
98
0
05 Mar 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Haogeng Liu
Quanzeng You
Xiaotian Han
Yiqi Wang
Bohan Zhai
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
MLLM
149
11
0
03 Mar 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
358
96
0
01 Mar 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
318
86
0
29 Feb 2024
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
Kate Sanders
Nathaniel Weir
Benjamin Van Durme
LRM
263
14
0
29 Feb 2024
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
Weijieying Ren
Xinlong Li
Lei Wang
Tianxiang Zhao
Wei Qin
CLL, KELM
342
57
0
29 Feb 2024
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph
Xukun Liu
Zhiyuan Peng
Xiaoyuan Yi
Xing Xie
Lirong Xiang
Yuchen Liu
Dongkuan Xu
CLL, LLMAG
175
45
0
29 Feb 2024
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
Wentao Zhang
Lingxuan Zhao
Haochong Xia
Shuo Sun
Jiaze Sun
...
Yilei Zhao
Xinyu Cai
Longtao Zheng
Xinrun Wang
Rui Hu
AIFin
472
113
0
28 Feb 2024
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
Xiao Liu
Zirui Wu
Xueqing Wu
Pan Lu
Kai-Wei Chang
Yansong Feng
ELM, LRM
342
62
0
27 Feb 2024
Measuring Vision-Language STEM Skills of Neural Models
Jianhao Shen
Ye Yuan
Srbuhi Mirzoyan
Ming Zhang
Chenguang Wang
VLM
430
13
0
27 Feb 2024
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Omkar Thawakar
Ashmal Vayani
Salman Khan
Hisham Cholakal
Rao M. Anwer
Michael Felsberg
Timothy Baldwin
Eric P. Xing
Fahad Shahbaz Khan
240
47
0
26 Feb 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
268
3
0
25 Feb 2024
Multimodal Instruction Tuning with Conditional Mixture of LoRA
Ying Shen
Zhiyang Xu
Qifan Wang
Yu Cheng
Wenpeng Yin
Lifu Huang
206
30
0
24 Feb 2024
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
Yi Zong
Xipeng Qiu
ELM, VLM
151
13
0
24 Feb 2024
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Lu Ye
Ze Tao
Yong Huang
Yang Li
310
62
0
23 Feb 2024
CommVQA: Situating Visual Question Answering in Communicative Contexts
N. Naik
Christopher Potts
Elisa Kreiss
CoGe
84
1
0
22 Feb 2024
Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
Zefeng Wang
Zhen Han
Shuo Chen
Fan Xue
Zifeng Ding
Xun Xiao
Volker Tresp
Juil Sock
Jindong Gu
LRM
279
4
0
22 Feb 2024
Towards Robust Instruction Tuning on Multimodal Large Language Models
Wei Han
Hui Chen
Soujanya Poria
MLLM
294
2
0
22 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
440
16
0
22 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM, AIMat
407
690
0
21 Feb 2024
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
Xueliang Zhao
Xinting Huang
Tingchen Fu
Qintong Li
Shansan Gong
Lemao Liu
Wei Bi
Lingpeng Kong
LRM
291
4
0
21 Feb 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Yunxin Li
Xinyu Chen
Baotian Hu
Haoyuan Shi
Min Zhang
184
7
0
21 Feb 2024
FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
Xiao Li
Bolin Zhu
Kaiwen Shi
Sichen Liu
Yin Zhu
Yiwei Liu
Gong Cheng
AIMat
611
1
0
20 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Vasu Sharma
VLM
542
68
0
20 Feb 2024
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
Ruibo Chen
Yihan Wu
Lichang Chen
Guodong Liu
Qi He
Tianyi Xiong
Chenxi Liu
Junfeng Guo
Heng-Chiao Huang
VLM
196
36
0
19 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM, VLM
359
123
0
19 Feb 2024
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann
Naman D. Singh
Francesco Croce
Matthias Hein
VLM, AAML
389
86
0
19 Feb 2024
High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
Michela Lorandi
Anya Belz
168
7
0
19 Feb 2024
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu
Zhongyi Sun
Zexi Li
Zhenyuan Zhang
Ke Yan
Shouhong Ding
Kun Kuang
Chao Wu
CLL, KELM, MoMe
222
45
0
19 Feb 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
184
10
0
19 Feb 2024
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Guiming Hardy Chen
Shunian Chen
Ruifei Zhang
Junying Chen
Xiangbo Wu
Zhiyi Zhang
Zhihong Chen
Jianquan Li
Xiang Wan
Benyou Wang
VLM, SyDa
388
184
0
18 Feb 2024
Efficient Multimodal Learning from Data-centric Perspective
Muyang He
Yexin Liu
Boya Wu
Jianhao Yuan
Yueze Wang
Tiejun Huang
Bo Zhao
MLLM
273
121
0
18 Feb 2024
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLM, MLLM
271
165
0
18 Feb 2024
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering
Haoyu Wang
Ruirui Li
Haoming Jiang
Jinjin Tian
Zhengyang Wang
Chen Luo
Xianfeng Tang
Monica Cheng
Tuo Zhao
Jing Gao
RALM, KELM
232
36
0
16 Feb 2024
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Shangyu Xing
Fei Zhao
Zhen Wu
Tuo An
Weihao Chen
Chunhui Li
Jianbing Zhang
Xinyu Dai
MLLM, MU
284
13
0
15 Feb 2024
Page 21 of 26