Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
    ELM, ReLM, LRM

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM, ALM
611
1,908
0
22 Apr 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
432
243
0
22 Apr 2024
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
Yifan Jiang
Jiarui Zhang
Kexuan Sun
Zhivar Sourati
Kian Ahrabian
Kaixin Ma
Filip Ilievski
Jay Pujara
LRM
364
29
0
21 Apr 2024
FakeBench: Uncover the Achilles' Heels of Fake Images with Large Multimodal Models
Yixuan Li
Xuelin Liu
Xiaoyang Wang
Shiqi Wang
Weisi Lin
314
2
0
20 Apr 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong
Bingqi Ma
Dazhong Shen
Guanglu Song
Hao Shao
Dongzhi Jiang
Jiaming Song
Yu Liu
MoE
301
87
0
19 Apr 2024
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning
Yian Li
Wentao Tian
Yang Jiao
Yue Yu
Yueping Jiang
Bin Zhu
Na Zhao
Yu-Gang Jiang
LRM
347
3
0
19 Apr 2024
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
Xiaotang Gai
Chenyi Zhou
Jiaxiang Liu
Yang Feng
Jian Wu
Zuo-Qiang Liu
MedIm
246
11
0
18 Apr 2024
Missed Connections: Lateral Thinking Puzzles for Large Language Models
Graham Todd
Timothy Merino
Sam Earle
Julian Togelius
ReLM, LRM
330
9
0
17 Apr 2024
Self-Supervised Visual Preference Alignment
Ke Zhu
Liang Zhao
Zheng Ge
Xiangyu Zhang
215
29
0
16 Apr 2024
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Yipo Huang
Xiangfei Sheng
Zhichao Yang
Quan Yuan
Zhichao Duan
Pengfei Chen
Leida Li
Weisi Lin
Guangming Shi
288
52
0
15 Apr 2024
On Speculative Decoding for Multimodal Large Language Models
Mukul Gagrani
Raghavv Goel
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
LRM
173
21
0
13 Apr 2024
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Juraj Vladika
Florian Matthes
RALM
229
13
0
12 Apr 2024
DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
Anna C. Doris
Daniele Grandi
Ryan Tomich
Md Ferdous Alam
Hyunmin Cheong
Faez Ahmed
173
35
0
11 Apr 2024
MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting
Avinash Anand
Janak Kapuriya
Apoorv Singh
Jay Saraf
Naman Lal
Astha Verma
Rushali Gupta
R. Shah
LRM
145
21
0
11 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Neural Information Processing Systems (NeurIPS), 2024
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Sijin Yu
...
Xingcheng Zhang
Jifeng Dai
Yuxin Qiao
Dahua Lin
Yuan Liu
VLM, MLLM
276
160
0
09 Apr 2024
OmniFusion Technical Report
Elizaveta Goncharova
Anton Razzhigaev
Matvey Mikhalchuk
Maxim Kurkin
Irina Abdullaeva
Matvey Skripkin
Ivan Oseledets
Denis Dimitrov
Andrey Kuznetsov
193
6
0
09 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
317
136
0
08 Apr 2024
CoReS: Orchestrating the Dance of Reasoning and Segmentation
Xiaoyi Bao
Siyang Sun
Shuailei Ma
Kecheng Zheng
Yuxin Guo
Guosheng Zhao
Yun Zheng
Xingang Wang
LRM
359
17
0
08 Apr 2024
Navigating the Landscape of Hint Generation Research: From the Past to the Future
Anubhav Jangra
Jamshid Mozafari
Adam Jatowt
Smaranda Muresan
271
7
0
06 Apr 2024
Measuring Social Norms of Large Language Models
Ye Yuan
Kexin Tang
Jianhao Shen
Ming Zhang
Chenguang Wang
ELM
385
13
0
03 Apr 2024
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
A. M. H. Tiong
Junqi Zhao
Boyang Albert Li
Junnan Li
Guosheng Lin
Caiming Xiong
258
12
0
03 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Computer Vision and Pattern Recognition (CVPR), 2024
Jieneng Chen
Qihang Yu
Xiaohui Shen
Yaoyao Liu
Liang-Chieh Chen
3DV, VLM
415
50
0
02 Apr 2024
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu
Ghazal Khalighinejad
Ollie Liu
Bhuwan Dhingra
Dani Yogatama
Robin Jia
Willie Neiswanger
459
36
0
01 Apr 2024
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Xiaoze Liu
Feijie Wu
Tianyang Xu
Zhuo Chen
Yichi Zhang
Xiaoqian Wang
Jing Gao
HILM
323
15
0
01 Apr 2024
How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset
Akash Ghosh
Venkata Sahith Bathini
Niloy Ganguly
Pawan Goyal
Mayank Singh
LMTD
224
2
0
30 Mar 2024
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Musashi Hinck
Matthew Lyle Olson
David Cobbley
Shao-Yen Tseng
Vasudev Lal
VLM
181
15
0
29 Mar 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Yuan Liu
Yu Qiao
Dahua Lin
Feng Zhao
VLM
420
560
0
29 Mar 2024
Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering
Yexin Wu
Zhuosheng Zhang
Hai Zhao
LRM
193
9
0
28 Mar 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li
Yuechen Zhang
Chengyao Wang
Zhisheng Zhong
Yixin Chen
Ruihang Chu
Shaoteng Liu
Jiaya Jia
VLM, MLLM, MoE
412
325
0
27 Mar 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
Yiwu Zhong
Zi-Yuan Hu
Michael R. Lyu
Liwei Wang
235
6
0
27 Mar 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi
Zhipin Wang
Hongxing Fan
Zaibin Zhang
Lijun Li
Yongting Zhang
Zhen-fei Yin
Lu Sheng
Yu Qiao
Jing Shao
230
34
0
26 Mar 2024
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng
Yifei Zhang
Wei Wu
Fan Lu
Shuailei Ma
Xin Jin
Wei Chen
Yujun Shen
VLM, CLIP
313
63
0
25 Mar 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Jiaming Song
VGen, LRM, MLLM
349
218
0
25 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
451
29
0
25 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
225
6
0
24 Mar 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
595
213
0
22 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Yiyi Zhou
Weihao Ye
Xiaoshuai Sun
Rongrong Ji
MoE
191
2
0
22 Mar 2024
A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
ACM Multimedia (MM), 2024
Changmeng Zheng
Dayong Liang
Wengyu Zhang
Xiao Wei
Tat-Seng Chua
Qing Li
210
1
0
22 Mar 2024
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
LRM, AI4CE, ReLM
163
19
0
21 Mar 2024
VL-Mamba: Exploring State Space Models for Multimodal Learning
Yanyuan Qiao
Zheng Yu
Longteng Guo
Sihan Chen
Zijia Zhao
Mingzhen Sun
Qi Wu
Jing Liu
Mamba
241
108
0
20 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
317
16
0
20 Mar 2024
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
Wenqiao Zhang
Tianwei Lin
Jiang Liu
Fangxun Shu
Haoyuan Li
...
Zheqi Lv
Hao Jiang
Juncheng Li
Siliang Tang
Yueting Zhuang
VLM, MLLM
239
16
0
20 Mar 2024
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
Yew Ken Chia
Vernon Toh Yan Han
Deepanway Ghosal
Lidong Bing
Soujanya Poria
LRM, ReLM
232
45
0
20 Mar 2024
Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models
Zuyan Liu
Yuhao Dong
Yongming Rao
Jie Zhou
Jiwen Lu
LRM
251
43
0
19 Mar 2024
Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
244
23
0
19 Mar 2024
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
International Conference on Learning Representations (ICLR), 2024
Yongshuo Zong
Ondrej Bohdal
Timothy M. Hospedales
350
7
0
19 Mar 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
European Conference on Computer Vision (ECCV), 2024
Ruyi Xu
Yuan Yao
Zonghao Guo
Junbo Cui
Zanlin Ni
Chunjiang Ge
Tat-Seng Chua
Zhiyuan Liu
Maosong Sun
Gao Huang
VLM, MLLM
395
170
0
18 Mar 2024
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun
Can Qin
Jiamian Wang
Zeyuan Chen
Ran Xu
Zhiqiang Tao
MLLM, VLM, LRM
288
22
0
17 Mar 2024
ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization
Mengsha Liu
Daoyuan Chen
Yaliang Li
Guian Fang
Ying Shen
261
27
0
17 Mar 2024
CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
Yuwei Zhang
Yan Wu
Yanming Liu
Xinyue Peng
317
14
0
17 Mar 2024