
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
    ELM, ReLM, LRM
arXiv: 2209.09513

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Junyan Lin
Haoran Chen
Dawei Zhu
Xiaoyu Shen
143
7
0
09 Oct 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV, MLLM
297
20
0
09 Oct 2024
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
International Conference on Learning Representations (ICLR), 2024
Yi Ding
Bolian Li
Ruqi Zhang
MLLM
317
42
0
09 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
257
1
0
08 Oct 2024
ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
Fanhu Zeng
Fei Zhu
Haiyang Guo
Xu-Yao Zhang
Cheng-Lin Liu
VLM, CLL
287
15
0
08 Oct 2024
Intriguing Properties of Large Language and Vision Models
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Yechan Hwang
Ho-Jin Choi
LRM, VLM
292
0
0
07 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM, LRM
263
19
0
06 Oct 2024
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM
534
196
0
06 Oct 2024
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiayi He
Hehai Lin
Q. Wang
Yi R. Fung
Chenhui Xu
ReLM, LRM
594
27
0
05 Oct 2024
An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
Ahmed Abdulaal
Hugo Fry
Nina Montaña-Brown
Ayodeji Ijishakin
Jack Gao
Stephanie L. Hyland
Daniel C. Alexander
Daniel Coelho De Castro
MedIm
293
19
0
04 Oct 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yufang Liu
Changzhi Sun
Man Lan
Aimin Zhou
VLM, MLLM
217
11
0
04 Oct 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
266
3
0
04 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
International Conference on Learning Representations (ICLR), 2024
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
654
102
0
04 Oct 2024
Unlocking Structured Thinking in Language Models with Cognitive Prompting
Oliver Kramer
Jill Baumann
ReLM, LRM
298
9
0
03 Oct 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai
Vasileios Saveris
Chen Chen
Hong-You Chen
Haotian Zhang
...
Wenze Hu
Zhe Gan
Peter Grasch
Meng Cao
Yinfei Yang
VLM
177
9
0
03 Oct 2024
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2024
Jiayi Ye
Zixiang Xu
Yue Huang
Dongping Chen
Qihui Zhang
...
Werner Geyer
Chao Huang
Pin-Yu Chen
Nitesh Chawla
Xiangliang Zhang
ELM
368
207
0
03 Oct 2024
NL-Eye: Abductive NLI for Images
International Conference on Learning Representations (ICLR), 2024
Mor Ventura
Michael Toker
Nitay Calderon
Zorik Gekhman
Yonatan Bitton
Roi Reichart
308
3
0
03 Oct 2024
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara
Lukas Klein
Carsten T. Lüth
Paul Jäger
Hendrik Strobelt
Mennatallah El-Assady
190
3
0
02 Oct 2024
Question-guided Knowledge Graph Re-scoring and Injection for Knowledge Graph Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yu Zhang
Kehai Chen
Xuefeng Bai
zhao kang
Quanjiang Guo
Min Zhang
301
17
0
02 Oct 2024
DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models
Yuxuan Zhang
Ruizhe Li
MoMe
502
2
0
02 Oct 2024
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari
Alexandre Araujo
Prashanth Krishnamurthy
Siddharth Garg
Farshad Khorrami
VLM
313
7
0
02 Oct 2024
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
International Conference on Learning Representations (ICLR), 2024
Jiafei Duan
Wilbert Pumacay
Nishanth Kumar
Yi Ru Wang
Shulin Tian
Wentao Yuan
Ranjay Krishna
Dieter Fox
Ajay Mandlekar
Yijie Guo
VLM, LRM
273
81
0
01 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM, MLLM
307
67
1
30 Sep 2024
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jiacong Wang
Bohong Wu
Haiyong Jiang
Xun Zhou
Xin Xiao
Haoyuan Guo
Jun Xiao
VLM, VGen
305
14
0
30 Sep 2024
T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition
Neural Information Processing Systems (NeurIPS), 2024
Chen Yeh
You-Ming Chang
Wei-Chen Chiu
Ning Yu
189
3
0
29 Sep 2024
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
253
1
0
29 Sep 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
292
495
0
27 Sep 2024
Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Hongzhe Huang
Zhewen Yu
Jiang Liu
Li Cai
Dian Jiao
...
Siliang Tang
Juncheng Li
Hao Jiang
Haoyuan Li
Yueting Zhuang
MLLM, ALM
115
0
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM, AuLLM
473
22
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Computer Vision and Pattern Recognition (CVPR), 2024
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM, MLLM, VLM
447
44
0
26 Sep 2024
Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM
Robin Shing-Hei Yuen
Timothy Tin-Long Tse
Jian Zhu
AuLLM
184
4
0
25 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLM, VLM
470
58
0
25 Sep 2024
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
International Conference on Computational Linguistics (COLING), 2024
Patrick Amadeus Irawan
Genta Indra Winata
Samuel Cahyawijaya
Ayu Purwarianti
272
1
0
23 Sep 2024
Phantom of Latent for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
VLM, LRM
285
12
0
23 Sep 2024
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization
Minyi Zhao
Jie Wang
Zerui Li
Jiyuan Zhang
Zhenbang Sun
Shuigeng Zhou
MLLM, VLM
325
3
0
22 Sep 2024
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Prasoon Bajpai
Niladri Chatterjee
Subhabrata Dutta
Tanmoy Chakraborty
ELM
236
3
0
21 Sep 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhiyuan Li
Dongnan Liu
Chaoyi Zhang
Heng Wang
Tengfei Xue
Weidong Cai
VLM, LRM
262
17
0
21 Sep 2024
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhibin Lan
Liqiang Niu
Fandong Meng
Wenbo Li
Jie Zhou
Jinsong Su
VLM
316
4
0
20 Sep 2024
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanmin Wu
Jiayi Lei
...
Guanglu Song
Peng Gao
Yu Liu
Chunyuan Li
Hongsheng Li
MLLM
294
51
0
19 Sep 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Neural Information Processing Systems (NeurIPS), 2024
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe, VLM
531
5
0
19 Sep 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM, AI4CE
206
32
0
19 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
428
5
0
19 Sep 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
International Conference on Learning Representations (ICLR), 2024
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLM, LRM
677
238
0
18 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM, VLM, LRM
308
114
0
17 Sep 2024
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yiyi Zhou
Qiong Wu
Wenhao Lin
Weihao Ye
VLM
330
54
0
16 Sep 2024
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding
Tianqiao Liu
Zui Chen
Zitao Liu
Mi Tian
Weiqi Luo
LRM
145
10
0
13 Sep 2024
What Makes a Maze Look Like a Maze?
International Conference on Learning Representations (ICLR), 2024
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
487
13
0
12 Sep 2024
Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2024
Xiaoyu Liang
Jiayuan Yu
Lianrui Mu
Jiedong Zhuang
Jiaqi Hu
Yuchen Yang
Jiangnan Ye
Lu Lu
Jian Chen
Haoji Hu
VLM
150
7
0
10 Sep 2024
POINTS: Improving Your Vision-language Model with Affordable Strategies
Yuan Liu
Zhongyin Zhao
Ziyuan Zhuang
Le Tian
Xiao Zhou
Jie Zhou
VLM
261
12
0
07 Sep 2024
An overview of domain-specific foundation model: key technologies, applications and challenges
Science China Information Sciences (Sci. China Inf. Sci.), 2024
Haolong Chen
Hanzhi Chen
Zijian Zhao
Kaifeng Han
Guangxu Zhu
Yichen Zhao
Ying Du
Wei Xu
Qingjiang Shi
ALM, VLM
498
21
0
06 Sep 2024
Page 15 of 26