ResearchTrend.AI

arXiv:2209.09513 · Cited By

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown
Multimodal Commonsense Knowledge Distillation for Visual Question Answering
Shuo Yang, Siwen Luo, S. Han
05 Nov 2024
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, ..., Jingyu Wang, Peijie Jiang, Aiwei Liu, Jia Liu, Xuming Hu
05 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham
01 Nov 2024
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
01 Nov 2024
PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures
Tianxiang Wu, Minxin Nie, Ziqiang Cao
30 Oct 2024
Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics
Isabelle Lee, Joshua Lum, Ziyi Liu, Dani Yogatama
28 Oct 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
Neural Information Processing Systems (NeurIPS), 2024
L. Qin, Qiguang Chen, Hao Fei, Zhi Chen, Min Li, Wanxiang Che
27 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Wenqiang Chen, Jiaxuan Cheng, Leyao Wang, Wei Zhao, Wojciech Matusik
26 Oct 2024
Can Stories Help LLMs Reason? Curating Information Space Through Narrative
Vahid Sadiri Javadi, Johanne R. Trippas, Yash Kumar Lal, Lucie Flek
25 Oct 2024
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Sara Ghaboura, Ahmed Heakl, Omkar Thawakar, Ali Alharthi, Ines Riahi, Abduljalil Saif, Jorma T. Laaksonen, Fahad Shahbaz Khan, Salman Khan, Rao Muhammad Anwer
24 Oct 2024
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sergio Burdisso, S. Madikeri, P. Motlícek
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
International Journal of Computer Vision (IJCV), 2024
Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Yonggang Wen
23 Oct 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Ziyu Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Haodong Duan, Bin Wang, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
23 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, ..., Yuhang Cao, Bin Wang, Jiaqi Wang, Feng Wu, Dahua Lin
22 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, ..., Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang
21 Oct 2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
Yufei Zhan, Hongyin Zhao, Yousong Zhu, Fan Yang, Ming Tang, Jinqiao Wang
21 Oct 2024
Mitigating Object Hallucination via Concentric Causal Attention
Neural Information Processing Systems (NeurIPS), 2024
Yun Xing, Yiheng Li, Ivan Laptev, Shijian Lu
21 Oct 2024
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Y. Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Zhucun Xue, Yong-Jin Liu, X. Bai
21 Oct 2024
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Xiongtao Zhou, Jie He, Lanyu Chen, Jingyu Li, Haojing Chen, Víctor Gutiérrez-Basulto, Jeff Z. Pan, Ningyu Zhang
18 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Neural Information Processing Systems (NeurIPS), 2024
Baiqi Li, Zhiqiu Lin, Wenxuan Peng, Jean de Dieu Nyandwi, Daniel Jiang, Zixian Ma, Simran Khanuja, Ranjay Krishna, Graham Neubig, Deva Ramanan
18 Oct 2024
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
Yin Xie, Kaicheng Yang, Ninghua Yang, Weimo Deng, Xiangzi Dai, Tiancheng Gu, Yumeng Wang, Xiang An, Yongle Zhao, Ziyong Feng
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
International Conference on Learning Representations (ICLR), 2024
Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua
18 Oct 2024
Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers
Yuxin Wen, Qingqing Cao, Qichen Fu, Sachin Mehta, Mahyar Najibi
17 Oct 2024
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jinfa Huang, Weihao Ye, Xiaoshuai Sun, Zhiqiang Shen, Rongrong Ji
17 Oct 2024
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chenhao Zhang, Xi Feng, Yuelin Bai, Xinrun Du, Jinchang Hou, ..., Min Yang, Wenhao Huang, Chenghua Lin, Ge Zhang, Shiwen Ni
17 Oct 2024
Improving Multi-modal Large Language Model through Boosting Vision Capabilities
Yanpeng Sun, Han Zhang, Qiang Chen, Xinyu Zhang, Nong Sang, Gang Zhang, Jingdong Wang, Zechao Li
17 Oct 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Fengji Zhang, Linquan Wu, Huiyu Bai, Guancheng Lin, Xiao Li, Xiao Yu, Yue Wang, Bei Chen, Jacky Keung
16 Oct 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang, Lei Li, Xiaonan Li, Zhaowei Li, Xiachong Feng, Dianbo Sui, Qiang Liu, Xipeng Qiu
16 Oct 2024
Model Balancing Helps Low-data Training and Fine-tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zihang Liu, Yihan Hu, Tianyu Pang, Yefan Zhou, Pu Ren, Yaoqing Yang
16 Oct 2024
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Shicheng Xu, Liang Pang, Yunchang Zhu, Huawei Shen, Xueqi Cheng
16 Oct 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Yue Cao, Yangzhou Liu, Zhe Chen, Guangchen Shi, Wenhai Wang, Danhuai Zhao, Tong Lu
15 Oct 2024
VisualRWKV-HD and UHD: Advancing High-Resolution Processing for Visual Language Models
Zihang Li, Haowen Hou
15 Oct 2024
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark
Bin Shan, Xiang Fei, Wei Shi, An-Lan Wang, Guozhi Tang, Lei Liao, Jingqun Tang, Xiang Bai, Can Huang
15 Oct 2024
A Systematic Review on Prompt Engineering in Large Language Models for K-12 STEM Education
Eason Chen, Danyang Wang, Luyi Xu, Chen Cao, Xiao Fang, Jionghao Lin
14 Oct 2024
MEV Capture Through Time-Advantaged Arbitrage
Robin Fritsch, Maria Ines Silva, A. Mamageishvili, Benjamin Livshits, E. Felten
14 Oct 2024
AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Peijun Qing, Chongyang Gao, Yefan Zhou, Xingjian Diao, Yaoqing Yang, Soroush Vosoughi
14 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, ..., Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao
14 Oct 2024
Adapt-∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
International Conference on Learning Representations (ICLR), 2024
A. Maharana, Jaehong Yoon, Tianlong Chen, Joey Tianyi Zhou
14 Oct 2024
Towards Efficient Visual-Language Alignment of the Q-Former for Visual Reasoning Tasks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sungkyung Kim, Adam Lee, Junyoung Park, Andrew Chung, Jusang Oh, Jay-Yoon Lee
12 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor, Matthieu Cord
12 Oct 2024
Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Qin Liu, Chao Shang, Ling Liu, Nikolaos Pappas, Jie Ma, Neha Anna John, Srikanth Doss Kadarundalagi Raghuram Doss, Lluís Marquez, Miguel Ballesteros, Yassine Benajiba
11 Oct 2024
Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Zhipeng Chen, Liang Song, K. Zhou, Wayne Xin Zhao, Binghai Wang, Weipeng Chen, Ji-Rong Wen
10 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng
10 Oct 2024
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
Shengcao Cao, Liang-Yan Gui, Yu Wang
10 Oct 2024
Q-VLM: Post-training Quantization for Large Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2024
Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu
10 Oct 2024
MoDEM: Mixture of Domain Expert Models
Australasian Language Technology Association Workshop (ALTA), 2024
Toby Simonds, Kemal Kurniawan, Jey Han Lau
09 Oct 2024
Exploring Prompt Engineering: A Systematic Review with SWOT Analysis
Aditi Singh, Abul Ehtesham, Gaurav Kumar Gupta, Nikhil Kumar Chatta, Saket Kumar, T. T. Khoei
09 Oct 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu
09 Oct 2024
Page 14 of 26