ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.09513
  4. Cited By
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
v1v2 (latest)

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
    ELMReLMLRM
ArXiv (abs)PDFHTMLHuggingFace (1 upvotes)

Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"

50 / 1,273 papers shown
Programming Refusal with Conditional Activation Steering
Programming Refusal with Conditional Activation SteeringInternational Conference on Learning Representations (ICLR), 2024
Bruce W. Lee
Inkit Padhi
Karthikeyan N. Ramamurthy
Erik Miehling
Pierre Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
504
73
0
06 Sep 2024
Experimentation in Content Moderation using RWKV
Experimentation in Content Moderation using RWKV
Umut Yildirim
Rohan Dutta
Burak Yildirim
Atharva Vaidya
186
3
0
05 Sep 2024
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
Ingo Ziegler
Abdullatif Köksal
Desmond Elliott
Hinrich Schütze
308
13
0
03 Sep 2024
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism
  Guided by Text Information
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text InformationAAAI Conference on Artificial Intelligence (AAAI), 2024
Yi Chen
Jian Xu
Xu-Yao Zhang
Wen-Zhuo Liu
Yang-Yang Liu
Cheng-Lin Liu
294
15
0
02 Sep 2024
Sparsity Meets Similarity: Leveraging Long-Tail Distribution for Dynamic Optimized Token Representation in Multimodal Large Language Models
Sparsity Meets Similarity: Leveraging Long-Tail Distribution for Dynamic Optimized Token Representation in Multimodal Large Language Models
Gaotong Yu
Yi Chen
Jian Xu
61
0
0
02 Sep 2024
Retrieval-Augmented Natural Language Reasoning for Explainable Visual
  Question Answering
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
Su Hyeon Lim
Minkuk Kim
Hyeon Bae Kim
Seong Tae Kim
ReLMLRM
200
0
0
30 Aug 2024
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene
  Understanding
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Yonghui Wang
Wengang Zhou
Hao Feng
Houqiang Li
VLM
166
1
0
30 Aug 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths
  Vision Computation
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision ComputationNeural Information Processing Systems (NeurIPS), 2024
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
250
32
0
29 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLMMLLM
303
200
0
29 Aug 2024
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in
  Vision-Language Models
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language ModelsIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jingyi Wang
Jianzhong Ju
Jian Luan
Zhidong Deng
VLM
346
5
0
29 Aug 2024
Law of Vision Representation in MLLMs
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
593
16
0
29 Aug 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge DistillationInternational Conference on Learning Representations (ICLR), 2024
Fangxun Shu
Yue Liao
Le Zhuo
Chenning Xu
Guanghao Zhang
...
Bolin Li
Zhelun Yu
Si Liu
Hongsheng Li
Hao Jiang
VLMMoE
210
34
0
28 Aug 2024
A Survey on Evaluation of Multimodal Large Language Models
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MAELMLRM
309
44
0
28 Aug 2024
GlaLSTM: A Concurrent LSTM Stream Framework for Glaucoma Detection via Biomarker Mining
GlaLSTM: A Concurrent LSTM Stream Framework for Glaucoma Detection via Biomarker Mining
Cheng Huang
Weizheng Xie
Tsengdar J. Lee
Karanjit S Kooner
Ning Zhang
Yishen Liu
382
75
0
28 Aug 2024
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and
  Analysis
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and AnalysisIEEE International Joint Conference on Neural Network (IJCNN), 2024
Aishik Nagar
Shantanu Jaiswal
Cheston Tan
ReLMLRM
136
19
0
27 Aug 2024
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
I2EBench: A Comprehensive Benchmark for Instruction-based Image EditingNeural Information Processing Systems (NeurIPS), 2024
Yiwei Ma
Jiayi Ji
Ke Ye
Weihuang Lin
Zhibin Wang
Yonghan Zheng
Qiang-feng Zhou
Xiaoshuai Sun
Rongrong Ji
302
27
0
26 Aug 2024
Tangram: A Challenging Benchmark for Geometric Element Recognizing
Tangram: A Challenging Benchmark for Geometric Element Recognizing
Jiamin Tang
Chao Zhang
Xudong Zhu
Mengchi Liu
LRM
62
1
0
25 Aug 2024
Has Multimodal Learning Delivered Universal Intelligence in Healthcare?
  A Comprehensive Survey
Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive SurveyInformation Fusion (Inf. Fusion), 2024
Qika Lin
Yifan Zhu
Xin Mei
Ling Huang
Jingying Ma
Kai He
Zhen Peng
Xiaoshi Zhong
Mengling Feng
298
64
0
23 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal CapabilitiesAAAI Conference on Artificial Intelligence (AAAI), 2024
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
486
6
0
23 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
ParGo: Bridging Vision-Language with Partial and Global ViewsAAAI Conference on Artificial Intelligence (AAAI), 2024
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLMVLM
526
25
0
23 Aug 2024
Building and better understanding vision-language models: insights and
  future directions
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
320
133
0
22 Aug 2024
Large Language Models Are Self-Taught Reasoners: Enhancing LLM
  Applications via Tailored Problem-Solving Demonstrations
Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations
Kai Tzu-iunn Ong
Taeyoon Kwon
Jinyoung Yeo
LRM
132
1
0
22 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
486
10
0
21 Aug 2024
Sycophancy in Vision-Language Models: A Systematic Analysis and an Inference-Time Mitigation Framework
Sycophancy in Vision-Language Models: A Systematic Analysis and an Inference-Time Mitigation Framework
Yunpu Zhao
Rui Zhang
Junbin Xiao
Changxin Ke
Ruibo Hou
Yifan Hao
Qi Guo
240
4
0
21 Aug 2024
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Yuanyang Yin
Yaqi Zhao
Yajie Zhang
Yuanxing Zhang
Ke Lin
Jiahao Wang
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
LRM
332
11
0
21 Aug 2024
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts
Kai Han
Samuel Albanie
278
3
0
21 Aug 2024
HiRED: Attention-Guided Token Dropping for Efficient Inference of
  High-Resolution Vision-Language Models in Resource-Constrained Environments
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Kazi Hasan Ibn Arif
JinYi Yoon
Dimitrios S. Nikolopoulos
Hans Vandierendonck
Deepu John
Bo Ji
MLLMVLM
233
25
0
20 Aug 2024
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing
  Hallucinations in LVLMs
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMsEuropean Conference on Computer Vision (ECCV), 2024
Yassine Ouali
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
VLMMLLM
298
43
0
19 Aug 2024
Visual Agents as Fast and Slow Thinkers
Visual Agents as Fast and Slow ThinkersInternational Conference on Learning Representations (ICLR), 2024
Guangyan Sun
Haoyang Ling
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAGLRM
551
46
0
16 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
531
146
0
16 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan O. Arik
Tejas Nama
Tomas Pfister
203
4
0
13 Aug 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation
  Agents
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao-Yang Liu
Tianjie Zhang
Yu Gu
Iat Long Iong
Yifan Xu
...
Zhengxiao Du
Chan Hee Song
Yu Su
Yuxiao Dong
Jie Tang
VLMLLMAG
253
68
0
12 Aug 2024
Towards a Generative Approach for Emotion Detection and Reasoning
Towards a Generative Approach for Emotion Detection and Reasoning
Ankita Bhaumik
T. Strzalkowski
ReLMLRM
216
4
0
09 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Chunjiang Ge
Meng Zhao
...
Rongrong Ji
Xing Sun
Ran He
Caifeng Shan
Xing Sun
MLLM
580
150
0
09 Aug 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language
  Models
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language ModelsComputer Vision and Pattern Recognition (CVPR), 2024
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
286
15
0
08 Aug 2024
Advancing Multimodal Large Language Models with Quantization-Aware Scale
  Learning for Efficient Adaptation
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient AdaptationACM Multimedia (MM), 2024
Jingjing Xie
Yuxin Zhang
Mingbao Lin
Liujuan Cao
Rongrong Ji
MQ
184
15
0
07 Aug 2024
MoExtend: Tuning New Experts for Modality and Task Extension
MoExtend: Tuning New Experts for Modality and Task ExtensionAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Shanshan Zhong
Shanghua Gao
Zhongzhan Huang
Wushao Wen
Marinka Zitnik
Pan Zhou
VLMMLLMMoE
282
11
0
07 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLMSyDaVLM
581
1,788
0
06 Aug 2024
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for
  Continual Learning
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning
Min Jae Jung
Romain Rouvoy
KELMMoECLL
245
6
0
31 Jul 2024
Autonomous Improvement of Instruction Following Skills via Foundation
  Models
Autonomous Improvement of Instruction Following Skills via Foundation Models
Zhiyuan Zhou
P. Atreya
Abraham Lee
Homer Walke
Oier Mees
Sergey Levine
254
27
0
30 Jul 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
225
15
0
28 Jul 2024
Data Processing Techniques for Modern Multimodal Models
Data Processing Techniques for Modern Multimodal Models
Yinheng Li
Han Ding
Hang Chen
VLM
298
4
0
27 Jul 2024
$VILA^2$: VILA Augmented VILA
VILA2VILA^2VILA2: VILA Augmented VILA
Yunhao Fang
Ligeng Zhu
Yao Lu
Yan Wang
Pavlo Molchanov
Jang Hyun Cho
Marco Pavone
Song Han
Hongxu Yin
VLM
256
4
0
24 Jul 2024
Multi-label Cluster Discrimination for Visual Representation Learning
Multi-label Cluster Discrimination for Visual Representation Learning
Xiang An
Kaicheng Yang
Xiangzi Dai
Ziyong Feng
Jiankang Deng
VLM
328
12
0
24 Jul 2024
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil
Zheda Mai
Justin Lee
Zihe Wang
Kerrie Cheng
Jingyan Bai
Ye Liu
A. Chowdhury
Wei-Lun Chao
CoGeVLM
387
1
0
23 Jul 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal
  Large Language Model
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLMVLM
244
4
0
23 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with
  Extensive Diversity
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLMMLLM
322
41
0
22 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
229
5
0
22 Jul 2024
XAI meets LLMs: A Survey of the Relation between Explainable AI and
  Large Language Models
XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models
Xiaoshi Zhong
Lorenzo Malandri
Fabio Mercorio
Navid Nobani
Andrea Seveso
334
39
0
21 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
473
199
0
17 Jul 2024
Previous
123...151617...242526
Next
Page 16 of 26
Pageof 26