2209.09513
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
Papers citing "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"
50 / 1,271 papers shown
DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation
Zexin Lin
Hawen Wan
Yebin Zhong
Xiaoqiang
VLM
144
0
0
03 Dec 2025
MemVerse: Multimodal Memory for Lifelong Learning Agents
J. Liu
Yifei Sun
Weihua Cheng
Haodong Lei
Yirong Chen
...
Nianchen Deng
Yi Yu
Shuyue Hu
Botian Shi
Ding Wang
KELM
150
1
0
03 Dec 2025
See, Think, Learn: A Self-Taught Multimodal Reasoner
Sourabh Sharma
Sonam Gupta
Sadbhawna
ReLM
LRM
VLM
209
0
0
02 Dec 2025
Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models
Xiwen Wei
Mustafa Munir
R. Marculescu
CLL
251
0
0
02 Dec 2025
OneThinker: All-in-one Reasoning Model for Image and Video
Kaituo Feng
M. Zhang
Hongyu Li
Kaixuan Fan
Shuang Chen
...
Haoze Sun
Yan Feng
Peng Pei
Xunliang Cai
Xiangyu Yue
OffRL
MLLM
VLM
LRM
635
3
0
02 Dec 2025
Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models
Ziyi Tong
Feifei Sun
Le Minh Nguyen
8
0
0
02 Dec 2025
FiMMIA: scaling semantic perturbation-based membership inference across modalities
Anton A. Emelyanov
Sergei Kudriashov
Alena Fenogenova
116
0
0
02 Dec 2025
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
Zhongyu Yang
Dannong Xu
Wei Pang
Yingfang Yuan
VLM
180
0
0
01 Dec 2025
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
Zeqing Wang
Keze Wang
Lei Zhang
VGen
104
0
0
01 Dec 2025
Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets
Muhammad Muneeb
David B. Ascher
Ahsan Baidar Bakht
48
0
0
29 Nov 2025
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Jiazhen Liu
Mingkuan Feng
Long Chen
68
0
0
29 Nov 2025
AgriCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture
Yibin Wen
Qingmei Li
Zi Ye
Jiarui Zhang
Jing Wu
...
Yang Zhang
Lingyuan Zhao
Haohuan Fu
Huang Jianxi
Juepeng Zheng
ReLM
LRM
160
0
0
28 Nov 2025
A Rosetta Stone for AI Benchmarks
A. Ho
Jean-Stanislas Denain
David Atanasov
Samuel Albanie
Rohin Shah
ELM
248
0
0
28 Nov 2025
EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
Ze Feng
Sen Yang
Boqiang Duan
Wankou Yang
Jingdong Wang
VLM
157
0
0
26 Nov 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
121
0
0
25 Nov 2025
M³Prune: Hierarchical Communication Graph Pruning for Efficient Multi-Modal Multi-Agent Retrieval-Augmented Generation
Weizi Shao
Taolin Zhang
Zijie Zhou
Chen Chen
C. Wang
Xiaofeng He
68
0
0
25 Nov 2025
Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs
Z. J. Wang
Chang Che
Qi Wang
Hui Ma
Zenglin Shi
Cees G. M. Snoek
Meng Wang
CLL
192
0
0
25 Nov 2025
Object-Centric Vision Token Pruning for Vision Language Models
Guangyuan Li
R. Zhao
Jinhong Deng
Yanbo Wang
Joni Pajarinen
VLM
160
0
0
25 Nov 2025
INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models
Parsa Madinei
Ryan Solgi
Ziqi Wen
Jonathan Skaza
Miguel P. Eckstein
Ramtin Pedarsani
VLM
165
0
0
24 Nov 2025
Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
Wengyi Zhan
Mingbao Lin
Zhihang Lin
Rongrong Ji
MLLM
VLM
LRM
215
0
0
24 Nov 2025
Cross Domain Evaluation of Multimodal Chain-of-Thought Reasoning of different datasets into the Amazon CoT Framework
Nitya Tiwari
Parv Maheshwari
Vidisha Agarwal
LRM
96
0
0
24 Nov 2025
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li
Y. Li
Hanxun Huang
Yunhao Chen
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
MLLM
AAML
VLM
184
0
0
24 Nov 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Yiming Qin
Bomin Wei
Jiaxin Ge
Konstantinos Kallidromitis
Stephanie Fu
Trevor Darrell
Xudong Wang
LRM
VLM
180
0
0
24 Nov 2025
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Meng Lu
Ran Xu
Yi Fang
Wenxuan Zhang
Yue Yu
...
Guanghua Xiao
Hanrui Wang
Di Jin
W. Shi
Xuan Wang
LRM
120
0
0
24 Nov 2025
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
Zengjie Hu
Jiantao Qiu
Tianyi Bai
Haojin Yang
Binhang Yuan
Qi Jing
Conghui He
Wentao Zhang
OffRL
210
0
0
24 Nov 2025
Self-Empowering VLMs: Achieving Hierarchical Consistency via Self-Elicited Knowledge Distillation
Wei Yang
Yiran Zhu
Zilin Li
Xunjia Zhang
Hongtao Wang
VLM
116
0
0
23 Nov 2025
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
Guoyang Xia
Yifeng Ding
Fengfa Li
Lei Ren
Wei Chen
Fangxiang Feng
Xiaojie Wang
MoE
VLM
180
0
0
22 Nov 2025
The PLLuM Instruction Corpus
Piotr Pęzik
Filip Żarnecki
Konrad Kaczyński
A. Cichosz
Zuzanna Deckert
...
Konrad Wojtasik
Arkadiusz Janz
P. Kazienko
Julia Moska
Jan Kocoń
88
0
0
21 Nov 2025
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
Mark Endo
Serena Yeung-Levy
LRM
233
0
0
21 Nov 2025
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar
Shravan Venkatraman
Ritesh Thawkar
Abdelrahman M. Shaker
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad A Khan
SyDa
LRM
VLM
313
2
0
20 Nov 2025
Learning to Think Fast and Slow for Visual Language Models
Chenyu Lin
Cheng Chi
Jinlin Wu
Sharon Li
Kaiyang Zhou
ReLM
VLM
225
0
0
20 Nov 2025
Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao
Zhe Li
Yige Li
Jun Sun
AAML
96
0
0
20 Nov 2025
Parameter Importance-Driven Continual Learning for Foundation Models
LingXiang Wang
Hainan Zhang
Zhiming Zheng
KELM
CLL
455
0
0
19 Nov 2025
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li
Zuhao Yang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
146
1
0
19 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
290
1
0
19 Nov 2025
Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification
Yao Qin
Yangyang Yan
YuanChao Yang
Jinhua Pang
Huanyong Bi
Yuan Liu
HaiHua Wang
MedIm
128
0
0
18 Nov 2025
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Xuankun Rong
Wenke Huang
Tingfeng Wang
Daiguo Zhou
Bo Du
Mang Ye
LRM
201
0
0
17 Nov 2025
CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product
Kaiwen Xue
Chenglong Li
Zhonghong Ou
Guoxin Zhang
Kaoyan Lu
...
Xinyu Liu
Qunlin Chen
Weiwei Qin
Yiran Shen
Jiayi Cen
112
0
0
17 Nov 2025
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Wenxin Zhu
Andong Chen
Yuchen Song
Kehai Chen
Conghui Zhu
Ziyan Chen
Tiejun Zhao
LRM
434
0
0
17 Nov 2025
Explore How to Inject Beneficial Noise in MLLMs
Ruishu Zhu
Sida Huang
Ziheng Jiao
Hongyuan Zhang
196
3
0
17 Nov 2025
Learning with Preserving for Continual Multitask Learning
H. Wang
Siwoo Bae
Zirong Chen
Meiyi Ma
CLL
172
0
0
11 Nov 2025
Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning
Tianwen Lyu
Xiang Zhuang
Keyan Ding
Xinzhe Cao
Lei Liang
Wei Zhao
Qiang Zhang
H. Chen
LRM
103
0
0
11 Nov 2025
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
Jingwei Ni
Ekaterina Fadeeva
Tianyi Wu
Mubashara Akhtar
Jiaheng Zhang
...
Markus Leippold
Timothy Baldwin
See-Kiong Ng
Artem Shelmanov
Mrinmaya Sachan
LRM
214
0
0
09 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
309
2
0
06 Nov 2025
ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation
International Conference on Artificial Neural Networks (ICANN), 2025
Jing Gao
Shutiao Luo
Yumeng Liu
Yuanming Li
Hongji Zeng
96
0
0
05 Nov 2025
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Kuei-Chun Kao
Hsu Tzu-Yin
Yunqi Hong
Ruochen Wang
Cho-Jui Hsieh
LRM
128
0
0
05 Nov 2025
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Fangxun Shu
Yongjie Ye
Yue Liao
Zijian Kang
Weijie Yin
Jiacong Wang
Xiao Liang
Shuicheng Yan
Chao Feng
OffRL
ReLM
LRM
237
1
0
04 Nov 2025
Enhancing Multimodal Reasoning via Latent Refocusing
Jizheng Ma
Xiaofei Zhou
Yanlong Song
Han Yan
VLM
LRM
170
1
0
04 Nov 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Tianfan Peng
Yuntao Du
Pengzhou Ji
Shijie Dong
Kailin Jiang
...
Jinhe Bi
Qian Li
Wei Du
Feng Xiao
Lizhen Cui
VLM
256
0
0
04 Nov 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Yiyang Zhou
Haoqin Tu
Z. Wang
Zeyu Wang
Niklas Muennighoff
...
Shen Yan
Haoqi Fan
Cihang Xie
Huaxiu Yao
Qinghao Ye
LRM
250
2
0
04 Nov 2025