2209.09513
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
Papers citing
"Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"
50 / 1,266 papers shown
FiMMIA: scaling semantic perturbation-based membership inference across modalities
Anton A. Emelyanov
Sergei Kudriashov
Alena Fenogenova
80
0
0
02 Dec 2025
See, Think, Learn: A Self-Taught Multimodal Reasoner
Sourabh Sharma
Sonam Gupta
Sadbhawna
ReLM
LRM
VLM
161
0
0
02 Dec 2025
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
Zeqing Wang
Keze Wang
Lei Zhang
VGen
72
0
0
01 Dec 2025
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
Zhongyu Yang
Dannong Xu
Wei Pang
Yingfang Yuan
VLM
64
0
0
01 Dec 2025
Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets
Muhammad Muneeb
David B. Ascher
Ahsan Baidar Bakht
24
0
0
29 Nov 2025
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Jiazhen Liu
Mingkuan Feng
Long Chen
28
0
0
29 Nov 2025
A Rosetta Stone for AI Benchmarks
A. Ho
Jean-Stanislas Denain
David Atanasov
Samuel Albanie
Rohin Shah
ELM
156
0
0
28 Nov 2025
AgriCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture
Yibin Wen
Qingmei Li
Zi Ye
Jiarui Zhang
Jing Wu
...
Yang Zhang
Lingyuan Zhao
Haohuan Fu
Huang Jianxi
Juepeng Zheng
ReLM
LRM
136
0
0
28 Nov 2025
EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
Ze Feng
Sen Yang
Boqiang Duan
Wankou Yang
Jingdong Wang
VLM
133
0
0
26 Nov 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
109
0
0
25 Nov 2025
M³Prune: Hierarchical Communication Graph Pruning for Efficient Multi-Modal Multi-Agent Retrieval-Augmented Generation
Weizi Shao
Taolin Zhang
Zijie Zhou
Chen Chen
C. Wang
Xiaofeng He
56
0
0
25 Nov 2025
Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs
Z. J. Wang
Chang Che
Qi Wang
Hui Ma
Zenglin Shi
Cees G. M. Snoek
Meng Wang
CLL
156
0
0
25 Nov 2025
Object-Centric Vision Token Pruning for Vision Language Models
Guangyuan Li
R. Zhao
Jinhong Deng
Yanbo Wang
Joni Pajarinen
VLM
132
0
0
25 Nov 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Yiming Qin
Bomin Wei
Jiaxin Ge
Konstantinos Kallidromitis
Stephanie Fu
Trevor Darrell
Xudong Wang
LRM
VLM
136
0
0
24 Nov 2025
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li
Y. Li
Hanxun Huang
Yunhao Chen
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
MLLM
AAML
VLM
172
0
0
24 Nov 2025
INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models
Parsa Madinei
Ryan Solgi
Ziqi Wen
Jonathan Skaza
Miguel P. Eckstein
Ramtin Pedarsani
VLM
149
0
0
24 Nov 2025
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Meng Lu
Ran Xu
Yi Fang
Wenxuan Zhang
Yue Yu
...
Guanghua Xiao
Hanrui Wang
Di Jin
W. Shi
Xuan Wang
LRM
108
0
0
24 Nov 2025
Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
Wengyi Zhan
Mingbao Lin
Zhihang Lin
Rongrong Ji
MLLM
VLM
LRM
187
0
0
24 Nov 2025
Cross Domain Evaluation of Multimodal Chain-of-Thought Reasoning of different datasets into the Amazon CoT Framework
Nitya Tiwari
Parv Maheshwari
Vidisha Agarwal
LRM
84
0
0
24 Nov 2025
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
Zengjie Hu
Jiantao Qiu
Tianyi Bai
Haojin Yang
Binhang Yuan
Qi Jing
Conghui He
Wentao Zhang
OffRL
182
0
0
24 Nov 2025
Self-Empowering VLMs: Achieving Hierarchical Consistency via Self-Elicited Knowledge Distillation
Wei Yang
Yiran Zhu
Zilin Li
Xunjia Zhang
Hongtao Wang
VLM
92
0
0
23 Nov 2025
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
Guoyang Xia
Yifeng Ding
Fengfa Li
Lei Ren
Wei Chen
Fangxiang Feng
Xiaojie Wang
MoE
VLM
124
0
0
22 Nov 2025
The PLLuM Instruction Corpus
Piotr Pęzik
Filip Żarnecki
Konrad Kaczyński
A. Cichosz
Zuzanna Deckert
...
Konrad Wojtasik
Arkadiusz Janz
P. Kazienko
Julia Moska
Jan Kocoń
80
0
0
21 Nov 2025
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
Mark Endo
Serena Yeung-Levy
LRM
193
0
0
21 Nov 2025
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar
Shravan Venkatraman
Ritesh Thawkar
Abdelrahman M. Shaker
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad A Khan
SyDa
LRM
VLM
250
2
0
20 Nov 2025
Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao
Zhe Li
Yige Li
Jun Sun
AAML
92
0
0
20 Nov 2025
Learning to Think Fast and Slow for Visual Language Models
Chenyu Lin
Cheng Chi
Jinlin Wu
Sharon Li
Kaiyang Zhou
ReLM
VLM
221
0
0
20 Nov 2025
Parameter Importance-Driven Continual Learning for Foundation Models
LingXiang Wang
Hainan Zhang
Zhiming Zheng
KELM
CLL
414
0
0
19 Nov 2025
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li
Zuhao Yang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
129
1
0
19 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
250
1
0
19 Nov 2025
Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification
Yao Qin
Yangyang Yan
YuanChao Yang
Jinhua Pang
Huanyong Bi
Yuan Liu
HaiHua Wang
MedIm
104
0
0
18 Nov 2025
CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product
Kaiwen Xue
Chenglong Li
Zhonghong Ou
Guoxin Zhang
Kaoyan Lu
...
Xinyu Liu
Qunlin Chen
Weiwei Qin
Yiran Shen
Jiayi Cen
96
0
0
17 Nov 2025
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Wenxin Zhu
Andong Chen
Yuchen Song
Kehai Chen
Conghui Zhu
Ziyan Chen
Tiejun Zhao
LRM
398
0
0
17 Nov 2025
Explore How to Inject Beneficial Noise in MLLMs
Ruishu Zhu
Sida Huang
Ziheng Jiao
Hongyuan Zhang
164
3
0
17 Nov 2025
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Xuankun Rong
Wenke Huang
Tingfeng Wang
Daiguo Zhou
Bo Du
Mang Ye
LRM
189
0
0
17 Nov 2025
Learning with Preserving for Continual Multitask Learning
H. Wang
Siwoo Bae
Zirong Chen
Meiyi Ma
CLL
152
0
0
11 Nov 2025
Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning
Tianwen Lyu
Xiang Zhuang
Keyan Ding
Xinzhe Cao
Lei Liang
Wei Zhao
Qiang Zhang
H. Chen
LRM
67
0
0
11 Nov 2025
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
Jingwei Ni
Ekaterina Fadeeva
Tianyi Wu
Mubashara Akhtar
Jiaheng Zhang
...
Markus Leippold
Timothy Baldwin
See-Kiong Ng
Artem Shelmanov
Mrinmaya Sachan
LRM
174
0
0
09 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
268
1
0
06 Nov 2025
ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation
International Conference on Artificial Neural Networks (ICANN), 2025
Jing Gao
Shutiao Luo
Yumeng Liu
Yuanming Li
Hongji Zeng
68
0
0
05 Nov 2025
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Kuei-Chun Kao
Hsu Tzu-Yin
Yunqi Hong
Ruochen Wang
Cho-Jui Hsieh
LRM
112
0
0
05 Nov 2025
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Fangxun Shu
Yongjie Ye
Yue Liao
Zijian Kang
Weijie Yin
Jiacong Wang
Xiao Liang
Shuicheng Yan
Chao Feng
OffRL
ReLM
LRM
213
1
0
04 Nov 2025
CoCoVa: Chain of Continuous Vision-Language Thought for Latent Space Reasoning
Jizheng Ma
Xiaofei Zhou
Yanlong Song
Han Yan
VLM
LRM
153
1
0
04 Nov 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Tianfan Peng
Yuntao Du
Pengzhou Ji
Shijie Dong
Kailin Jiang
...
Jinhe Bi
Qian Li
Wei Du
Feng Xiao
Lizhen Cui
VLM
212
0
0
04 Nov 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Yiyang Zhou
Haoqin Tu
Z. Wang
Zeyu Wang
Niklas Muennighoff
...
Shen Yan
Haoqi Fan
Cihang Xie
Huaxiu Yao
Qinghao Ye
LRM
218
2
0
04 Nov 2025
OMEGA: Optimized Multimodal Position Encoding Index Derivation with Global Adaptive Scaling for Vision-Language Models
Ruoxiang Huang
Xindian Ma
Rundong Kong
Zhen Yuan
Peng Zhang
VLM
109
0
0
02 Nov 2025
ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
Mengjie Deng
Guanting Dong
Zhicheng Dou
84
1
0
31 Oct 2025
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Ziyu Guo
Xinyan Chen
Renrui Zhang
Ruichuan An
Yu Qi
Dongzhi Jiang
Xiangtai Li
M. Zhang
Jiaming Song
Pheng-Ann Heng
VGen
LRM
124
9
0
30 Oct 2025
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
Fenfen Lin
Y. Liu
Haiyu Xu
Chen Yue
Zheqi He
Mingxuan Zhao
Miguel Hu Chen
Jiakang Liu
JG Yao
Xi Yang
VLM
88
0
0
30 Oct 2025
StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
Yuhang Hu
Zhenyu Yang
S. S. Wang
Shengsheng Qian
Bin Wen
Fan Yang
Tingting Gao
Changsheng Xu
VGen
LRM
124
0
0
29 Oct 2025