2411.18203
Cited By
v5 (latest)
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Computer Vision and Pattern Recognition (CVPR), 2024
27 November 2024
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
Zonglin Yang
Jiatong Li
Weida Wang
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
HuggingFace (40 upvotes)
Papers citing
"Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning"
50 / 77 papers shown
Title
From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning
C. Wang
Haozhe Wang
Xi Chen
J. Liu
Taofeng Xue
Chong Peng
Donglian Qi
Fangzhen Lin
Yunfeng Yan
OffRL
LRM
228
0
0
28 Nov 2025
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Wenxin Zhu
Andong Chen
Yuchen Song
Kehai Chen
Conghui Zhu
Ziyan Chen
Tiejun Zhao
LRM
406
0
0
17 Nov 2025
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Gailun Zeng
Ziyang Luo
Hongzhan Lin
Yuchen Tian
Kaixin Li
Ziyang Gong
Jianxiong Guo
Jing Ma
92
1
0
12 Nov 2025
Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing
Yijia Wang
Yiqing Shen
Weiming Chen
Z. He
DiffM
128
0
0
31 Oct 2025
Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
Xin Guo
Zhiheng Xi
Yiwen Ding
Yitao Zhai
X. Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
136
0
0
30 Oct 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
208
3
0
28 Oct 2025
Token-Level Inference-Time Alignment for Vision-Language Models
Kejia Chen
Jiawen Zhang
Jiacong Hu
Kewei Gao
Jian Lou
Zunlei Feng
Mingli Song
MLLM
VLM
249
0
0
20 Oct 2025
Generative Universal Verifier as Multimodal Meta-Reasoner
Xinchen Zhang
X. Zhang
Youbin Wu
Yanbin Cao
Renrui Zhang
Ruihang Chu
Ling Yang
Yujiu Yang
LRM
144
1
0
15 Oct 2025
Zoom-In to Sort AI-Generated Images Out
Yikun Ji
Y. Hong
Bowen Deng
Jun Lan
Huijia Zhu
Weiqiang Wang
Liqing Zhang
Jianfu Zhang
140
0
0
05 Oct 2025
Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
John Gkountouras
Ivan Titov
LRM
80
0
0
30 Sep 2025
GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions
Bing Liu
Wenqiang Yv
X. J. Yang
S. Wang
Junzhuo Liu
Peng Wang
G. Wang
Yang Yang
H. Shen
ObjD
147
0
0
25 Sep 2025
ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models
Chung-En Yu
Hsuan-Chih Chen
Brian Jalaian
Nathaniel D. Bastian
AAML
VLM
LRM
109
0
0
18 Sep 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM
CoGe
LRM
316
8
0
24 Aug 2025
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Yuyang Wanyan
Xi Zhang
Haiyang Xu
Haowei Liu
Junyang Wang
...
Ming Yan
Fei Huang
Xiaoshan Yang
Weiming Dong
Changsheng Xu
LLMAG
LRM
390
9
0
05 Jun 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM
LRM
VLM
244
6
0
28 May 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan
Wenzhen Yuan
Xian Gao
Ye Guo
Daoxin Zhang
Zhe Xu
Yao Hu
Ting Liu
Yuzhuo Fu
LRM
VLM
406
13
0
10 Mar 2025
SHAPE: Self-Improved Visual Preference Alignment by Iteratively Generating Holistic Winner
Kejia Chen
Jiawen Zhang
Jiacong Hu
Jiazhen Yang
Jian Lou
Zunlei Feng
Weilong Dai
320
1
0
06 Mar 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Tengjiao Wang
316
9
0
17 Feb 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Feiyu Xiong
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
564
46
0
21 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Xinzhe Ni
Zicheng Lin
...
Yiyao Yu
C. Shi
Ruihang Chu
Jin Zeng
Yujiu Yang
LRM
698
34
0
08 Jan 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mingyang Song
Zhaochen Su
Xiaoye Qu
Jiawei Zhou
Yu Cheng
LRM
610
63
0
06 Jan 2025
Rule Based Rewards for Language Model Safety
Neural Information Processing Systems (NeurIPS), 2024
Tong Mu
Alec Helyar
Johannes Heidecke
Joshua Achiam
Andrea Vallone
Ian Kivlichan
Molly Lin
Alex Beutel
John Schulman
Lilian Weng
ALM
318
92
0
02 Nov 2024
GPT-4o System Card
OpenAI:
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
566
2,655
0
25 Oct 2024
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiayi He
Hehai Lin
Q. Wang
Yi R. Fung
Chenhui Xu
ReLM
LRM
536
22
0
05 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
187
89
0
03 Oct 2024
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
International Conference on Computational Linguistics (COLING), 2024
Gautier Dagan
Olga Loginova
Anil Batra
CoGe
225
1
0
17 Sep 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM
MLLM
405
854
0
03 Aug 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
260
171
0
03 Jul 2024
LLM Critics Help Catch LLM Bugs
Nat McAleese
Rai Michael Pokorny
Juan Felipe Cerón Uribe
Evgenia Nitishinskaya
Maja Trebacz
Jan Leike
ALM
LRM
229
120
0
28 Jun 2024
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
Ziyang Meng
Yu Dai
Zezheng Gong
Shaoxiong Guo
Minglong Tang
Tongquan Wei
VLM
247
7
0
20 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
346
1,138
0
18 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
477
58
0
17 Jun 2024
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG
OOD
AI4CE
313
92
0
11 Jun 2024
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Di Zhang
Xiaoshui Huang
Dongzhan Zhou
Yuqiang Li
Xuming He
LRM
294
124
0
11 Jun 2024
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
Sangmin Woo
Donguk Kim
Jaehyuk Jang
Yubin Choi
Changick Kim
303
30
0
28 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Wanrong Zhu
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
430
62
0
24 May 2024
Calibrated Self-Rewarding Vision Language Models
Neural Information Processing Systems (NeurIPS), 2024
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
286
63
0
23 May 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying
Fanqing Meng
Jin Wang
Zhiqiang Li
Han Lin
...
Yali Wang
Yuning Qiao
Ping Luo
Kaipeng Zhang
Wenqi Shao
214
155
0
24 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Yuan Liu
Yu Qiao
Dahua Lin
Feng Zhao
VLM
361
548
0
29 Mar 2024
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao
Han Wu
Zhijiang Guo
Biyan Zhou
Jiahui Gao
Sichun Luo
Hanxu Hou
Mingwen Liu
Linqi Song
LLMAG
LRM
329
13
0
28 Mar 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Shiyang Feng
Jiaming Song
216
448
0
21 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
413
629
0
08 Mar 2024
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLM
MLLM
247
161
0
18 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLM
LRM
261
188
0
09 Feb 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Yuan Liu
VLM
MLLM
353
337
0
29 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
578
2,117
0
21 Dec 2023
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
356
105
0
17 Dec 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Computer Vision and Pattern Recognition (CVPR), 2023
Qidong Huang
Xiao-wen Dong
Pan Zhang
Bin Wang
Conghui He
Yuan Liu
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
427
350
0
29 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLM
VLM
343
927
0
21 Nov 2023
LLMs cannot find reasoning errors, but can correct them given the error location
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Gladys Tyen
Hassan Mansoor
Victor Carbune
Peter Chen
Tony Mak
LRM
473
86
0
14 Nov 2023