ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2411.18203
  4. Cited By
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
v1v2v3v4v5 (latest)

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Computer Vision and Pattern Recognition (CVPR), 2024
27 November 2024
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
Zonglin Yang
Jiatong Li
Weida Wang
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
    OffRLLRM
ArXiv (abs)PDFHTMLHuggingFace (40 upvotes)

Papers citing "Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning"

50 / 76 papers shown
Title
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Wenxin Zhu
Andong Chen
Yuchen Song
Kehai Chen
Conghui Zhu
Ziyan Chen
Tiejun Zhao
LRM
382
0
0
17 Nov 2025
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal CritiqueConference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Gailun Zeng
Ziyang Luo
Hongzhan Lin
Yuchen Tian
Kaixin Li
Ziyang Gong
Jianxiong Guo
Jing Ma
76
1
0
12 Nov 2025
Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing
Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing
Yijia Wang
Yiqing Shen
Weiming Chen
Z. He
DiffM
96
0
0
31 Oct 2025
Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
Xin Guo
Zhiheng Xi
Yiwen Ding
Yitao Zhai
X. Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
104
0
0
30 Oct 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
180
3
0
28 Oct 2025
Token-Level Inference-Time Alignment for Vision-Language Models
Token-Level Inference-Time Alignment for Vision-Language Models
Kejia Chen
Jiawen Zhang
Jiacong Hu
Kewei Gao
Jian Lou
Zunlei Feng
Mingli Song
MLLMVLM
225
0
0
20 Oct 2025
Generative Universal Verifier as Multimodal Meta-Reasoner
Generative Universal Verifier as Multimodal Meta-Reasoner
Xinchen Zhang
X. Zhang
Youbin Wu
Yanbin Cao
Renrui Zhang
Ruihang Chu
Ling Yang
Yujiu Yang
LRM
112
1
0
15 Oct 2025
Zoom-In to Sort AI-Generated Images Out
Zoom-In to Sort AI-Generated Images Out
Yikun Ji
Y. Hong
Bowen Deng
Jun Lan
Huijia Zhu
Weiqiang Wang
Liqing Zhang
Jianfu Zhang
108
0
0
05 Oct 2025
Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
John Gkountouras
Ivan Titov
LRM
56
0
0
30 Sep 2025
GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions
GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions
Bing Liu
Wenqiang Yv
X. J. Yang
S. Wang
Junzhuo Liu
Peng Wang
G. Wang
Yang Yang
H. Shen
ObjD
119
0
0
25 Sep 2025
ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models
ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models
Chung-En Yu
Hsuan-Chih
Chen
Brian Jalaian
Nathaniel D. Bastian
AAMLVLMLRM
97
0
0
18 Sep 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLMCoGeLRM
276
6
0
24 Aug 2025
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Yuyang Wanyan
Xi Zhang
Haiyang Xu
Haowei Liu
Junyang Wang
...
Ming Yan
Fei Huang
Xiaoshan Yang
Weiming Dong
Changsheng Xu
LLMAGLRM
366
7
0
05 Jun 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLMLRMVLM
236
6
0
28 May 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan
Wenzhen Yuan
Xian Gao
Ye Guo
Daoxin Zhang
Zhe Xu
Yao Hu
Ting Liu
Yuzhuo Fu
LRMVLM
365
13
0
10 Mar 2025
SHAPE : Self-Improved Visual Preference Alignment by Iteratively Generating Holistic Winner
Kejia Chen
Jiawen Zhang
Jiacong Hu
Jiazhen Yang
Jian Lou
Zunlei Feng
Weilong Dai
284
1
0
06 Mar 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Tengjiao Wang
300
9
0
17 Feb 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward ModelAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Feiyu Xiong
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
535
46
0
21 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Xinzhe Ni
Zicheng Lin
...
Yiyao Yu
C. Shi
Ruihang Chu
Jin Zeng
Yujiu Yang
LRM
626
34
0
08 Jan 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Mingyang Song
Zhaochen Su
Xiaoye Qu
Jiawei Zhou
Yu Cheng
LRM
538
63
0
06 Jan 2025
Rule Based Rewards for Language Model Safety
Rule Based Rewards for Language Model SafetyNeural Information Processing Systems (NeurIPS), 2024
Tong Mu
Alec Helyar
Johannes Heidecke
Joshua Achiam
Andrea Vallone
Ian Kivlichan
Molly Lin
Alex Beutel
John Schulman
Lilian Weng
ALM
286
89
0
02 Nov 2024
GPT-4o System Card
GPT-4o System Card
OpenAI OpenAI
:
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
538
2,591
0
25 Oct 2024
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning TasksAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiayi He
Hehai Lin
Q. Wang
Yi R. Fung
Chenhui Xu
ReLMLRM
516
22
0
05 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
  Mathematical Reasoning
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
163
88
0
03 Oct 2024
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
CAST: Cross-modal Alignment Similarity Test for Vision Language ModelsInternational Conference on Computational Linguistics (COLING), 2024
Gautier Dagan
Olga Loginova
Anil Batra
CoGe
221
1
0
17 Sep 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLMMLLM
365
838
0
03 Aug 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model
  Supporting Long-Contextual Input and Output
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
232
169
0
03 Jul 2024
LLM Critics Help Catch LLM Bugs
LLM Critics Help Catch LLM Bugs
Nat McAleese
Rai Michael Pokorny
Juan Felipe Cerón Uribe
Evgenia Nitishinskaya
Maja Trebacz
Jan Leike
ALMLRM
197
119
0
28 Jun 2024
VGA: Vision GUI Assistant -- Minimizing Hallucinations through
  Image-Centric Fine-Tuning
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
Ziyang Meng
Yu Dai
Zezheng Gong
Shaoxiong Guo
Minglong Tang
Tongquan Wei
VLM
191
7
0
20 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All
  Tools
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
294
1,115
0
18 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
433
58
0
17 Jun 2024
TextGrad: Automatic "Differentiation" via Text
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAGOODAI4CE
281
88
0
11 Jun 2024
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo
  Tree Self-refine with LLaMa-3 8B
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Di Zhang
Xiaoshui Huang
Dongzhan Zhou
Yuqiang Li
Xuming He
LRM
254
124
0
11 Jun 2024
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
Sangmin Woo
Donguk Kim
Jaehyuk Jang
Yubin Choi
Changick Kim
275
28
0
28 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Wanrong Zhu
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
422
62
0
24 May 2024
Calibrated Self-Rewarding Vision Language Models
Calibrated Self-Rewarding Vision Language ModelsNeural Information Processing Systems (NeurIPS), 2024
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
246
63
0
23 May 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large
  Vision-Language Models Towards Multitask AGI
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying
Fanqing Meng
Jin Wang
Zhiqiang Li
Han Lin
...
Yali Wang
Yuning Qiao
Ping Luo
Kaipeng Zhang
Wenqi Shao
198
152
0
24 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Yuan Liu
Yu Qiao
Dahua Lin
Feng Zhao
VLM
337
539
0
29 Mar 2024
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao
Han Wu
Zhijiang Guo
Biyan Zhou
Jiahui Gao
Sichun Luo
Hanxu Hou
Mingwen Liu
Linqi Song
LLMAGLRM
297
13
0
28 Mar 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual
  Math Problems?
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Shiyang Feng
Jiaming Song
192
441
0
21 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
365
618
0
08 Mar 2024
Aligning Modalities in Vision Large Language Models via Preference
  Fine-tuning
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLMMLLM
227
160
0
18 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLMLRM
249
186
0
09 Feb 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and
  Comprehension in Vision-Language Large Model
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Yuan Liu
VLMMLLM
333
336
0
29 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLMMLLM
526
2,078
0
21 Dec 2023
Silkie: Preference Distillation for Large Visual Language Models
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
328
105
0
17 Dec 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models
  via Over-Trust Penalty and Retrospection-Allocation
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-AllocationComputer Vision and Pattern Recognition (CVPR), 2023
Qidong Huang
Xiao-wen Dong
Pan Zhang
Bin Wang
Conghui He
Yuan Liu
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
407
341
0
29 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ShareGPT4V: Improving Large Multi-Modal Models with Better CaptionsEuropean Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLMVLM
319
913
0
21 Nov 2023
LLMs cannot find reasoning errors, but can correct them given the error
  location
LLMs cannot find reasoning errors, but can correct them given the error locationAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Gladys Tyen
Hassan Mansoor
Victor Carbune
Peter Chen
Tony Mak
LRM
401
86
0
14 Nov 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction TuningComputer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLMMLLM
540
3,997
0
05 Oct 2023
12
Next