Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Computer Vision and Pattern Recognition (CVPR), 2024
27 November 2024
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
Zonglin Yang
Jiatong Li
Weida Wang
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL LRM

Papers citing "Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning"

26 / 76 papers shown
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
International Conference on Learning Representations (ICLR), 2023
Pan Lu
Hritik Bansal
Tony Xia
Hamish Ivison
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM MLLM
499
1,083
0
03 Oct 2023
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Zhengyuan Yang
Linjie Li
Kevin Qinghong Lin
Jianfeng Wang
Chung-Ching Lin
Nasim Shakouri Mahmoudabadi
Lijuan Wang
LM&MA
293
805
0
29 Sep 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-Xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
257
567
0
25 Sep 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
320
525
0
02 Aug 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM ELM
407
764
0
30 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
552
1,595
0
12 Jul 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
International Conference on Learning Representations (ICLR), 2023
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM MLLM
355
389
0
26 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
755
6,364
0
29 May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&Ro LRM AI4CE
463
2,967
0
17 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM LRM
651
1,210
0
17 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM MLLM
404
2,632
0
20 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa VLM MLLM
950
7,136
0
17 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Debjit Paul
Mete Ismayilzada
Maxime Peyrard
Beatriz Borges
Antoine Bosselut
Robert West
Boi Faltings
ReLM LRM
290
218
0
04 Apr 2023
Self-Refine: Iterative Refinement with Self-Feedback
Neural Information Processing Systems (NeurIPS), 2023
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLM LRM DiffM
692
2,470
0
30 Mar 2023
VAD: Vectorized Scene Representation for Efficient Autonomous Driving
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Jiang
Shaoyu Chen
Qing Xu
Bencheng Liao
Jiajie Chen
Helong Zhou
Qian Zhang
Wenyu Liu
Chang Huang
Xinggang Wang
424
428
0
21 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
International Conference on Machine Learning (ICML), 2023
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
357
2,153
0
06 Mar 2023
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Yejin Bang
Samuel Cahyawijaya
Nayeon Lee
Wenliang Dai
Jane Polak Scowcroft
...
Tiezheng Yu
Willy Chung
Quyet V. Do
Yan Xu
Pascale Fung
ReLM LRM
614
1,583
0
08 Feb 2023
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ReLM LRM
262
192
0
15 Dec 2022
In-context Reinforcement Learning with Algorithm Distillation
International Conference on Learning Representations (ICLR), 2022
Michael Laskin
Luyu Wang
Junhyuk Oh
Emilio Parisotto
Stephen Spencer
...
Ethan A. Brooks
Maxime Gazeau
Himanshu Sahni
Satinder Singh
Volodymyr Mnih
OffRL
181
166
0
25 Oct 2022
Automatic Chain of Thought Prompting in Large Language Models
International Conference on Learning Representations (ICLR), 2022
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM LRM
416
826
0
07 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
303
461
0
06 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Neural Information Processing Systems (NeurIPS), 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM ReLM LRM
506
1,798
0
20 Sep 2022
ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning
European Conference on Computer Vision (ECCV), 2022
Shengchao Hu
Li Chen
Peng Wu
Guoying Gu
Junchi Yan
Dacheng Tao
223
364
0
15 Jul 2022
Large Language Models are Zero-Shot Reasoners
Neural Information Processing Systems (NeurIPS), 2022
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM LRM
1.3K
5,855
0
24 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Neural Information Processing Systems (NeurIPS), 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM VLM
666
4,695
0
29 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro LRM AI4CE ReLM
2.1K
14,012
0
28 Jan 2022