2311.17331
Cited By
v1
v2 (latest)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
17 February 2025
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
Guanbin Li
LRM
Papers citing
"Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering"
50 / 72 papers shown
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
Zeqing Wang
Keze Wang
Lei Zhang
VGen
189
0
0
01 Dec 2025
VideoVerse: Does Your T2V Generator Have World Model Capability to Synthesize Videos?
Zeqing Wang
Xinyu Wei
Bairui Li
Zhen Guo
Jinrui Zhang
Hongyang Wei
Keze Wang
Lei Zhang
VGen
428
10
0
09 Oct 2025
Visual Question Decomposition on Multimodal Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Haowei Zhang
Jianzhe Liu
Zhen Han
Shuo Chen
Bailan He
Volker Tresp
Zhiqiang Xu
Jindong Gu
433
8
0
28 Sep 2024
Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation
Tianyidan Xie
Rui Ma
Qian Wang
Xiaoqian Ye
Feixuan Liu
Ying Tai
Lanjun Wang
Zili Yi
DiffM
MLLM
298
1
0
29 Apr 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
684
8
0
03 Mar 2024
More Agents Is All You Need
Junyou Li
Qin Zhang
Yangbin Yu
Qiang Fu
Deheng Ye
LLMAG
465
145
0
03 Feb 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
505
13
0
19 Jan 2024
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAG
VLM
448
51
0
18 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
1.8K
1,402
0
16 Nov 2023
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
ACM Multimedia (ACM MM), 2023
Yunshi Lan
Xiang Li
Xin Liu
Yang Li
Wei Qin
Weining Qian
LRM
ReLM
498
41
0
15 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Neural Information Processing Systems (NeurIPS), 2023
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
840
778
0
06 Nov 2023
Exploring Question Decomposition for Zero-Shot VQA
Neural Information Processing Systems (NeurIPS), 2023
Zaid Khan
B. Vijaykumar
S. Schulter
Manmohan Chandraker
Yun Fu
ReLM
261
21
0
25 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Science China Information Sciences (Sci China Inf Sci), 2023
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xing Sun
Enhong Chen
VLM
MLLM
433
231
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Neural Information Processing Systems (NeurIPS), 2023
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
338
102
0
23 Oct 2023
A Simple Baseline for Knowledge-Based Visual Question Answering
Alexandros Xenos
Themos Stafylakis
Ioannis Patras
Georgios Tzimiropoulos
387
24
0
20 Oct 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
384
52
0
08 Sep 2023
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Ziyi Tang
Ruilin Wang
Weixing Chen
Keze Wang
Zehua Wang
Tianshui Chen
Liang Lin
LRM
365
12
0
23 Aug 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
859
2,667
0
22 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wenbo Hu
Y. Xu
Jian Wang
W. Li
Zhe Chen
Zhuowen Tu
MLLM
VLM
477
201
0
19 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
12.4K
16,448
0
18 Jul 2023
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Zhenhailong Wang
Shaoguang Mao
Wenshan Wu
Tao Ge
Furu Wei
Heng Ji
LLMAG
LRM
768
265
0
11 Jul 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
498
114
0
14 Jun 2023
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Computer Vision and Pattern Recognition (CVPR), 2023
Zaid Khan
B. Vijaykumar
S. Schulter
Xiang Yu
Y. Fu
Manmohan Chandraker
VLM
MLLM
346
26
0
06 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Neural Information Processing Systems (NeurIPS), 2023
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA
MedIm
457
1,568
0
01 Jun 2023
Collaborative Multi-Agent Video Fast-Forwarding
IEEE transactions on multimedia (IEEE TMM), 2023
Shuyue Lan
Zhilu Wang
Ermin Wei
Amit K. Roy-Chowdhury
Qi Zhu
229
4
0
27 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
520
65
0
24 May 2023
Improving Factuality and Reasoning in Language Models through Multiagent Debate
International Conference on Machine Learning (ICML), 2023
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG
LRM
493
1,502
0
23 May 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
IEEE transactions on multimedia (IEEE TMM), 2023
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD
VLM
565
67
0
15 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
1.9K
3,275
0
11 May 2023
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lei Wang
Wanyu Xu
Yihuai Lan
Zhiqiang Hu
Yunshi Lan
Roy Ka-wei Lee
Ee-Peng Lim
ReLM
LRM
624
670
0
06 May 2023
Retrieval-based Knowledge Augmented Vision Language Pre-training
ACM Multimedia (ACM MM), 2023
Jiahua Rao
Zifei Shan
Long Liu
Yao Zhou
Yuedong Yang
VLM
351
26
0
27 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
1.4K
8,828
0
17 Apr 2023
Generative Agents: Interactive Simulacra of Human Behavior
ACM Symposium on User Interface Software and Technology (UIST), 2023
Joon Sung Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
1.1K
3,775
0
07 Apr 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
IEEE International Conference on Computer Vision (ICCV), 2023
Xiang-yu Zhu
Renrui Zhang
Bowei He
A-Long Zhou
Dong Wang
Bingyan Zhao
Shiyang Feng
VLM
295
119
0
03 Apr 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
411
545
0
20 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
IEEE International Conference on Computer Vision (ICCV), 2023
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
453
703
0
14 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
20.2K
19,316
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.6K
7,623
0
30 Jan 2023
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaxian Guo
Junnan Li
Dongxu Li
A. M. H. Tiong
Boyang Albert Li
Dacheng Tao
Steven C. H. Hoi
VLM
MLLM
561
174
0
21 Dec 2022
Visual Programming: Compositional visual reasoning without training
Computer Vision and Pattern Recognition (CVPR), 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
578
635
0
18 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
488
134
0
15 Nov 2022
Scaling Instruction-Finetuned Language Models
Journal of machine learning research (JMLR), 2022
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
1.8K
4,038
0
20 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
1.5K
4,964
0
16 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Neural Information Processing Systems (NeurIPS), 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
736
2,137
0
20 Sep 2022
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
ACM Multimedia (ACM MM), 2022
Yangyang Guo
Liqiang Nie
Yongkang Wong
Zichen Liu
Zhiyong Cheng
Mohan S. Kankanhalli
226
55
0
30 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
European Conference on Computer Vision (ECCV), 2022
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
594
866
0
03 Jun 2022
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Neural Information Processing Systems (NeurIPS), 2022
Yuanze Lin
Yujia Xie
Dongdong Chen
Yichong Xu
Chenguang Zhu
Lu Yuan
379
108
0
02 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Neural Information Processing Systems (NeurIPS), 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
869
5,564
0
29 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Computer Vision and Pattern Recognition (CVPR), 2022
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
487
564
0
07 Apr 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
International Conference on Learning Representations (ICLR), 2022
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
3.7K
6,303
0
21 Mar 2022