2310.15166
Cited By
Large Language Models are Visual Reasoning Coordinators
23 October 2023
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
Papers citing
"Large Language Models are Visual Reasoning Coordinators"
50 / 51 papers shown
Title
Unlocking the Capabilities of Vision-Language Models for Generalizable and Explainable Deepfake Detection
Peipeng Yu
Jianwei Fei
Hui Gao
Xuan Feng
Zhihua Xia
Chip-Hong Chang
MLLM
VLM
70
1
0
19 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
58
0
0
18 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
46
0
0
02 Mar 2025
Spatial Reasoning with Denoising Models
Christopher Wewer
Bart Pogodzinski
Bernt Schiele
J. E. Lenssen
DiffM
LRM
35
0
0
28 Feb 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
Xinwei Long
Zhiyuan Ma
Ermo Hua
Kaiyan Zhang
Biqing Qi
Bowen Zhou
RALM
46
0
0
23 Feb 2025
EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking
Anjiang Wei
Jiannan Cao
Ran Li
H. Chen
Y. Zhang
...
Yuan Liu
Thiago S. F. X. Teixeira
D. Yang
Ke Wang
Alex Aiken
LRM
47
1
0
18 Feb 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
LRM
94
3
0
17 Feb 2025
Meta-Feature Adapter: Integrating Environmental Metadata for Enhanced Animal Re-identification
Yuzhuo Li
Di Zhao
Yihao Wu
Yun Sing Koh
69
0
0
23 Jan 2025
SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation
Hang Zhang
Zhuoling Li
Jun Liu
LRM
100
1
0
15 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
88
0
0
04 Dec 2024
Right this way: Can VLMs Guide Us to See More to Answer Questions?
Li Liu
Diji Yang
Sijia Zhong
Kalyana Suma Sree Tholeti
Lei Ding
Yi Zhang
Leilani H. Gilpin
31
2
0
01 Nov 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan Yu
Tianxiao Li
Yuxin Zhu
Rongze Pan
30
0
0
10 Oct 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators
Harsh Lunia
24
0
0
20 Jul 2024
Improving Multi-Agent Debate with Sparse Communication Topology
Yunxuan Li
Yibing Du
Jiageng Zhang
Le Hou
Peter Grabowski
Yeqing Li
Eugene Ie
LLMAG
26
18
0
17 Jun 2024
Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps
Jian Chen
Peilin Zhou
Yining Hua
Dading Chong
Meng Cao
Yaowei Li
Zixuan Yuan
Bing Zhu
Junwei Liang
VLM
33
1
0
14 Jun 2024
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
Xin Wang
Fangfang Liu
Zheng Li
Caili Guo
16
1
0
06 Jun 2024
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Yuanhan Zhang
Kaichen Zhang
Bo-wen Li
Fanyi Pu
Christopher Arif Setiadharma
Jingkang Yang
Ziwei Liu
VGen
47
7
0
06 May 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
24
0
0
22 Apr 2024
A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches
Zhigen Zhao
Shuo Cheng
Yan Ding
Ziyi Zhou
Shiqi Zhang
Danfei Xu
Ye Zhao
33
21
0
03 Apr 2024
Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration
Shu Zhao
Xiaohan Zou
Tan Yu
Huijuan Xu
25
1
0
17 Mar 2024
Embodied Understanding of Driving Scenarios
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
41
30
0
07 Mar 2024
Grounding Language Models for Visual Entity Recognition
Zilin Xiao
Ming Gong
Paola Cascante-Bonilla
Xingyao Zhang
Jie Wu
Vicente Ordonez
VLM
33
8
0
28 Feb 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
4
0
23 Feb 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas J. Guibas
Fei Xia
LRM
ReLM
28
194
0
22 Jan 2024
PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model
Chenghao Huang
Yanbo Cao
Yinlong Wen
Tao Zhou
Yanru Zhang
OffRL
LLMAG
27
5
0
04 Jan 2024
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
24
15
0
07 Dec 2023
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Chinchure
Pushkar Shukla
Gaurav Bhatt
Kiri Salij
K. Hosanagar
Leonid Sigal
Matthew A. Turk
15
22
0
03 Dec 2023
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu
Jiawang Bai
Xin Li
Zeyu Xiao
Xinchao Wang
VLM
31
11
0
28 Nov 2023
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
Zijian Zhou
Miaojing Shi
Holger Caesar
VLM
19
12
0
27 Nov 2023
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Yilun Kong
Jingqing Ruan
Yihong Chen
Bin Zhang
Tianpeng Bao
...
Xiaoru Hu
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
18
37
0
19 Nov 2023
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang
Yuhao Dong
Shuai Liu
Bo-wen Li
Ziyue Wang
...
Haoran Tan
Jiamu Kang
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
LM&Ro
31
45
0
12 Oct 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
18
63
0
20 Sep 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
110
221
0
18 Sep 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
44
17
0
14 Sep 2023
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage
Jingqing Ruan
Yihong Chen
Bin Zhang
Zhiwei Xu
Tianpeng Bao
...
Shiwei Shi
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
LM&Ro
36
31
0
07 Aug 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
C. Li
Ziwei Liu
MLLM
VLM
24
224
0
08 Jun 2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Jingkang Yang
Ziwei Liu
MLLM
21
497
0
05 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
147
323
0
18 Feb 2022
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
Mikołaj Małkiński
Jacek Mańdziuk
107
41
0
28 Jan 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
166
401
0
10 Sep 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
245
671
0
06 Jan 2021
Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering
Fengbin Zhu
Wenqiang Lei
Chao Wang
Jianming Zheng
Soujanya Poria
Tat-Seng Chua
RALM
198
251
0
04 Jan 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,898
0
31 Dec 2020