Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.08128
Cited By
ViperGPT: Visual Inference via Python Execution for Reasoning
14 March 2023
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViperGPT: Visual Inference via Python Execution for Reasoning"
50 / 339 papers shown
Title
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Yongdong Luo
Xiawu Zheng
Xiao Yang
Guilin Li
Haojia Lin
Jinfa Huang
Jiayi Ji
Fei Chao
Jiebo Luo
Rongrong Ji
VLM
79
17
0
20 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
33
3
0
17 Nov 2024
VeriGraph: Scene Graphs for Execution Verifiable Robot Planning
Daniel Ekpo
Mara Levy
Saksham Suri
Chuong Huynh
Abhinav Shrivastava
39
2
0
15 Nov 2024
Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning
Jingru Yang
Huan Yu
Yang Jingxin
C. Xu
Yin Biao
Yu Sun
Shengfeng He
21
0
0
15 Nov 2024
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang
Nishanth Kumar
Hao Tang
Adrian Weller
J. Tenenbaum
Tom Silver
Joao Henriques
Kevin Ellis
38
8
0
30 Oct 2024
Natural Language Inference Improves Compositionality in Vision-Language Models
Paola Cascante-Bonilla
Yu Hou
Yang Trista Cao
Hal Daumé III
Rachel Rudinger
ReLM
CoGe
VLM
39
3
0
29 Oct 2024
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
Wanhua Li
Zibin Meng
Jiawei Zhou
D. Wei
Chuang Gan
Hanspeter Pfister
LRM
VLM
22
5
0
28 Oct 2024
DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Manan Suri
Puneet Mathur
Franck Dernoncourt
R. Jain
Vlad I. Morariu
Ramit Sawhney
Preslav Nakov
Dinesh Manocha
22
1
0
21 Oct 2024
NetSafe: Exploring the Topological Safety of Multi-agent Networks
Miao Yu
Shilong Wang
Guibin Zhang
Junyuan Mao
Chenlong Yin
Qijiong Liu
Qingsong Wen
Kun Wang
Yang Wang
29
5
0
21 Oct 2024
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
Aditya Sharma
Aman Dalmia
Mehran Kazemi
Amal Zouaq
Christopher J. Pal
LRM
26
0
0
17 Oct 2024
SPIN: Self-Supervised Prompt INjection
Leon Zhou
Junfeng Yang
Chengzhi Mao
AAML
SILM
23
0
0
17 Oct 2024
Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu
Senthil Purushwalkam
An Yan
Caiming Xiong
R. Xu
MLLM
26
0
0
17 Oct 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Fengji Zhang
Linquan Wu
Huiyu Bai
Guancheng Lin
Xiao Li
Xiao Yu
Yue Wang
Bei Chen
Jacky Keung
MLLM
ELM
LRM
32
0
0
16 Oct 2024
BlendRL: A Framework for Merging Symbolic and Neural Policy Learning
Hikaru Shindo
Quentin Delfosse
D. Dhami
Kristian Kersting
33
3
0
15 Oct 2024
Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement
J. Shtok
Amit Alfassy
Foad Abo Dahood
Eliyahu Schwartz
Sivan Doveh
Assaf Arbelle
LRM
ReLM
25
0
0
14 Oct 2024
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets
Thomas Eiter
Jan Hadl
N. Higuera
J. Oetsch
16
0
0
12 Oct 2024
VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis
Andrew Hoopes
V. Butoi
John Guttag
Adrian V. Dalca
MedIm
LM&MA
35
1
0
10 Oct 2024
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Ji-Rong Wen
58
9
0
10 Oct 2024
Grounding Language in Multi-Perspective Referential Communication
Zineng Tang
Lingjun Mao
Alane Suhr
19
2
0
04 Oct 2024
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
Rinon Gal
Adi Haviv
Yuval Alaluf
Amit H. Bermano
Daniel Cohen-Or
Gal Chechik
DiffM
24
3
0
02 Oct 2024
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
Niki Maria Foteinopoulou
Enjie Ghorbel
Djamila Aouada
16
2
0
01 Oct 2024
DARE: Diverse Visual Question Answering with Robustness Evaluation
Hannah Sterz
Jonas Pfeiffer
Ivan Vulić
OOD
VLM
16
2
0
26 Sep 2024
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Siyun Zhao
Yuqing Yang
Zilong Wang
Zhiyuan He
Luna Qiu
Lili Qiu
SyDa
RALM
3DV
32
31
0
23 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
27
0
0
23 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
Zhixi Cai
Cristian Rojas Cardenas
Kevin Leo
Chenyuan Zhang
Kal Backman
...
Yuan-Fang Li
Mor Vered
Peter James Stuckey
M. G. D. L. Banda
Hamid Rezatofighi
29
5
0
16 Sep 2024
Symbolic Regression with a Learned Concept Library
Arya Grayeli
Atharva Sehgal
Omar Costilla-Reyes
Miles Cranmer
Swarat Chaudhuri
56
9
0
14 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
52
6
0
12 Sep 2024
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
71
1
0
06 Sep 2024
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models
Moreno DÍncà
E. Peruzzo
Massimiliano Mancini
Xingqian Xu
Humphrey Shi
N. Sebe
39
0
0
29 Aug 2024
Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models
Yuzhou Huang
Yiran Qin
Shunlin Lu
Xintao Wang
Rui Huang
Ying Shan
Ruimao Zhang
VGen
32
1
0
21 Aug 2024
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAG
LRM
77
12
0
16 Aug 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
31
3
0
09 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
23
10
0
08 Aug 2024
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li
Chi Zhang
Wanqi Yang
Bin-Bin Fu
Pei Cheng
Xin Chen
Ling Chen
Yunchao Wei
LLMAG
LM&Ro
31
9
0
05 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Y. Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
32
2
0
05 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
23
1
0
30 Jul 2024
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang
Jiting Cai
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
LRM
31
5
0
29 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
18
0
0
28 Jul 2024
VACoDe: Visual Augmented Contrastive Decoding
Sihyeon Kim
Boryeong Cho
Sangmin Bae
Sumyeong Ahn
SeYoung Yun
29
3
0
26 Jul 2024
MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
Pei Zhou
Yanchao Yang
27
1
0
21 Jul 2024
KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models
Kemou Jiang
Xuan Cai
Zhiyong Cui
Aoyong Li
Yilong Ren
Haiyang Yu
Hao Yang
Daocheng Fu
Licheng Wen
Pinlong Cai
LLMAG
38
7
0
19 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
39
5
0
13 Jul 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
18
1
0
08 Jul 2024
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
Pritish Sahu
Karan Sikka
Ajay Divakaran
MLLM
LRM
62
4
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
36
2
0
02 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian-Yu Guan
Wei Wu
Rui Yan
LRM
35
10
0
28 Jun 2024
UQE: A Query Engine for Unstructured Databases
Hanjun Dai
B. Wang
Xingchen Wan
Bo Dai
Sherry Yang
Azade Nova
Pengcheng Yin
P. Phothilimthana
Charles Sutton
Dale Schuurmans
44
3
0
23 Jun 2024
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Joanna Matthiesen
Kevin A. Smith
J. Tenenbaum
ELM
LRM
39
6
0
22 Jun 2024
Previous
1
2
3
4
5
6
7
Next