ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.08128
  4. Cited By
ViperGPT: Visual Inference via Python Execution for Reasoning

ViperGPT: Visual Inference via Python Execution for Reasoning

14 March 2023
Dídac Surís
Sachit Menon
Carl Vondrick
    MLLM
    LRM
    ReLM
ArXivPDFHTML

Papers citing "ViperGPT: Visual Inference via Python Execution for Reasoning"

50 / 339 papers shown
Title
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke
Zhixi Cai
Simindokht Jahangard
Weiqing Wang
P. D. Haghighi
Hamid Rezatofighi
LRM
38
8
0
19 Mar 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
60
26
0
18 Mar 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
46
55
0
18 Mar 2024
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma
Weikai Huang
Jieyu Zhang
Tanmay Gupta
Ranjay Krishna
55
18
0
17 Mar 2024
SelfIE: Self-Interpretation of Large Language Model Embeddings
SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen
Carl Vondrick
Chengzhi Mao
19
17
0
16 Mar 2024
VideoAgent: Long-form Video Understanding with Large Language Model as
  Agent
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang
Yuhui Zhang
Orr Zohar
Serena Yeung-Levy
VLM
108
83
0
15 Mar 2024
Autonomous Monitoring of Pharmaceutical R&D Laboratories with 6 Axis Arm
  Equipped Quadruped Robot and Generative AI: A Preliminary Study
Autonomous Monitoring of Pharmaceutical R&D Laboratories with 6 Axis Arm Equipped Quadruped Robot and Generative AI: A Preliminary Study
Shunichi Hato
Nozomi Ogawa
26
1
0
15 Mar 2024
USimAgent: Large Language Models for Simulating Search Users
USimAgent: Large Language Models for Simulating Search Users
Erhan Zhang
Xingzhu Wang
Peiyuan Gong
Yankai Lin
Jiaxin Mao
LLMAG
35
14
0
14 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video
  Production
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAG
VGen
28
7
0
12 Mar 2024
Materials science in the era of large language models: a perspective
Materials science in the era of large language models: a perspective
Ge Lei
Ronan Docherty
Samuel J. Cooper
38
17
0
11 Mar 2024
Modeling Collaborator: Enabling Subjective Vision Classification With
  Minimal Human Effort via LLM Tool-Use
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal
Aditya Avinash
N. Alldrin
Jan Dlabal
Wenlei Zhou
...
Chun-Ta Lu
Howard Zhou
Ranjay Krishna
Ariel Fuxman
Tom Duerig
VLM
73
7
0
05 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
82
6
0
03 Mar 2024
TempCompass: Do Video LLMs Really Understand Videos?
TempCompass: Do Video LLMs Really Understand Videos?
Yuanxin Liu
Shicheng Li
Yi Liu
Yuxiang Wang
Shuhuai Ren
Lei Li
Sishuo Chen
Xu Sun
Lu Hou
VLM
41
98
0
01 Mar 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
39
49
0
27 Feb 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist
  Autonomous Agents for Desktop and Web
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
51
44
0
27 Feb 2024
VCD: Knowledge Base Guided Visual Commonsense Discovery in Images
VCD: Knowledge Base Guided Visual Commonsense Discovery in Images
Xiangqing Shen
Yurun Song
Siwei Wu
Rui Xia
33
6
0
27 Feb 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form
  Video-Text Understanding
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
21
9
0
25 Feb 2024
Selective "Selective Prediction": Reducing Unnecessary Abstention in
  Vision-Language Reasoning
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan
Jack Hessel
Tanmay Gupta
Bill Yuchen Lin
Yejin Choi
Jesse Thomason
Khyathi Raghavi Chandu
19
6
0
23 Feb 2024
CI w/o TN: Context Injection without Task Name for Procedure Planning
CI w/o TN: Context Injection without Task Name for Procedure Planning
Xinjie Li
29
0
0
23 Feb 2024
Large Multimodal Agents: A Survey
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
40
7
0
22 Feb 2024
DeiSAM: Segment Anything with Deictic Prompting
DeiSAM: Segment Anything with Deictic Prompting
Hikaru Shindo
Manuel Brack
Gopika Sudhakaran
D. Dhami
P. Schramowski
Kristian Kersting
VLM
24
2
0
21 Feb 2024
EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries
EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries
Jiateng Liu
Pengfei Yu
Yuji Zhang
Sha Li
Zixuan Zhang
Heng Ji
KELM
19
16
0
17 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question
  Answering
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
98
4
0
16 Feb 2024
Using Left and Right Brains Together: Towards Vision and Language
  Planning
Using Left and Right Brains Together: Towards Vision and Language Planning
Jun Cen
Chenfei Wu
Xiao Liu
Sheng-Siang Yin
Yixuan Pei
Jinglong Yang
Qifeng Chen
Nan Duan
Jianguo Zhang
48
3
0
16 Feb 2024
AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous
  Systems
AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous Systems
Jiaying Lu
Bo Pan
Jieyi Chen
Yingchaojie Feng
Jingyuan Hu
Yuchen Peng
Wei Chen
34
13
0
14 Feb 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned
  Language Models
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti
Suraj Nair
Ashwin Balakrishna
Percy Liang
Thomas Kollar
Dorsa Sadigh
MLLM
VLM
57
95
0
12 Feb 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
Zhicheng Zheng
Xin Yan
Zhenfang Chen
Jingzhou Wang
Qin Zhi Eddie Lim
Joshua B. Tenenbaum
Chuang Gan
LRM
27
6
0
09 Feb 2024
Open-Universe Indoor Scene Generation using LLM Program Synthesis and
  Uncurated Object Databases
Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases
Rio Aguina-Kang
Maxim Gumin
Do Heon Han
Stewart Morris
Seung Jean Yoo
Aditya Ganeshan
R. K. Jones
Qiuhong Anna Wei
Kailiang Fu
Daniel E. Ritchie
3DV
37
24
0
05 Feb 2024
The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning
The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning
Daniel Cunnington
Mark Law
Jorge Lobo
Alessandra Russo
NAI
27
7
0
02 Feb 2024
InferCept: Efficient Intercept Support for Augmented Large Language
  Model Inference
InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
Reyna Abhyankar
Zijian He
Vikranth Srivatsa
Hao Zhang
Yiying Zhang
RALM
29
11
0
02 Feb 2024
Executable Code Actions Elicit Better LLM Agents
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang
Yangyi Chen
Lifan Yuan
Yizhe Zhang
Yunzhu Li
Hao Peng
Heng Ji
ELM
LLMAG
LM&Ro
26
127
0
01 Feb 2024
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Elias Stengel-Eskin
Archiki Prasad
Mohit Bansal
18
13
0
29 Jan 2024
A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect
A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect
Yunkang Cao
Xiaohao Xu
Jiangning Zhang
Yuqi Cheng
Xiaonan Huang
Guansong Pang
Weiming Shen
81
41
0
29 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
173
0
24 Jan 2024
ChatterBox: Multi-round Multimodal Referring and Grounding
ChatterBox: Multi-round Multimodal Referring and Grounding
Yunjie Tian
Tianren Ma
Lingxi Xie
Jihao Qiu
Xi Tang
Yuan Zhang
Jianbin Jiao
Qi Tian
Qixiang Ye
18
15
0
24 Jan 2024
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving
  Programmatic Tasks
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
Zhiruo Wang
Daniel Fried
Graham Neubig
17
18
0
23 Jan 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and
  Generating with Multimodal LLMs
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Bin Cui
CoGe
DiffM
27
113
0
22 Jan 2024
Prompting Large Vision-Language Models for Compositional Reasoning
Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski
Ming Jiang
Junjie Hu
CoGe
VLM
LRM
38
3
0
20 Jan 2024
PhotoScout: Synthesis-Powered Multi-Modal Image Search
PhotoScout: Synthesis-Powered Multi-Modal Image Search
Celeste Barnaby
Qiaochu Chen
Chenglong Wang
Işıl Dillig
24
2
0
19 Jan 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
41
19
0
19 Jan 2024
LangProp: A code optimization framework using Large Language Models
  applied to driving
LangProp: A code optimization framework using Large Language Models applied to driving
Shu Ishida
Gianluca Corrado
George Fedoseev
Hudson Yeo
Lloyd Russell
Jamie Shotton
João F. Henriques
Anthony Hu
34
11
0
18 Jan 2024
DiffusionGPT: LLM-Driven Text-to-Image Generation System
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Jie Qin
Jie Wu
Weifeng Chen
Yuxi Ren
Huixian Li
Hefeng Wu
Xuefeng Xiao
Rui Wang
S. Wen
DiffM
50
22
0
18 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and
  Visual Question Generation
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
21
2
0
18 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
45
35
0
16 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang
Bohan Zhuang
Qi Wu
11
11
0
12 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as
  Programmers
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
23
9
0
03 Jan 2024
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
  Empowers Large Language Models to Serve as Intelligent Agents
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Ke Yang
Jiateng Liu
John Wu
Chaoqi Yang
Yi Ren Fung
...
Xu Cao
Xingyao Wang
Yiquan Wang
Heng Ji
Chengxiang Zhai
LLMAG
ELM
18
71
0
01 Jan 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
81
0
29 Dec 2023
A Simple LLM Framework for Long-Range Video Question-Answering
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
100
80
0
28 Dec 2023
Previous
1234567
Next