Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.08128
Cited By
ViperGPT: Visual Inference via Python Execution for Reasoning
14 March 2023
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViperGPT: Visual Inference via Python Execution for Reasoning"
50 / 339 papers shown
Title
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
Aiyao He
Sijia Cui
Shuai Xu
Yanna Wang
Bo Xu
24
0
0
13 May 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng
A. Goel
Hakan Bilen
LRM
21
0
0
12 May 2025
Neuro-Symbolic Concepts
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
NAI
19
0
0
09 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
56
0
0
06 May 2025
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li
Zhi Gao
Bofei Zhang
Yapeng Mi
Xiaojian Ma
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LLMAG
70
0
0
30 Apr 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas J. Guibas
Minhyuk Sung
LRM
41
0
0
24 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-xiong Wang
VLM
34
0
0
22 Apr 2025
Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang
Zonghao Ying
Tianyuan Zhang
Siyuan Liang
Shengshan Hu
Mingchuan Zhang
A. Liu
Xianglong Liu
AAML
31
1
0
19 Apr 2025
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
Yuyang Ji
Haohan Wang
LRM
31
0
0
14 Apr 2025
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao
Xuqi Liu
Zhongqi Yue
Y. Wu
Shuang Chen
Juncheng Billy Li
Siliang Tang
Fei Wu
Tat-Seng Chua
Yueting Zhuang
OffRL
LRM
39
1
0
09 Apr 2025
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie
Zhimin Ding
Kevin Yu
Marco Cheung
C. Jermaine
S. Chaudhuri
24
0
0
09 Apr 2025
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
Zhuo Zhi
Qiangqiang Wu
Minghe shen
W. J. Li
Yinchuan Li
Kun Shao
Kaiwen Zhou
LLMAG
33
0
0
06 Apr 2025
Attention-Aware Multi-View Pedestrian Tracking
Reef Alturki
Adrian Hilton
Jean-Yves Guillemaut
31
0
0
03 Apr 2025
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Zhenyi Liao
Qingsong Xie
Yanhao Zhang
Zijian Kong
Haonan Lu
Zhenyu Yang
Zhijie Deng
ReLM
VLM
LRM
99
0
1
01 Apr 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
90
0
0
26 Mar 2025
CubeRobot: Grounding Language in Rubik's Cube Manipulation via Vision-Language Model
Feiyang Wang
Xiaomin Yu
Wangyu Wu
LM&Ro
60
0
0
25 Mar 2025
OmniNova:A General Multimodal Agent Framework
Pengfei Du
LLMAG
47
0
0
25 Mar 2025
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
Carlos Plou
Cesar Borja
Ruben Martinez-Cantin
Ana C. Murillo
56
0
0
25 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
42
0
0
25 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play
Wei Fang
Yang Zhang
Kaizhi Qian
James R. Glass
Yada Zhu
LLMAG
64
0
0
18 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
58
0
0
18 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
73
0
0
17 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Y. Li
Xinggang Wang
LM&Ro
59
0
0
13 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
51
0
0
12 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
45
0
0
12 Mar 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
LRM
MLLM
49
0
0
10 Mar 2025
Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Yanjun Chen
Yirong Sun
Xinghao Chen
Jian Wang
Xiaoyu Shen
W. Li
Wei Zhang
3DV
LRM
59
1
0
08 Mar 2025
Program Synthesis Dialog Agents for Interactive Decision-Making
Matthew Toles
Nikhil Balwani
Rattandeep Singh
Valentina Giulia Sartori Rodriguez
Zhou Yu
60
0
0
26 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
39
5
0
24 Feb 2025
WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents
Xinhang Liu
Chi-Keung Tang
Yu-Wing Tai
VGen
61
0
0
21 Feb 2025
MoVer: Motion Verification for Motion Graphics Animations
Jiaju Ma
Maneesh Agrawala
VGen
51
0
0
20 Feb 2025
Predicate Hierarchies Improve Few-Shot State Classification
Emily Jin
Joy Hsu
Jiajun Wu
OffRL
67
0
0
18 Feb 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
Liang Lin
LRM
96
3
0
17 Feb 2025
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall
Cheng Perng Phoo
Mia Chiquier
Bharath Hariharan
Kavita Bala
Carl Vondrick
69
1
0
17 Feb 2025
Visual Graph Question Answering with ASP and LLMs for Language Parsing
Jakob Johannes Bauer
Thomas Eiter
Nelson Higuera Ruiz
J. Oetsch
GNN
59
0
0
13 Feb 2025
ENTER: Event Based Interpretable Reasoning for VideoQA
Hammad A. Ayyubi
Junzhang Liu
Ali Asgarov
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
...
Md. Atabuzzaman
Xudong Lin
Naveen Reddy Dyava
Shih-Fu Chang
Chris Thomas
NAI
55
2
0
24 Jan 2025
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
Qian Tao
Xiaoyang Fan
Yong Xu
Xingquan Zhu
Yufei Tang
43
0
0
22 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
119
50
0
18 Dec 2024
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis
Ahmet Serdar Karadeniz
Sebastian Cavada
Danila Rukhovich
Niki Maria Foteinopoulou
K. Cherenkova
Anis Kacem
Djamila Aouada
73
1
0
18 Dec 2024
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun
Mark Rothermel
Marcus Rohrbach
Anna Rohrbach
81
1
0
13 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
117
0
0
12 Dec 2024
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
63
4
0
08 Dec 2024
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto
Tommaso Campari
Luciano Serafini
Lamberto Ballan
LLMAG
LM&Ro
MLLM
LRM
67
0
0
05 Dec 2024
LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents
Bingchen Li
Xin Li
Yiting Lu
Zhibo Chen
78
1
0
05 Dec 2024
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman
Xiaoman Zhang
E. Chen
Sung Eun Kim
Pranav Rajpurkar
HILM
MedIm
72
0
0
27 Nov 2024
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Harsh Singh
Rocktim Jyoti Das
Mingfei Han
Preslav Nakov
Ivan Laptev
LM&Ro
LLMAG
67
2
0
26 Nov 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
J. Wang
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
105
0
0
25 Nov 2024
1
2
3
4
5
6
7
Next