ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.11559
  4. Cited By
Visual Programming: Compositional visual reasoning without training

Visual Programming: Compositional visual reasoning without training

18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
    ReLM
    VLM
    LRM
ArXivPDFHTML

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 309 papers shown
Title
Recursive Visual Programming
Recursive Visual Programming
Jiaxin Ge
Sanjay Subramanian
Baifeng Shi
Roei Herzig
Trevor Darrell
27
4
0
04 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
19
21
0
01 Dec 2023
VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video
  Internet of Things
VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things
Yaoyao Zhong
Mengshi Qi
Rui Wang
Yuhan Qiu
Yang Zhang
Huadong Ma
13
2
0
01 Dec 2023
GELDA: A generative language annotation framework to reveal visual
  biases in datasets
GELDA: A generative language annotation framework to reveal visual biases in datasets
Krish Kabra
Kathleen M. Lewis
Guha Balakrishnan
VLM
11
1
0
29 Nov 2023
Leveraging VLM-Based Pipelines to Annotate 3D Objects
Leveraging VLM-Based Pipelines to Annotate 3D Objects
Rishabh Kabra
Loic Matthey
Alexander Lerchner
Niloy J. Mitra
8
6
0
29 Nov 2023
Zero-shot Referring Expression Comprehension via Structural Similarity
  Between Images and Captions
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
16
11
0
28 Nov 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM
LRM
26
80
0
27 Nov 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan
Jinke Ren
Chun-Mei Feng
Hengshuang Zhao
Shuguang Cui
Zhen Li
19
26
0
26 Nov 2023
Large Language Models as Automated Aligners for benchmarking
  Vision-Language Models
Large Language Models as Automated Aligners for benchmarking Vision-Language Models
Yuanfeng Ji
Chongjian Ge
Weikai Kong
Enze Xie
Zhengying Liu
Zhengguo Li
Ping Luo
MLLM
ELM
25
7
0
24 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
27
19
0
22 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided
  Code-Vision Representation
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
25
10
0
22 Nov 2023
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text
  Recognizer
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Zhen Zhao
Jingqun Tang
Chunhui Lin
Binghong Wu
Can Huang
Hao Liu
Xin Tan
Zhizhong Zhang
Yuan Xie
21
18
0
22 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
16
8
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
36
248
0
21 Nov 2023
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language
  Model-based Agents in Real-world Systems
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Yilun Kong
Jingqing Ruan
Yihong Chen
Bin Zhang
Tianpeng Bao
...
Xiaoru Hu
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
24
37
0
19 Nov 2023
SelfEval: Leveraging the discriminative nature of generative models for
  evaluation
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Sai Saketh Rambhatla
Ishan Misra
EGVM
25
4
0
17 Nov 2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone
  GUI Navigation
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan
Zhengyuan Yang
Wanrong Zhu
K. Lin
Linjie Li
...
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
LLMAG
LM&Ro
14
100
0
13 Nov 2023
Analyzing Modular Approaches for Visual Question Decomposition
Analyzing Modular Approaches for Visual Question Decomposition
Apoorv Khandelwal
Ellie Pavlick
Chen Sun
35
4
0
10 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
66
3
0
10 Nov 2023
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities
  for Image Classification
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
Reza Esfandiarpoor
Stephen H. Bach
VLM
19
13
0
10 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
47
102
0
09 Nov 2023
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and
  reusing ModulEs
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Zhenfang Chen
Rui Sun
Wenjun Liu
Yining Hong
Chuang Gan
LRM
21
14
0
08 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
Nicholas Walker
Stefan Ultes
Pierre Lison
LM&Ro
43
1
0
03 Nov 2023
Implicit Chain of Thought Reasoning via Knowledge Distillation
Implicit Chain of Thought Reasoning via Knowledge Distillation
Yuntian Deng
Kiran Prasad
Roland Fernandez
P. Smolensky
Vishrav Chaudhary
Stuart M. Shieber
ReLM
LRM
14
42
0
02 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
  Generation and Editing
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
11
33
0
01 Nov 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
126
30
0
26 Oct 2023
Symbolic Planning and Code Generation for Grounded Dialogue
Symbolic Planning and Code Generation for Grounded Dialogue
Justin T. Chiu
Wenting Zhao
Derek Chen
Saujas Vaduguru
Alexander M. Rush
Daniel Fried
LLMAG
8
7
0
26 Oct 2023
Graph Agent: Explicit Reasoning Agent for Graphs
Graph Agent: Explicit Reasoning Agent for Graphs
Qinyong Wang
Zhenxiang Gao
Rong Xu
AI4CE
21
5
0
25 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language
  Models
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Bill Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM
MLLM
30
112
0
24 Oct 2023
WebWISE: Web Interface Control and Sequential Exploration with Large
  Language Models
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
Heyi Tao
TV Sethuraman
Michal Shlapentokh-Rothman
Derek Hoiem
LLMAG
48
4
0
24 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM
ReLM
LRM
18
21
0
24 Oct 2023
Towards Perceiving Small Visual Details in Zero-shot Visual Question
  Answering with Multimodal LLMs
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
13
2
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
34
46
0
23 Oct 2023
Branch-Solve-Merge Improves Large Language Model Evaluation and
  Generation
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Swarnadeep Saha
Omer Levy
Asli Celikyilmaz
Mohit Bansal
Jason Weston
Xian Li
MoMe
16
69
0
23 Oct 2023
API-Assisted Code Generation for Question Answering on Varied Table
  Structures
API-Assisted Code Generation for Question Answering on Varied Table Structures
Yihan Cao
Shuyi Chen
Ryan Liu
Zhiruo Wang
Daniel Fried
LMTD
14
10
0
23 Oct 2023
MoqaGPT : Zero-Shot Multi-modal Open-domain Question Answering with
  Large Language Model
MoqaGPT : Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model
Le Zhang
Yihong Wu
Fengran Mo
Jian-Yun Nie
Aishwarya Agrawal
MLLM
RALM
27
6
0
20 Oct 2023
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in
  Open Worlds
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng
Jiazheng Liu
Yicheng Feng
Zongqing Lu
29
29
0
20 Oct 2023
Frozen Transformers in Language Models Are Effective Visual Encoder
  Layers
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Ziqi Pang
Ziyang Xie
Yunze Man
Yu-xiong Wang
38
25
0
19 Oct 2023
Neurosymbolic Grounding for Compositional World Models
Neurosymbolic Grounding for Compositional World Models
Atharva Sehgal
Arya Grayeli
Jennifer J. Sun
Swarat Chaudhuri
14
5
0
19 Oct 2023
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang
Yuhao Dong
Shuai Liu
Bo-wen Li
Ziyue Wang
...
Haoran Tan
Jiamu Kang
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
LM&Ro
31
45
0
12 Oct 2023
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic
  Image Design and Generation
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
LRM
MLLM
DiffM
13
22
0
12 Oct 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
22
3
0
12 Oct 2023
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation
Jie An
Zhengyuan Yang
Linjie Li
Jianfeng Wang
K. Lin
Zicheng Liu
Lijuan Wang
Jiebo Luo
14
11
0
11 Oct 2023
Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained
  Decoding
Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
Kexun Zhang
Hongqiao Chen
Lei Li
W. Wang
35
4
0
10 Oct 2023
Large Language Models can Learn Rules
Large Language Models can Learn Rules
Zhaocheng Zhu
Yuan Xue
Xinyun Chen
Denny Zhou
Jian Tang
Dale Schuurmans
Hanjun Dai
LRM
ReLM
11
62
0
10 Oct 2023
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of
  Multi-modal Language Models
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang
Xiaotong Zhai
Zhongkai Zhao
Yongshuo Zong
Xin Wen
Bingchen Zhao
LRM
11
0
0
10 Oct 2023
ViCor: Bridging Visual Understanding and Commonsense Reasoning with
  Large Language Models
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
KAI-QING Zhou
Kwonjoon Lee
Teruhisa Misu
Xin Eric Wang
LRM
19
3
0
09 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized
  Instructions
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
8
11
0
08 Oct 2023
Expedited Training of Visual Conditioned Language Generation via
  Redundancy Reduction
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
15
7
0
05 Oct 2023
Large Language Model Cascades with Mixture of Thoughts Representations
  for Cost-efficient Reasoning
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
Murong Yue
Jie Zhao
Min Zhang
Liang Du
Ziyu Yao
LRM
22
54
0
04 Oct 2023
Previous
1234567
Next