ResearchTrend.AI
Visual Programming: Compositional visual reasoning without training

18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
    ReLM
    VLM
    LRM

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 309 papers shown
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki
Valentin Gerard
Amaia Cardiel
Eric Gaussier
Matthieu Cord
Eduardo Valle
69
0
0
23 Nov 2024
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Yongdong Luo
Xiawu Zheng
Xiao Yang
Guilin Li
Haojia Lin
Jinfa Huang
Jiayi Ji
Fei Chao
Jiebo Luo
Rongrong Ji
VLM
79
17
0
20 Nov 2024
Retinal Vessel Segmentation via Neuron Programming
Tingting Wu
Ruyi Min
Peixuan Song
Hengtao Guo
Tieyong Zeng
Feng-Lei Fan
26
0
0
17 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
33
3
0
17 Nov 2024
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Andrea Sanchietti
G. Farinella
Fabio Galasso
Antonino Furnari
EgoV
LRM
43
1
0
04 Nov 2024
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions
Hao-Yu Hsu
Zhi-Hao Lin
Albert Zhai
Hongchi Xia
Shenlong Wang
VGen
40
9
0
04 Nov 2024
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang
Nishanth Kumar
Hao Tang
Adrian Weller
J. Tenenbaum
Tom Silver
Joao Henriques
Kevin Ellis
38
8
0
30 Oct 2024
Natural Language Inference Improves Compositionality in Vision-Language Models
Paola Cascante-Bonilla
Yu Hou
Yang Trista Cao
Hal Daumé III
Rachel Rudinger
ReLM
CoGe
VLM
39
3
0
29 Oct 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
L. Qin
Qiguang Chen
Hao Fei
Zhi Chen
Min Li
Wanxiang Che
34
5
0
27 Oct 2024
GRS: Generating Robotic Simulation Tasks from Real-World Images
Alex Zook
Fan-Yun Sun
Josef Spjut
Valts Blukis
Stan Birchfield
Jonathan Tremblay
42
4
0
20 Oct 2024
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
Aditya Sharma
Aman Dalmia
Mehran Kazemi
Amal Zouaq
Christopher J. Pal
LRM
26
0
0
17 Oct 2024
Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu
Senthil Purushwalkam
An Yan
Caiming Xiong
R. Xu
MLLM
26
0
0
17 Oct 2024
Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement
J. Shtok
Amit Alfassy
Foad Abo Dahood
Eliyahu Schwartz
Sivan Doveh
Assaf Arbelle
LRM
ReLM
25
0
0
14 Oct 2024
VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis
Andrew Hoopes
V. Butoi
John Guttag
Adrian V. Dalca
MedIm
LM&MA
35
1
0
10 Oct 2024
GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps
Muhammad Umair Nasir
Steven D. James
Julian Togelius
ELM
LRM
19
1
0
10 Oct 2024
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan
Elias Stengel-Eskin
Jaemin Cho
Mohit Bansal
VGen
36
1
0
08 Oct 2024
Multi-Step Time Series Inference Agent for Reasoning and Automated Task Execution
Wen Ye
Yizhou Zhang
Wei Yang
Lumingyuan Tang
Defu Cao
Jie Cai
Yan Liu
BDL
CoGe
AI4TS
24
2
0
05 Oct 2024
Grounding Language in Multi-Perspective Referential Communication
Zineng Tang
Lingjun Mao
Alane Suhr
19
2
0
04 Oct 2024
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
Rinon Gal
Adi Haviv
Yuval Alaluf
Amit H. Bermano
Daniel Cohen-Or
Gal Chechik
DiffM
24
3
0
02 Oct 2024
A Survey on Complex Tasks for Goal-Directed Interactive Agents
Mareike Hartmann
Alexander Koller
LM&Ro
LLMAG
32
0
0
27 Sep 2024
Visual Data Diagnosis and Debiasing with Concept Graphs
Rwiddhi Chakraborty
Yinong Wang
Jialu Gao
Runkai Zheng
Cheng Zhang
Fernando De la Torre
18
2
0
26 Sep 2024
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning
Debargha Ganguly
Srinivasan Iyengar
Vipin Chaudhary
Shivkumar Kalyanaraman
LRM
27
2
0
25 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
27
0
0
23 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
Zhixi Cai
Cristian Rojas Cardenas
Kevin Leo
Chenyuan Zhang
Kal Backman
...
Yuan-Fang Li
Mor Vered
Peter James Stuckey
M. G. D. L. Banda
Hamid Rezatofighi
29
5
0
16 Sep 2024
Symbolic Regression with a Learned Concept Library
Arya Grayeli
Atharva Sehgal
Omar Costilla-Reyes
Miles Cranmer
Swarat Chaudhuri
56
9
0
14 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
52
6
0
12 Sep 2024
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models
Moreno D'Incà
E. Peruzzo
Massimiliano Mancini
Xingqian Xu
Humphrey Shi
N. Sebe
39
0
0
29 Aug 2024
Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models
Yuzhou Huang
Yiran Qin
Shunlin Lu
Xintao Wang
Rui Huang
Ying Shan
Ruimao Zhang
VGen
32
1
0
21 Aug 2024
A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot
Haoxuan Ding
Qi. Wang
Junyu Gao
Qiang Li
VLM
37
0
0
11 Aug 2024
Compromising Embodied Agents with Contextual Backdoor Attacks
Aishan Liu
Yuguang Zhou
Xianglong Liu
Tianyuan Zhang
Siyuan Liang
...
Tianlin Li
Junqi Zhang
Wenbo Zhou
Qing-Wu Guo
Dacheng Tao
LLMAG
AAML
34
7
0
06 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Y. Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
32
2
0
05 Aug 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
23
1
0
30 Jul 2024
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang
Jiting Cai
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
LRM
31
5
0
29 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
18
0
0
28 Jul 2024
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition
Jinfu Liu
C. L. P. Chen
Mengyuan Liu
47
11
0
22 Jul 2024
MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
Pei Zhou
Yanchao Yang
27
1
0
21 Jul 2024
On the Design and Analysis of LLM-Based Algorithms
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
41
4
0
20 Jul 2024
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Alan L. Yuille
VGen
22
1
0
18 Jul 2024
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon
Biniyam Aschalew Tolera
Taesik Gong
Kimin Lee
Sung-Ju Lee
28
6
0
15 Jul 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
38
1
0
12 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
48
5
0
11 Jul 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
18
1
0
08 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
41
25
0
08 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
66
4
0
08 Jul 2024
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
Pritish Sahu
Karan Sikka
Ajay Divakaran
MLLM
LRM
62
4
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
36
2
0
02 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian-Yu Guan
Wei Wu
Rui Yan
LRM
35
10
0
28 Jun 2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun
So Yeon Min
Yingshan Chang
Yonatan Bisk
32
4
0
27 Jun 2024
CogExplore: Contextual Exploration with Language-Encoded Environment Representations
Harel Biggie
Patrick Cooper
Doncey Albin
Kristen Such
Christoffer Heckman
LM&Ro
30
0
0
24 Jun 2024