Inferring and Executing Programs for Visual Reasoning


10 May 2017
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
    NAI

Papers citing "Inferring and Executing Programs for Visual Reasoning"

50 / 312 papers shown
Title
SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation
Gio Huh
Dhruv Sheth
Rayhan Zirvi
Frank Xiao
LRM
64
0
0
28 Oct 2025
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
Danial Kamali
Parisa Kordjamshidi
NAI, LRM, CoGe, VLM
773
2
0
30 Sep 2025
Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA
Chenglin Li
Feng Han
FengTao
Ruilin Li
Qianglong Chen
Jingqi Tong
Yin Zhang
Jiaqi Wang
LRM
177
0
0
22 Sep 2025
SHERPA: A Model-Driven Framework for Large Language Model Execution
Boqi Chen
Kua Chen
José Antonio Hernández López
Gunter Mussbacher
Dániel Varró
Amir Feizpour
LRM
109
1
0
29 Aug 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM, CoGe, LRM
332
8
0
24 Aug 2025
PyVision: Agentic Vision with Dynamic Tooling
Shitian Zhao
H. Zhang
Shaoheng Lin
Ming Li
Qilong Wu
Kaipeng Zhang
Chen Wei
LRM
249
19
0
10 Jul 2025
Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Adam Ishay
Zhun Yang
Joohyung Lee
Ilgu Kang
Dongjae Lim
NAI
254
1
0
12 Jun 2025
A Neurosymbolic Agent System for Compositional Visual Reasoning
Yichang Xu
Gaowen Liu
Ramana Rao Kompella
Sihao Hu
Tiansheng Huang
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
LRM, VLM
217
0
0
09 Jun 2025
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Chaoyang Wang
Zeyu Zhang
Meng Meng
Xu Zhou
Haiyun Jiang
OffRL, LRM
202
1
0
07 Jun 2025
Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre
Igor Vasiljevic
Achal Dave
Jiajun Wu
Rares Andrei Ambrus
Thomas Kollar
Juan Carlos Niebles
P. Tokmakov
238
0
0
19 May 2025
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models
X. Wang
Haoyang Li
Zeyang Zhang
Zeyang Zhang
Wenwu Zhu
LRM
365
1
0
28 Apr 2025
Symbolic Representation for Any-to-Any Generative Tasks
Computer Vision and Pattern Recognition (CVPR), 2025
Jianfei Chen
Xiaoye Zhu
Yanjie Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
Qingbin Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
218
0
0
24 Apr 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM, OffRL, LRM
489
0
0
26 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
335
3
0
19 Mar 2025
MuBlE: MuJoCo and Blender simulation Environment and Benchmark for Task Planning in Robot Manipulation
Michal Nazarczuk
Karla Stepanova
Jan Kristof Behrens
Matej Hoffmann
K. Mikolajczyk
LM&Ro
410
1
0
04 Mar 2025
MoVer: Motion Verification for Motion Graphics Animations
ACM Transactions on Graphics (TOG), 2025
Jiaju Ma
Maneesh Agrawala
VGen
289
7
0
19 Feb 2025
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Computer Vision and Pattern Recognition (CVPR), 2025
Utkarsh Mall
Cheng Perng Phoo
Mia Chiquier
Bharath Hariharan
Kavita Bala
Carl Vondrick
431
3
0
17 Feb 2025
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
296
0
0
18 Dec 2024
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Computer Vision and Pattern Recognition (CVPR), 2024
Filippo Ziliotto
Tommaso Campari
Luciano Serafini
Lamberto Ballan
LLMAG, LM&Ro, MLLM, LRM
315
11
0
05 Dec 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Neural Information Processing Systems (NeurIPS), 2024
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM, LRM
323
5
0
20 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
261
8
0
17 Nov 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM, LRM
250
9
0
28 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
IEEE Transactions on Image Processing (TIP), 2024
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
186
6
0
12 Oct 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
IEEE International Conference on Robotics and Automation (ICRA), 2024
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
291
2
0
23 Sep 2024
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
International Conference on Learning Representations (ICLR), 2024
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
404
8
0
16 Sep 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yanjie Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
296
6
0
05 Aug 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
169
3
0
30 Jul 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM, RALM
250
30
0
15 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
449
255
0
15 May 2024
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng
Joy Hsu
Weiyu Liu
Jiajun Wu
PINN, LRM
234
9
0
30 Apr 2024
Closed Loop Interactive Embodied Reasoning for Robot Manipulation
Michal Nazarczuk
Jan Kristof Behrens
Karla Stepanova
Matej Hoffmann
K. Mikolajczyk
LM&Ro, LRM
390
4
0
23 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
381
62
0
09 Apr 2024
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
Lili Liang
Guanglu Sun
Jin Qiu
Lizhong Zhang
NAI
205
5
0
05 Apr 2024
PhotoScout: Synthesis-Powered Multi-Modal Image Search
Celeste Barnaby
Qiaochu Chen
Chenglong Wang
Işıl Dillig
187
6
0
19 Jan 2024
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
European Conference on Artificial Intelligence (ECAI), 2024
Taehee Kim
Yeongjae Cho
Heejun Shin
Yohan Jo
Dongmyung Shin
319
6
0
12 Jan 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yueqian Wang
Yuxuan Wang
Kai Chen
Dongyan Zhao
192
2
0
08 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM, VLM
278
12
0
03 Jan 2024
Interactive Visual Task Learning for Robots
Weiwei Gu
Anant Sah
N. Gopalan
215
7
0
20 Dec 2023
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
European Conference on Computer Vision (ECCV), 2023
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM, VLM
284
44
0
05 Dec 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM, LRM
316
162
0
27 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
ACM Multimedia (ACM MM), 2023
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
154
12
0
21 Nov 2023
Attribute Diversity Determines the Systematicity Gap in VQA
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ian Berlot-Attwell
Kumar Krishna Agrawal
A. M. Carrell
Yash Sharma
Naomi Saphra
218
2
0
15 Nov 2023
Analyzing Modular Approaches for Visual Question Decomposition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Apoorv Khandelwal
Ellie Pavlick
Chen Sun
238
5
0
10 Nov 2023
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Khiem Vinh Tran
Hao Phu Phan
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
139
16
0
27 Oct 2023
Symbolic Planning and Code Generation for Grounded Dialogue
Justin T. Chiu
Wenting Zhao
Derek Chen
Saujas Vaduguru
Alexander M. Rush
Daniel Fried
LLMAG
118
10
0
26 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM, ReLM, LRM
376
38
0
24 Oct 2023
API-Assisted Code Generation for Question Answering on Varied Table Structures
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yihan Cao
Shuyi Chen
Ryan Liu
Zhiruo Wang
Daniel Fried
LMTD
204
26
0
23 Oct 2023
NEUCORE: Neural Concept Reasoning for Composed Image Retrieval
Shu Zhao
Huijuan Xu
143
9
0
02 Oct 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
153
2
0
15 Sep 2023
Neuro-Symbolic Recommendation Model based on Logic Query
Knowledge-Based Systems (KBS), 2023
Maonian Wu
Bang-Chao Chen
Shaojun Zhu
Bo Zheng
Wei Peng
Mingyi Zhang
NAI
193
3
0
14 Sep 2023