Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.03316
Cited By
IQA: Visual Question Answering in Interactive Environments
9 December 2017
Daniel Gordon
Aniruddha Kembhavi
Mohammad Rastegari
Joseph Redmon
D. Fox
Ali Farhadi
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"IQA: Visual Question Answering in Interactive Environments"
48 / 48 papers shown
Title
Visual Environment-Interactive Planning for Embodied Complex-Question Answering
Ning Lan
Baoshan Ou
Xuemei Xie
G. Shi
LM&Ro
60
1
0
01 Apr 2025
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang
Y. Liu
Weixing Chen
Jingzhou Luo
Ziliang Chen
Ling Pan
G. Li
Liang Lin
51
2
0
14 Mar 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
Dongping Li
Tielong Cai
Tianci Tang
Wenhao Chai
Katherine Rose Driggs-Campbell
Gaoang Wang
LM&Ro
56
0
0
11 Mar 2025
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour
David Harrison
Maxwell Horton
Jeffrey Marker
Houman Bedayat
Sachin Mehta
Mohammad Rastegari
Mahyar Najibi
Saman Naderiparizi
MQ
43
3
0
14 Oct 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories
Yifan Song
Weimin Xiong
Xiutian Zhao
Dawei Zhu
Wenhao Wu
Ke Wang
Cheng Li
Wei Peng
Sujian Li
LLMAG
24
9
0
10 Oct 2024
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim
Cheolhong Min
Byeonghwi Kim
Jinyeon Kim
Wonje Jeung
Jonghyun Choi
LM&Ro
34
4
0
26 Jul 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
67
41
0
23 May 2024
Which way is `right'?: Uncovering limitations of Vision-and-Language Navigation model
Meera Hahn
Amit Raj
James M. Rehg
30
3
0
30 Nov 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
29
7
0
14 Jun 2023
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
Mingyu Ding
Yan Xu
Zhenfang Chen
David D. Cox
Ping Luo
J. Tenenbaum
Chuang Gan
LM&Ro
49
21
0
07 Apr 2023
Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
Kolby Nottingham
Prithviraj Ammanabrolu
Alane Suhr
Yejin Choi
Hannaneh Hajishirzi
Sameer Singh
Roy Fox
LLMAG
LM&Ro
17
76
0
28 Jan 2023
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
20
28
0
12 Dec 2022
Navigating to Objects in the Real World
Théophile Gervet
Soumith Chintala
Dhruv Batra
Jitendra Malik
Devendra Singh Chaplot
27
122
0
02 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh
Jordi Salvador
Luca Weihs
Aniruddha Kembhavi
SSL
19
3
0
01 Dec 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
25
233
0
14 Jun 2022
Episodic Memory Question Answering
Samyak Datta
Sameer Dharur
Vincent Cartillier
Ruta Desai
Mukul Khanna
Dhruv Batra
Devi Parikh
EgoV
11
31
0
03 May 2022
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ram Ramrakhya
Eric Undersander
Dhruv Batra
Abhishek Das
LM&Ro
13
109
0
07 Apr 2022
Continuous Scene Representations for Embodied AI
S. Gadre
Kiana Ehsani
Shuran Song
Roozbeh Mottaghi
23
46
0
31 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
27
27
0
08 Mar 2022
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following
Xiaofeng Gao
Qiaozi Gao
Ran Gong
Kaixiang Lin
Govind Thattai
Gaurav Sukhatme
LM&Ro
78
70
0
27 Feb 2022
Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation
Marco Rosano
Antonino Furnari
Luigi Gulino
C. Santoro
G. Farinella
EgoV
27
5
0
02 Feb 2022
Shaping embodied agent behavior with activity-context priors from egocentric video
Tushar Nagarajan
Kristen Grauman
EgoV
LM&Ro
38
13
0
14 Oct 2021
Are you doing what I say? On modalities alignment in ALFRED
Ting-Rui Chiang
Yi-Ting Yeh
Ta-Chung Chi
Yau-Shian Wang
17
1
0
12 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360
∘
^\circ
∘
Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
10
78
0
11 Oct 2021
Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language
Shuyan Zhou
Pengcheng Yin
Graham Neubig
LM&Ro
9
1
0
16 Sep 2021
Knowledge-based Embodied Question Answering
Sinan Tan
Mengmeng Ge
Di Guo
Huaping Liu
F. Sun
15
20
0
16 Sep 2021
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Valts Blukis
Chris Paxton
D. Fox
Animesh Garg
Yoav Artzi
LM&Ro
212
133
0
12 Jul 2021
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents
Grgur Kovač
Rémy Portelas
Katja Hofmann
Pierre-Yves Oudeyer
ALM
16
6
0
02 Jul 2021
Learning to Map for Active Semantic Goal Navigation
G. Georgakis
Bernadette Bucher
Karl Schmeckpeper
Siddharth Singh
Kostas Daniilidis
25
73
0
29 Jun 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
39
45
0
26 Jun 2021
A Survey on Human-aware Robot Navigation
Ronja Möller
Antonino Furnari
S. Battiato
Aki Härmä
G. Farinella
31
87
0
22 Jun 2021
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
Prithvijit Chattopadhyay
Judy Hoffman
Roozbeh Mottaghi
Aniruddha Kembhavi
13
55
0
08 Jun 2021
Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring
Yichi Zhang
J. Chai
20
78
0
07 Jun 2021
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans
Irfan Essa
Dhruv Batra
3DPC
17
10
0
11 Dec 2020
Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule
Shuhei Kurita
Kyunghyun Cho
LM&Ro
9
23
0
16 Sep 2020
Semantic Curiosity for Active Visual Learning
Devendra Singh Chaplot
Helen Jiang
Saurabh Gupta
Abhinav Gupta
ObjD
16
72
0
16 Jun 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
19
350
0
21 Apr 2020
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
Jacob Krantz
Erik Wijmans
Arjun Majumdar
Dhruv Batra
Stefan Lee
19
263
0
06 Apr 2020
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Jonathan Gray
Kavya Srinet
Yacine Jernite
Haonan Yu
Zhuoyuan Chen
Demi Guo
Siddharth Goyal
C. L. Zitnick
Arthur Szlam
14
38
0
19 Jul 2019
Vision-and-Dialog Navigation
Jesse Thomason
Michael Murray
Maya Cakmak
Luke Zettlemoyer
LM&Ro
32
322
0
10 Jul 2019
Embodied Visual Recognition
Jianwei Yang
Zhile Ren
Mingze Xu
Xinlei Chen
David J. Crandall
Devi Parikh
Dhruv Batra
27
26
0
09 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
13
128
0
02 Apr 2019
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
Howard Chen
Alane Suhr
Dipendra Kumar Misra
Noah Snavely
Yoav Artzi
12
379
0
29 Nov 2018
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction
Dipendra Kumar Misra
Andrew Bennett
Valts Blukis
Eyvind Niklasson
Max Shatkhin
Yoav Artzi
LM&Ro
16
186
0
04 Sep 2018
CAD2RL: Real Single-Image Flight without a Single Real Image
Fereshteh Sadeghi
Sergey Levine
SSL
216
809
0
13 Nov 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
18
19,446
0
07 Oct 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,464
0
06 Jun 2016
1