IQA: Visual Question Answering in Interactive Environments

9 December 2017

Papers citing "IQA: Visual Question Answering in Interactive Environments"

48 / 48 papers shown

Title
Visual Environment-Interactive Planning for Embodied Complex-Question Answering Ning Lan Baoshan Ou Xuemei Xie G. Shi LM&Ro 60 1 0 01 Apr 2025
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering Kaixuan Jiang Y. Liu Weixing Chen Jingzhou Luo Ziliang Chen Ling Pan G. Li Liang Lin 51 2 0 14 Mar 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments Dongping Li Tielong Cai Tianci Tang Wenhao Chai Katherine Rose Driggs-Campbell Gaoang Wang LM&Ro 56 0 0 11 Mar 2025
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators Rasoul Shafipour David Harrison Maxwell Horton Jeffrey Marker Houman Bedayat Sachin Mehta Mohammad Rastegari Mahyar Najibi Saman Naderiparizi MQ 43 3 0 14 Oct 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories Yifan Song Weimin Xiong Xiutian Zhao Dawei Zhu Wenhao Wu Ke Wang Cheng Li Wei Peng Sujian Li LLMAG 24 9 0 10 Oct 2024
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments Taewoong Kim Cheolhong Min Byeonghwi Kim Jinyeon Kim Wonje Jeung Jonghyun Choi LM&Ro 34 4 0 26 Jul 2024
A Survey on Vision-Language-Action Models for Embodied AI Yueen Ma Zixing Song Yuzheng Zhuang Jianye Hao Irwin King LM&Ro 67 41 0 23 May 2024
Which way is `right'?: Uncovering limitations of Vision-and-Language Navigation model Meera Hahn Amit Raj James M. Rehg 30 3 0 30 Nov 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models Lingxi Xie Longhui Wei Xiaopeng Zhang Kaifeng Bi Xiaotao Gu Jianlong Chang Qi Tian 29 7 0 14 Jun 2023
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following Mingyu Ding Yan Xu Zhenfang Chen David D. Cox Ping Luo J. Tenenbaum Chuang Gan LM&Ro 49 21 0 07 Apr 2023
Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling Kolby Nottingham Prithviraj Ammanabrolu Alane Suhr Yejin Choi Hannaneh Hajishirzi Sameer Singh Roy Fox LLMAG LM&Ro 17 76 0 28 Jan 2023
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes Ahmed Abdelreheem Kyle Olszewski Hsin-Ying Lee Peter Wonka Panos Achlioptas 3DPC 20 28 0 12 Dec 2022
Navigating to Objects in the Real World Théophile Gervet Soumith Chintala Dhruv Batra Jitendra Malik Devendra Singh Chaplot 27 122 0 02 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents Kunal Pratap Singh Jordi Salvador Luca Weihs Aniruddha Kembhavi SSL 19 3 0 01 Dec 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation Matt Deitke Eli VanderBilt Alvaro Herrasti Luca Weihs Jordi Salvador ... Winson Han Eric Kolve Ali Farhadi Aniruddha Kembhavi Roozbeh Mottaghi LM&Ro 25 233 0 14 Jun 2022
Episodic Memory Question Answering Samyak Datta Sameer Dharur Vincent Cartillier Ruta Desai Mukul Khanna Dhruv Batra Devi Parikh EgoV 11 31 0 03 May 2022
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale Ram Ramrakhya Eric Undersander Dhruv Batra Abhishek Das LM&Ro 13 109 0 07 Apr 2022
Continuous Scene Representations for Embodied AI S. Gadre Kiana Ehsani Shuran Song Roozbeh Mottaghi 23 46 0 31 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant B. Wong Joya Chen You Wu Stan Weixian Lei Dongxing Mao Difei Gao Mike Zheng Shou EgoV 27 27 0 08 Mar 2022
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following Xiaofeng Gao Qiaozi Gao Ran Gong Kaixiang Lin Govind Thattai Gaurav Sukhatme LM&Ro 78 70 0 27 Feb 2022
Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation Marco Rosano Antonino Furnari Luigi Gulino C. Santoro G. Farinella EgoV 27 5 0 02 Feb 2022
Shaping embodied agent behavior with activity-context priors from egocentric video Tushar Nagarajan Kristen Grauman EgoV LM&Ro 38 13 0 14 Oct 2021
Are you doing what I say? On modalities alignment in ALFRED Ting-Rui Chiang Yi-Ting Yeh Ta-Chung Chi Yau-Shian Wang 17 1 0 12 Oct 2021
$Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos$ Pano-AVQA: Grounded Audio-Visual Question Answering on 360 $^\circ$ Videos Heeseung Yun Youngjae Yu Wonsuk Yang Kangil Lee Gunhee Kim 10 78 0 11 Oct 2021
Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language Shuyan Zhou Pengcheng Yin Graham Neubig LM&Ro 9 1 0 16 Sep 2021
Knowledge-based Embodied Question Answering Sinan Tan Mengmeng Ge Di Guo Huaping Liu F. Sun 15 20 0 16 Sep 2021
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution Valts Blukis Chris Paxton D. Fox Animesh Garg Yoav Artzi LM&Ro 212 133 0 12 Jul 2021
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents Grgur Kovač Rémy Portelas Katja Hofmann Pierre-Yves Oudeyer ALM 16 6 0 02 Jul 2021
Learning to Map for Active Semantic Goal Navigation G. Georgakis Bernadette Bucher Karl Schmeckpeper Siddharth Singh Kostas Daniilidis 25 73 0 29 Jun 2021
Core Challenges in Embodied Vision-Language Planning Jonathan M Francis Nariaki Kitamura Felix Labelle Xiaopeng Lu Ingrid Navarro Jean Oh LM&Ro 39 45 0 26 Jun 2021
A Survey on Human-aware Robot Navigation Ronja Möller Antonino Furnari S. Battiato Aki Härmä G. Farinella 31 87 0 22 Jun 2021
RobustNav: Towards Benchmarking Robustness in Embodied Navigation Prithvijit Chattopadhyay Judy Hoffman Roozbeh Mottaghi Aniruddha Kembhavi 13 55 0 08 Jun 2021
Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring Yichi Zhang J. Chai 20 78 0 07 Jun 2021
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget Erik Wijmans Irfan Essa Dhruv Batra 3DPC 17 10 0 11 Dec 2020
Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule Shuhei Kurita Kyunghyun Cho LM&Ro 9 23 0 16 Sep 2020
Semantic Curiosity for Active Visual Learning Devendra Singh Chaplot Helen Jiang Saurabh Gupta Abhinav Gupta ObjD 16 72 0 16 Jun 2020
Experience Grounds Language Yonatan Bisk Ari Holtzman Jesse Thomason Jacob Andreas Yoshua Bengio ... Angeliki Lazaridou Jonathan May Aleksandr Nisnevich Nicolas Pinto Joseph P. Turian 19 350 0 21 Apr 2020
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments Jacob Krantz Erik Wijmans Arjun Majumdar Dhruv Batra Stefan Lee 19 263 0 06 Apr 2020
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines Jingxiang Lin Unnat Jain A. Schwing LRM ReLM 26 9 0 31 Oct 2019
CraftAssist: A Framework for Dialogue-enabled Interactive Agents Jonathan Gray Kavya Srinet Yacine Jernite Haonan Yu Zhuoyuan Chen Demi Guo Siddharth Goyal C. L. Zitnick Arthur Szlam 14 38 0 19 Jul 2019
Vision-and-Dialog Navigation Jesse Thomason Michael Murray Maya Cakmak Luke Zettlemoyer LM&Ro 32 322 0 10 Jul 2019
Embodied Visual Recognition Jianwei Yang Zhile Ren Mingze Xu Xinlei Chen David J. Crandall Devi Parikh Dhruv Batra 27 26 0 09 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches Shane Storks Qiaozi Gao J. Chai 13 128 0 02 Apr 2019
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments Howard Chen Alane Suhr Dipendra Kumar Misra Noah Snavely Yoav Artzi 12 379 0 29 Nov 2018
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction Dipendra Kumar Misra Andrew Bennett Valts Blukis Eyvind Niklasson Max Shatkhin Yoav Artzi LM&Ro 16 186 0 04 Sep 2018
CAD2RL: Real Single-Image Flight without a Single Real Image Fereshteh Sadeghi Sergey Levine SSL 216 809 0 13 Nov 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Ramprasaath R. Selvaraju Michael Cogswell Abhishek Das Ramakrishna Vedantam Devi Parikh Dhruv Batra FAtt 18 19,446 0 07 Oct 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 144 1,464 0 06 Jun 2016