arXiv:2105.06453
Episodic Transformer for Vision-and-Language Navigation
IEEE International Conference on Computer Vision (ICCV), 2021
13 May 2021
Alexander Pashevich
Cordelia Schmid
Chen Sun
LM&Ro
Papers citing "Episodic Transformer for Vision-and-Language Navigation"
40 / 140 papers shown
Title
Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
So Yeon Min
Hao Zhu
Ruslan Salakhutdinov
Yonatan Bisk
LM&Ro
256
14
0
10 Oct 2022
Dialog Acts for Task-Driven Embodied Agents
SIGDIAL Conferences (SIGDIAL), 2022
Spandana Gella
Aishwarya Padmakumar
P. Lange
Dilek Z. Hakkani-Tür
LM&Ro
165
20
0
26 Sep 2022
NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields
IEEE Robotics and Automation Letters (RA-L), 2022
Jiankai Sun
Yan Xu
Mingyu Ding
Hongwei Yi
Chen Wang
Jingdong Wang
Liangjun Zhang
Mac Schwager
181
13
0
24 Sep 2022
Instruction-driven history-aware policies for robotic manipulations
Conference on Robot Learning (CoRL), 2022
Pierre-Louis Guhur
Shizhe Chen
Ricardo Garcia Pinel
Makarand Tapaswi
Ivan Laptev
Cordelia Schmid
LM&Ro
389
139
0
11 Sep 2022
On Grounded Planning for Embodied Tasks with Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Bill Yuchen Lin
Chengsong Huang
Qian Liu
Wenda Gu
Sam Sommerer
Xiang Ren
LM&Ro
315
49
0
29 Aug 2022
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Kai Zheng
Kai-Qing Zhou
Jing Gu
Yue Fan
Jialu Wang
Zong-xiao Li
Xuehai He
Xinze Wang
LM&Ro
284
47
0
28 Aug 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
European Conference on Computer Vision (ECCV), 2022
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
310
56
0
24 Aug 2022
MemoNav: Selecting Informative Memories for Visual Navigation
Hongxin Li
Xueke Yang
Yu-Ren Yang
Shuqi Mei
Zhaoxiang Zhang
96
4
0
20 Aug 2022
Target-Driven Structured Transformer Planner for Vision-Language Navigation
ACM Multimedia (ACM MM), 2022
Yusheng Zhao
Jinyu Chen
Chen Gao
Wenguan Wang
Lirong Yang
Haibing Ren
Huaxia Xia
Si Liu
LM&Ro
388
73
0
19 Jul 2022
1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)
Dongyan An
Zun Wang
Yangguang Li
Yi Wang
Yicong Hong
Yan Huang
Liang Wang
Jing Shao
174
22
0
23 Jun 2022
A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search
Brandon Trabucco
Gunnar Sigurdsson
Robinson Piramuthu
Gaurav Sukhatme
Ruslan Salakhutdinov
OCL
227
9
0
21 Jun 2022
EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
Neural Information Processing Systems (NeurIPS), 2022
Thomas Carta
Pierre-Yves Oudeyer
Olivier Sigaud
Sylvain Lamprier
OffRL
268
29
0
20 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Neural Information Processing Systems (NeurIPS), 2022
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Xinze Wang
LM&Ro
CoGe
210
79
0
17 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
451
804
0
13 Jun 2022
Aerial Vision-and-Dialog Navigation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yue Fan
Winson X. Chen
Tongzhou Jiang
Chun-ni Zhou
Yi Zhang
Xinze Wang
254
34
0
24 May 2022
On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
First Workshop on Insights from Negative Results in NLP (IFNRN), 2022
Hyounghun Kim
Aishwarya Padmakumar
Di Jin
Joey Tianyi Zhou
Dilek Z. Hakkani-Tür
76
0
0
18 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
198
53
0
04 May 2022
On the Importance of Karaka Framework in Multi-modal Grounding
Sai Kiran Gorthi
R. Mamidi
112
1
0
09 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Conference on Robot Learning (CoRL), 2022
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
503
2,527
0
04 Apr 2022
Moment-based Adversarial Training for Embodied Language Comprehension
International Conference on Pattern Recognition (ICPR), 2022
Shintaro Ishikawa
K. Sugiura
LM&Ro
130
9
0
02 Apr 2022
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jing Gu
Eliana Stefani
Qi Wu
Jesse Thomason
Xinze Wang
LM&Ro
307
145
0
22 Mar 2022
Summarizing a virtual robot's past actions in natural language
Chad DeChant
Daniel Bauer
LM&Ro
144
4
0
13 Mar 2022
Cross-modal Map Learning for Vision and Language Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
G. Georgakis
Karl Schmeckpeper
Karan Wanchoo
Soham Dan
E. Miltsakaki
Dan Roth
Kostas Daniilidis
320
94
0
10 Mar 2022
LEBP -- Language Expectation & Binding Policy: A Two-Stream Framework for Embodied Vision-and-Language Interaction Task Learning Agents
Hao Liu
Yang Liu
Hong He
Hang Yang
LM&Ro
127
23
0
09 Mar 2022
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following
IEEE Robotics and Automation Letters (RA-L), 2022
Xiaofeng Gao
Qiaozi Gao
Ran Gong
Kaixiang Lin
Govind Thattai
Gaurav Sukhatme
LM&Ro
286
79
0
27 Feb 2022
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
LM&Ro
214
198
0
23 Feb 2022
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Computer Vision and Pattern Recognition (CVPR), 2022
Chan Hee Song
Jihyung Kil
Tai-Yu Pan
Brian M. Sadler
Wei-Lun Chao
Yu-Chuan Su
LRM
270
39
0
14 Feb 2022
ASC me to Do Anything: Multi-task Training for Embodied AI
Jiasen Lu
Jordi Salvador
Roozbeh Mottaghi
Aniruddha Kembhavi
142
3
0
14 Feb 2022
Learning to Act with Affordance-Aware Multimodal Neural SLAM
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Zhiwei Jia
Kaixiang Lin
Yizhou Zhao
Qiaozi Gao
Govind Thattai
Gaurav Sukhatme
LM&Ro
177
16
0
24 Jan 2022
Video Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
374
132
0
16 Jan 2022
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
353
77
0
25 Nov 2021
Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation
European Conference on Computer Vision (ECCV), 2021
Chuang Lin
Yi Jiang
Jianfei Cai
Zhuang Li
Gholamreza Haffari
Zehuan Yuan
159
36
0
10 Nov 2021
LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
Yizhou Zhao
Kaixiang Lin
Zhiwei Jia
Qiaozi Gao
Govind Thattai
Jesse Thomason
Gaurav Sukhatme
3DV
LM&Ro
101
18
0
10 Nov 2021
History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Cordelia Schmid
Ivan Laptev
LM&Ro
267
300
0
25 Oct 2021
FILM: Following Instructions in Language with Modular Methods
International Conference on Learning Representations (ICLR), 2021
So Yeon Min
Devendra Singh Chaplot
Pradeep Ravikumar
Yonatan Bisk
Ruslan Salakhutdinov
LM&Ro
506
181
0
12 Oct 2021
Skill Induction and Planning with Latent Language
Pratyusha Sharma
Antonio Torralba
Jacob Andreas
LM&Ro
452
121
0
04 Oct 2021
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramuthu
Gokhan Tur
Dilek Z. Hakkani-Tür
LM&Ro
752
229
0
01 Oct 2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia
Qiaozi Gao
Jesse Thomason
Govind Thattai
Gaurav Sukhatme
LM&Ro
228
84
0
10 Aug 2021
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Valts Blukis
Chris Paxton
Dieter Fox
Animesh Garg
Yoav Artzi
LM&Ro
467
153
0
12 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Neural Information Processing Systems (NeurIPS), 2021
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
528
678
0
30 Jun 2021