Episodic Transformer for Vision-and-Language Navigation
IEEE International Conference on Computer Vision (ICCV), 2021
13 May 2021
Alexander Pashevich
Cordelia Schmid
Chen Sun
    LM&Ro

Papers citing "Episodic Transformer for Vision-and-Language Navigation"

40 / 140 papers shown
Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
So Yeon Min
Hao Zhu
Ruslan Salakhutdinov
Yonatan Bisk
LM&Ro
256
14
0
10 Oct 2022
Dialog Acts for Task-Driven Embodied Agents
SIGDIAL Conferences (SIGDIAL), 2022
Spandana Gella
Aishwarya Padmakumar
P. Lange
Dilek Z. Hakkani-Tür
LM&Ro
165
20
0
26 Sep 2022
NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields
IEEE Robotics and Automation Letters (RA-L), 2022
Jiankai Sun
Yan Xu
Mingyu Ding
Hongwei Yi
Chen Wang
Jingdong Wang
Liangjun Zhang
Mac Schwager
181
13
0
24 Sep 2022
Instruction-driven history-aware policies for robotic manipulations
Conference on Robot Learning (CoRL), 2022
Pierre-Louis Guhur
Shizhe Chen
Ricardo Garcia Pinel
Makarand Tapaswi
Ivan Laptev
Cordelia Schmid
LM&Ro
389
139
0
11 Sep 2022
On Grounded Planning for Embodied Tasks with Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Bill Yuchen Lin
Chengsong Huang
Qian Liu
Wenda Gu
Sam Sommerer
Xiang Ren
LM&Ro
315
49
0
29 Aug 2022
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Kai Zheng
KAI-QING Zhou
Jing Gu
Yue Fan
Jialu Wang
Zong-xiao Li
Xuehai He
Xinze Wang
LM&Ro
284
47
0
28 Aug 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
European Conference on Computer Vision (ECCV), 2022
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
310
56
0
24 Aug 2022
MemoNav: Selecting Informative Memories for Visual Navigation
Hongxin Li
Xueke Yang
Yu-Ren Yang
Shuqi Mei
Zhaoxiang Zhang
96
4
0
20 Aug 2022
Target-Driven Structured Transformer Planner for Vision-Language Navigation
ACM Multimedia (ACM MM), 2022
Yusheng Zhao
Jinyu Chen
Chen Gao
Wenguan Wang
Lirong Yang
Haibing Ren
Huaxia Xia
Si Liu
LM&Ro
388
73
0
19 Jul 2022
1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)
Dongyan An
Zun Wang
Yangguang Li
Yi Wang
Yicong Hong
Yan Huang
Liang Wang
Jing Shao
174
22
0
23 Jun 2022
A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search
Brandon Trabucco
Gunnar Sigurdsson
Robinson Piramuthu
Gaurav Sukhatme
Ruslan Salakhutdinov
OCL
227
9
0
21 Jun 2022
EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
Neural Information Processing Systems (NeurIPS), 2022
Thomas Carta
Pierre-Yves Oudeyer
Olivier Sigaud
Sylvain Lamprier
OffRL
268
29
0
20 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Neural Information Processing Systems (NeurIPS), 2022
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Xinze Wang
LM&Ro
CoGe
210
79
0
17 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
451
804
0
13 Jun 2022
Aerial Vision-and-Dialog Navigation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yue Fan
Winson X. Chen
Tongzhou Jiang
Chun-ni Zhou
Yi Zhang
Xinze Wang
254
34
0
24 May 2022
On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
First Workshop on Insights from Negative Results in NLP (IFNRN), 2022
Hyounghun Kim
Aishwarya Padmakumar
Di Jin
Joey Tianyi Zhou
Dilek Z. Hakkani-Tür
76
0
0
18 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
198
53
0
04 May 2022
On the Importance of Karaka Framework in Multi-modal Grounding
Sai Kiran Gorthi
R. Mamidi
112
1
0
09 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Conference on Robot Learning (CoRL), 2022
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
503
2,527
0
04 Apr 2022
Moment-based Adversarial Training for Embodied Language Comprehension
International Conference on Pattern Recognition (ICPR), 2022
Shintaro Ishikawa
K. Sugiura
LM&Ro
130
9
0
02 Apr 2022
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jing Gu
Eliana Stefani
Qi Wu
Jesse Thomason
Xinze Wang
LM&Ro
307
145
0
22 Mar 2022
Summarizing a virtual robot's past actions in natural language
Chad DeChant
Daniel Bauer
LM&Ro
144
4
0
13 Mar 2022
Cross-modal Map Learning for Vision and Language Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
G. Georgakis
Karl Schmeckpeper
Karan Wanchoo
Soham Dan
E. Miltsakaki
Dan Roth
Kostas Daniilidis
320
94
0
10 Mar 2022
LEBP -- Language Expectation & Binding Policy: A Two-Stream Framework for Embodied Vision-and-Language Interaction Task Learning Agents
Hao Liu
Yang Liu
Hong He
Hang Yang
LM&Ro
127
23
0
09 Mar 2022
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following
IEEE Robotics and Automation Letters (RA-L), 2022
Xiaofeng Gao
Qiaozi Gao
Ran Gong
Kaixiang Lin
Govind Thattai
Gaurav Sukhatme
LM&Ro
286
79
0
27 Feb 2022
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
LM&Ro
214
198
0
23 Feb 2022
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Computer Vision and Pattern Recognition (CVPR), 2022
Chan Hee Song
Jihyung Kil
Tai-Yu Pan
Brian M. Sadler
Wei-Lun Chao
Yu-Chuan Su
LRM
270
39
0
14 Feb 2022
ASC me to Do Anything: Multi-task Training for Embodied AI
Jiasen Lu
Jordi Salvador
Roozbeh Mottaghi
Aniruddha Kembhavi
142
3
0
14 Feb 2022
Learning to Act with Affordance-Aware Multimodal Neural SLAM
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Zhiwei Jia
Kaixiang Lin
Yizhou Zhao
Qiaozi Gao
Govind Thattai
Gaurav Sukhatme
LM&Ro
177
16
0
24 Jan 2022
Video Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
374
132
0
16 Jan 2022
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
353
77
0
25 Nov 2021
Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation
European Conference on Computer Vision (ECCV), 2021
Chuang Lin
Yi Jiang
Jianfei Cai
Zhuang Li
Gholamreza Haffari
Zehuan Yuan
159
36
0
10 Nov 2021
LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
Yizhou Zhao
Kaixiang Lin
Zhiwei Jia
Qiaozi Gao
Govind Thattai
Jesse Thomason
Gaurav Sukhatme
3DV
LM&Ro
101
18
0
10 Nov 2021
History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Cordelia Schmid
Ivan Laptev
LM&Ro
267
300
0
25 Oct 2021
FILM: Following Instructions in Language with Modular Methods
International Conference on Learning Representations (ICLR), 2021
So Yeon Min
Devendra Singh Chaplot
Pradeep Ravikumar
Yonatan Bisk
Ruslan Salakhutdinov
LM&Ro
506
181
0
12 Oct 2021
Skill Induction and Planning with Latent Language
Pratyusha Sharma
Antonio Torralba
Jacob Andreas
LM&Ro
452
121
0
04 Oct 2021
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramuthu
Gokhan Tur
Dilek Z. Hakkani-Tür
LM&Ro
752
229
0
01 Oct 2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia
Qiaozi Gao
Jesse Thomason
Govind Thattai
Gaurav Sukhatme
LM&Ro
228
84
0
10 Aug 2021
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Valts Blukis
Chris Paxton
Dieter Fox
Animesh Garg
Yoav Artzi
LM&Ro
467
153
0
12 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Neural Information Processing Systems (NeurIPS), 2021
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
528
678
0
30 Jun 2021