ResearchTrend.AI
arXiv: 2105.06453
Episodic Transformer for Vision-and-Language Navigation

13 May 2021
Alexander Pashevich
Cordelia Schmid
Chen Sun
    LM&Ro

Papers citing "Episodic Transformer for Vision-and-Language Navigation"

50 / 139 papers shown
Title
VISTA: Generative Visual Imagination for Vision-and-Language Navigation
Yanjia Huang
M. Wu
Renjie Li
Zhengzhong Tu
LM&Ro
36
0
0
09 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
54
0
0
01 May 2025
LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps
Yihao Wang
Raphael Memmesheimer
Sven Behnke
LM&Ro
53
0
0
15 Mar 2025
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
Siyin Wang
Zhaoye Fei
Qinyuan Cheng
S. Zhang
Panpan Cai
Jinlan Fu
Xipeng Qiu
48
1
0
13 Mar 2025
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang
Ziqiao Ma
Jialu Li
Yanyuan Qiao
Zun Wang
J. Chai
Qi Wu
Mohit Bansal
Parisa Kordjamshidi
LRM
51
18
0
31 Dec 2024
Referencing Where to Focus: Improving VisualGrounding with Referential Query
Yabing Wang
Zhuotao Tian
Q. Guo
Zheng Qin
Sanping Zhou
Ming Yang
Le Wang
ObjD
39
1
0
26 Dec 2024
Vision-Language Navigation with Energy-Based Policy
Rui Liu
Wenguan Wang
Y. Yang
32
3
0
18 Oct 2024
EPO: Hierarchical LLM Agents with Environment Preference Optimization
Qi Zhao
Haotian Fu
Chen Sun
G. Konidaris
20
8
0
28 Aug 2024
Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph
Zhao Kaichen
Song Yaoxian
Zhao Haiquan
Liu Haoyu
Li Tiefeng
Li Zhixu
39
0
0
05 Aug 2024
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim
Cheolhong Min
Byeonghwi Kim
Jinyeon Kim
Wonje Jeung
Jonghyun Choi
LM&Ro
34
4
0
26 Jul 2024
HAPFI: History-Aware Planning based on Fused Information
Sujin Jeon
Suyeon Shin
Byoung-Tak Zhang
29
0
0
23 Jul 2024
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
Xinyu Xu
Shengcheng Luo
Yanchao Yang
Yong-Lu Li
Cewu Lu
LM&Ro
33
1
0
20 Jul 2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
Minghan Li
Heng Li
Zhi-Qi Cheng
Yifei Dong
Yuxuan Zhou
Jun-Yan He
Qi Dai
Teruko Mitamura
Alexander G. Hauptmann
LM&Ro
35
4
0
27 Jun 2024
Human-centered In-building Embodied Delivery Benchmark
Zhuoqun Xu
Yang Liu
Xiaoqi Li
Jiyao Zhang
Hao Dong
40
0
0
25 Jun 2024
ET tu, CLIP? Addressing Common Object Errors for Unseen Environments
Ye Won Byun
Cathy Jiao
Shahriar Noroozizadeh
Jimin Sun
Rosa Vitiello
VLM
27
1
0
25 Jun 2024
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Gabriel H. Sarch
Lawrence Jang
Michael J. Tarr
William W. Cohen
Kenneth Marino
Katerina Fragkiadaki
LLMAG
38
0
0
20 Jun 2024
Embodied Instruction Following in Unknown Environments
Zhenyu Wu
Ziwei Wang
Xiuwei Xu
Jiwen Lu
Haibin Yan
LM&Ro
22
4
0
17 Jun 2024
Augmented Commonsense Knowledge for Remote Object Grounding
Bahram Mohammadi
Yicong Hong
Yuankai Qi
Qi Wu
Shirui Pan
J. Shi
33
7
0
03 Jun 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
30
0
0
23 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
35
9
0
22 May 2024
HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models
Gabriel H. Sarch
Sahil Somani
Raghav Kapoor
Michael J. Tarr
Katerina Fragkiadaki
LM&Ro
LLMAG
29
3
0
29 Apr 2024
A review of deep learning-based information fusion techniques for multimodal medical image classification
Yi-Hsuan Li
Mostafa EL HABIB DAHO
Pierre-Henri Conze
Rachid Zeghlache
Hugo Le Boité
R. Tadayoni
B. Cochener
M. Lamard
G. Quellec
25
31
0
23 Apr 2024
Socratic Planner: Self-QA-Based Zero-Shot Planning for Embodied Instruction Following
Suyeon Shin
Sujin Jeon
Junghyun Kim
Gi-Cheon Kang
Byoung-Tak Zhang
LLMAG
34
1
0
21 Apr 2024
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang
Xiangyang Li
Jiahao Yang
Yeqi Liu
Junjie Hu
Ming Jiang
Shuqiang Jiang
42
15
0
02 Apr 2024
Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
Bowen Huang
Yanwei Zheng
Chuanlin Lan
Xinpeng Zhao
Yifei Zou
Dongxiao Yu
36
0
0
23 Mar 2024
Volumetric Environment Representation for Vision-Language Navigation
Rui Liu
Wenguan Wang
Yi Yang
32
25
0
21 Mar 2024
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
Ming Xu
Zilong Xie
33
2
0
18 Mar 2024
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
Rao Fu
Jingyu Liu
Xilun Chen
Yixin Nie
Wenhan Xiong
LM&Ro
LRM
47
48
0
18 Mar 2024
Online Continual Learning For Interactive Instruction Following Agents
Byeonghwi Kim
Minhyuk Seo
Jonghyun Choi
CLL
LM&Ro
63
7
0
12 Mar 2024
OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
Haochen Shi
Zhiyuan Sun
Xingdi Yuan
Marc-Alexandre Côté
Bang Liu
LLMAG
27
10
0
05 Mar 2024
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li
Zeyu Wang
Xueke Yang
Yu-Ren Yang
Shuqi Mei
Zhaoxiang Zhang
31
5
0
29 Feb 2024
Language-guided Skill Learning with Temporal Variational Inference
Haotian Fu
Pratyusha Sharma
Elias Stengel-Eskin
G. Konidaris
Nicolas Le Roux
Marc-Alexandre Côté
Xingdi Yuan
33
7
0
26 Feb 2024
Learning Communication Policies for Different Follower Behaviors in a Collaborative Reference Game
P. Sadler
Sherzod Hakimov
David Schlangen
21
1
0
07 Feb 2024
Multi-Object Navigation in real environments using hybrid policies
Assem Sadek
G. Bono
Boris Chidlovskii
A. Baskurt
Christian Wolf
45
5
0
24 Jan 2024
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo-Wen Zhang
Xiaolin Wei
Chunhua Shen
MLLM
31
33
0
28 Dec 2023
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
Guanxing Lu
Ziwei Wang
Changliu Liu
Jiwen Lu
Yansong Tang
LRM
25
8
0
12 Dec 2023
Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
Cheng-Fu Yang
Haoyang Xu
Te-Lin Wu
Xiaofeng Gao
Kai-Wei Chang
Feng Gao
DiffM
25
8
0
02 Dec 2023
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Yaran Chen
Wenbo Cui
Yuanwen Chen
Mining Tan
Xinyao Zhang
Dong Zhao
He Wang
LM&Ro
LLMAG
31
0
0
27 Nov 2023
Interaction is all You Need? A Study of Robots Ability to Understand and Execute
Kushal Koshti
Nidhir Bhavsar
45
1
0
13 Nov 2023
DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training
Kanta Kaneda
Ryosuke Korekata
Yuiga Wada
Shunya Nagashima
Motonari Kambara
Yui Iioka
Haruka Matsuo
Yuto Imai
T. Nishimura
K. Sugiura
35
0
0
12 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
24
7
0
07 Nov 2023
Emergence of Abstract State Representations in Embodied Sequence Modeling
Tian Yun
Zilai Zeng
Kunal Handa
Ashish V. Thapliyal
Bo Pang
Ellie Pavlick
Chen Sun
LM&Ro
24
7
0
03 Nov 2023
tagE: Enabling an Embodied Agent to Understand Human Instructions
Chayan Sarkar
Avik Mitra
Pradip Pramanick
Tapas Nayak
LM&Ro
36
1
0
24 Oct 2023
Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Gabriel H. Sarch
Yue Wu
Michael J. Tarr
Katerina Fragkiadaki
LM&Ro
LLMAG
19
19
0
23 Oct 2023
LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following
Cheng Yang
Yen-Chun Chen
Jianwei Yang
Xiyang Dai
Lu Yuan
Yu-Chiang Frank Wang
Kai-Wei Chang
LM&Ro
12
9
0
18 Oct 2023
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Jesse Zhang
Jiahui Zhang
Karl Pertsch
Ziyi Liu
Xiang Ren
Minsuk Chang
Shao-Hua Sun
Joseph J. Lim
LLMAG
LM&Ro
97
57
0
16 Oct 2023
LangNav: Language as a Perceptual Representation for Navigation
Bowen Pan
Rameswar Panda
SouYoung Jin
Rogerio Feris
Aude Oliva
Phillip Isola
Yoon Kim
LM&Ro
28
18
0
11 Oct 2023
End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon
G. Bono
L. Antsfeld
Boris Chidlovskii
Zhi Zheng
Christian Wolf
3DV
26
9
0
28 Sep 2023
Hierarchical Imitation Learning for Stochastic Environments
Maximilian Igl
Punit Shah
Paul Mougin
S. Srinivasan
Tarun Gupta
Brandyn White
K. Shiarlis
Shimon Whiteson
OOD
14
2
0
25 Sep 2023
Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Yuxing Long
Xiaoqi Li
Wenzhe Cai
Hao Dong
LLMAG
LM&Ro
19
43
0
20 Sep 2023