Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
arXiv:1912.11684 (v2, latest)

IEEE International Conference on Robotics and Automation (ICRA), 2020
25 December 2019
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum

Papers citing "Look, Listen, and Act: Towards Audio-Visual Embodied Navigation"

Showing 50 of 77 citing papers
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh
Asako Kanezaki
140
0
0
21 Oct 2025
Audio-Guided Visual Perception for Audio-Visual Navigation
Yi Wang
Yinfeng Yu
Fuchun Sun
Liejun Wang
Wendong Zheng
85
0
0
13 Oct 2025
Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
Hailong Zhang
Yinfeng Yu
Liejun Wang
Fuchun Sun
Wendong Zheng
68
0
0
30 Sep 2025
Dynamic Multi-Target Fusion for Efficient Audio-Visual Navigation
Yinfeng Yu
Hailong Zhang
Meiling Zhu
56
0
0
23 Sep 2025
Advancing Audio-Visual Navigation Through Multi-Agent Collaboration in 3D Environments
Hailong Zhang
Yinfeng Yu
Liejun Wang
Fuchun Sun
Wendong Zheng
92
0
0
21 Sep 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
Renhao Wang
Haoran Geng
Tingle Li
Feishi Wang
Gopala Anumanchipalli
Trevor Darrell
Boyi Li
Pieter Abbeel
Jitendra Malik
Alexei A. Efros
VGen
202
1
0
03 Jul 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin
Ruohan Gao
299
2
0
30 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro, LRM
321
1
0
22 Apr 2025
Hearing Anywhere in Any Environment
Computer Vision and Pattern Recognition (CVPR), 2025
Xiulong Liu
Anurag Kumar
P. Calamia
Sebastia V. Amengual
Calvin Murdock
Ishwarya Ananthabhotla
Philip Robinson
Eli Shlizerman
V. Ithapu
Ruohan Gao
258
6
0
14 Apr 2025
AI-Gadget Kit: Integrating Swarm User Interfaces with LLM-driven Agents for Rich Tabletop Game Applications
Yijie Guo
Zhenhan Huang
Ruhan Wang
Zhihao Yao
Tianyu Yu
Zhiling Xu
Xinyu Zhao
Xueqing Li
Haipeng Mi
146
4
0
24 Jul 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
401
9
0
28 May 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
315
25
0
17 Mar 2024
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Neural Information Processing Systems (NeurIPS), 2023
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
125
8
0
01 Nov 2023
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
Neural Information Processing Systems (NeurIPS), 2023
Hongchen Wang
Andy Guan Hong Chen
Xiaoqi Li
Mingdong Wu
Hao Dong
371
24
0
15 Sep 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen, AI4CE
283
29
0
29 Mar 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
273
10
0
04 Jan 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
195
12
0
24 Dec 2022
Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation
Gyan Tatiya
Jonathan M Francis
Luca Bondi
Ingrid Navarro
Eric Nyberg
Jivko Sinapov
Jean Oh
135
10
0
21 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh
Jordi Salvador
Luca Weihs
Aniruddha Kembhavi
SSL
216
4
0
01 Dec 2022
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Neural Information Processing Systems (NeurIPS), 2022
Kunal Pratap Singh
Luca Weihs
Alvaro Herrasti
Jonghyun Choi
Aniruddha Kembhavi
Roozbeh Mottaghi
212
27
0
18 Nov 2022
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
Neural Information Processing Systems (NeurIPS), 2022
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
204
164
0
18 Oct 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
Neural Information Processing Systems (NeurIPS), 2022
Sudipta Paul
Amit K. Roy-Chowdhury
A. Cherian
177
32
0
14 Oct 2022
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
Neural Information Processing Systems (NeurIPS), 2022
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Runhao Zeng
Thomas H. Li
Zhuliang Yu
Chuang Gan
SSL
211
91
0
14 Oct 2022
Learning Active Camera for Multi-Object Navigation
Neural Information Processing Systems (NeurIPS), 2022
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Weiwen Hu
Wenbing Huang
Thomas H. Li
Ming Tan
Chuang Gan
221
33
0
14 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
361
53
0
13 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tanvir Mahmud
Diana Marculescu
CLIP
187
40
0
11 Oct 2022
Pay Self-Attention to Audio-Visual Navigation
British Machine Vision Conference (BMVC), 2022
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
299
12
0
04 Oct 2022
Anticipating the Unseen Discrepancy for Vision and Language Navigation
Yujie Lu
Huiliang Zhang
Ping Nie
Weixi Feng
Wenda Xu
Xinze Wang
William Yang Wang
228
2
0
10 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
268
69
0
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
164
10
0
04 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Computer Vision and Pattern Recognition (CVPR), 2022
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
263
20
0
07 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
278
114
0
16 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
310
364
0
14 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
252
67
0
08 Jun 2022
Towards Generalisable Audio Representations for Audio-Visual Navigation
Shunqi Mao
Chaoyi Zhang
Heng Wang
Weidong (Tom) Cai
140
1
0
01 Jun 2022
Learning Neural Acoustic Fields
Neural Information Processing Systems (NeurIPS), 2022
Andrew F. Luo
Yilun Du
Michael J. Tarr
J. Tenenbaum
Antonio Torralba
Chuang Gan
AI4CE
286
109
0
04 Apr 2022
Sound Adversarial Audio-Visual Navigation
International Conference on Learning Representations (ICLR), 2022
Yinfeng Yu
Wenbing Huang
Gang Hua
Changan Chen
Yikai Wang
Xiaohong Liu
AAML
176
39
0
22 Feb 2022
Visual Acoustic Matching
Computer Vision and Pattern Recognition (CVPR), 2022
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
300
65
0
14 Feb 2022
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
Ziad Al-Halah
Santhosh Kumar Ramakrishnan
Kristen Grauman
VLM
272
105
0
05 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
European Conference on Computer Vision (ECCV), 2022
Sagnik Majumder
Kristen Grauman
304
22
0
02 Feb 2022
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Santhosh Kumar Ramakrishnan
Devendra Singh Chaplot
Ziad Al-Halah
Jitendra Malik
Kristen Grauman
403
202
0
25 Jan 2022
Symmetry-aware Neural Architecture for Embodied Visual Navigation
Shuang Liu
Takayuki Okatani
183
2
0
17 Dec 2021
Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
IEEE Robotics and Automation Letters (RA-L), 2021
Abdelrahman Younes
Daniel Honerkamp
Tim Welschehold
Abhinav Valada
398
46
0
29 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
British Machine Vision Conference (BMVC), 2021
Rishabh Garg
Ruohan Gao
Kristen Grauman
156
31
0
21 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Conference on Robot Learning (CoRL), 2021
Ziyang Chen
Xixi Hu
Andrew Owens
153
30
0
10 Nov 2021
Space-Time Memory Network for Sounding Object Localization in Videos
British Machine Vision Conference (BMVC), 2021
Sizhe Li
Yapeng Tian
Chenliang Xu
123
12
0
10 Nov 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation
IEEE International Conference on Robotics and Automation (ICRA), 2021
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
F. Sun
148
16
0
22 Sep 2021
Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge
Xinzhu Liu
Di Guo
Huaping Liu
F. Sun
EgoV
207
30
0
20 Sep 2021
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Qi Wu
Cheng-Ju Wu
Yixin Zhu
Jungseock Joo
225
18
0
05 Aug 2021
Improving Multi-Modal Learning with Uni-Modal Teachers
Chenzhuang Du
Tingle Li
Yichen Liu
Zixin Wen
Tianyu Hua
Yue Wang
Hang Zhao
107
69
0
21 Jun 2021