arXiv:1912.11684
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
IEEE International Conference on Robotics and Automation (ICRA), 2020
25 December 2019
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
Papers citing "Look, Listen, and Act: Towards Audio-Visual Embodied Navigation"
50 / 75 papers shown
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh
Asako Kanezaki
21 Oct 2025
Audio-Guided Visual Perception for Audio-Visual Navigation
Yi Wang
Yinfeng Yu
Fuchun Sun
Liejun Wang
Wendong Zheng
13 Oct 2025
Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
Hailong Zhang
Yinfeng Yu
Liejun Wang
Fuchun Sun
Wendong Zheng
30 Sep 2025
Dynamic Multi-Target Fusion for Efficient Audio-Visual Navigation
Yinfeng Yu
Hailong Zhang
Meiling Zhu
23 Sep 2025
Advancing Audio-Visual Navigation Through Multi-Agent Collaboration in 3D Environments
Hailong Zhang
Yinfeng Yu
Liejun Wang
Fuchun Sun
Wendong Zheng
21 Sep 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
Renhao Wang
Haoran Geng
Tingle Li
Feishi Wang
Gopala Anumanchipalli
Trevor Darrell
Boyi Li
Pieter Abbeel
Jitendra Malik
Alexei A. Efros
VGen
03 Jul 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin
Ruohan Gao
30 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
22 Apr 2025
Hearing Anywhere in Any Environment
Computer Vision and Pattern Recognition (CVPR), 2025
Xiulong Liu
Anurag Kumar
P. Calamia
Sebastià V. Amengual
Calvin Murdock
Ishwarya Ananthabhotla
Philip Robinson
Eli Shlizerman
V. Ithapu
Ruohan Gao
14 Apr 2025
AI-Gadget Kit: Integrating Swarm User Interfaces with LLM-driven Agents for Rich Tabletop Game Applications
Yijie Guo
Zhenhan Huang
Ruhan Wang
Zhihao Yao
Tianyu Yu
Zhiling Xu
Xinyu Zhao
Xueqing Li
Haipeng Mi
24 Jul 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
28 May 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
17 Mar 2024
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Neural Information Processing Systems (NeurIPS), 2023
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
01 Nov 2023
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
Neural Information Processing Systems (NeurIPS), 2023
Hongchen Wang
Andy Guan Hong Chen
Xiaoqi Li
Mingdong Wu
Hao Dong
15 Sep 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
29 Mar 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
04 Jan 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
24 Dec 2022
Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation
Gyan Tatiya
Jonathan M Francis
Luca Bondi
Ingrid Navarro
Eric Nyberg
Jivko Sinapov
Jean Oh
21 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh
Jordi Salvador
Luca Weihs
Aniruddha Kembhavi
SSL
01 Dec 2022
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Neural Information Processing Systems (NeurIPS), 2022
Kunal Pratap Singh
Luca Weihs
Alvaro Herrasti
Jonghyun Choi
Aniruddha Kembhavi
Roozbeh Mottaghi
18 Nov 2022
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
Neural Information Processing Systems (NeurIPS), 2022
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
18 Oct 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
Neural Information Processing Systems (NeurIPS), 2022
Sudipta Paul
Amit K. Roy-Chowdhury
A. Cherian
14 Oct 2022
Learning Active Camera for Multi-Object Navigation
Neural Information Processing Systems (NeurIPS), 2022
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Weiwen Hu
Wenbing Huang
Thomas H. Li
Ming Tan
Chuang Gan
14 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
13 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tanvir Mahmud
Diana Marculescu
CLIP
11 Oct 2022
Pay Self-Attention to Audio-Visual Navigation
British Machine Vision Conference (BMVC), 2022
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
04 Oct 2022
Anticipating the Unseen Discrepancy for Vision and Language Navigation
Yujie Lu
Huiliang Zhang
Ping Nie
Weixi Feng
Wenda Xu
Xinze Wang
William Yang Wang
10 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
04 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Computer Vision and Pattern Recognition (CVPR), 2022
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
07 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
16 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
14 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
08 Jun 2022
Towards Generalisable Audio Representations for Audio-Visual Navigation
Shunqi Mao
Chaoyi Zhang
Heng Wang
Weidong (Tom) Cai
01 Jun 2022
Learning Neural Acoustic Fields
Neural Information Processing Systems (NeurIPS), 2022
Andrew F. Luo
Yilun Du
Michael J. Tarr
J. Tenenbaum
Antonio Torralba
Chuang Gan
AI4CE
04 Apr 2022
Sound Adversarial Audio-Visual Navigation
International Conference on Learning Representations (ICLR), 2022
Yinfeng Yu
Wenbing Huang
Gang Hua
Changan Chen
Yikai Wang
Xiaohong Liu
AAML
22 Feb 2022
Visual Acoustic Matching
Computer Vision and Pattern Recognition (CVPR), 2022
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
14 Feb 2022
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
Ziad Al-Halah
Santhosh Kumar Ramakrishnan
Kristen Grauman
VLM
05 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
European Conference on Computer Vision (ECCV), 2022
Sagnik Majumder
Kristen Grauman
02 Feb 2022
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Santhosh Kumar Ramakrishnan
Devendra Singh Chaplot
Ziad Al-Halah
Jitendra Malik
Kristen Grauman
25 Jan 2022
Symmetry-aware Neural Architecture for Embodied Visual Navigation
Shuang Liu
Takayuki Okatani
17 Dec 2021
Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
IEEE Robotics and Automation Letters (RA-L), 2021
Abdelrahman Younes
Daniel Honerkamp
Tim Welschehold
Abhinav Valada
29 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
British Machine Vision Conference (BMVC), 2021
Rishabh Garg
Ruohan Gao
Kristen Grauman
21 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Conference on Robot Learning (CoRL), 2021
Ziyang Chen
Xixi Hu
Andrew Owens
10 Nov 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation
IEEE International Conference on Robotics and Automation (ICRA), 2021
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
F. Sun
22 Sep 2021
Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge
Xinzhu Liu
Di Guo
Huaping Liu
F. Sun
EgoV
20 Sep 2021
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Qi Wu
Cheng-Ju Wu
Yixin Zhu
Jungseock Joo
05 Aug 2021
Improving Multi-Modal Learning with Uni-Modal Teachers
Chenzhuang Du
Tingle Li
Yichen Liu
Zixin Wen
Tianyu Hua
Yue Wang
Hang Zhao
21 Jun 2021
Learning Audio-Visual Dereverberation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
14 Jun 2021
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
IEEE International Conference on Computer Vision (ICCV), 2021
Prithvijit Chattopadhyay
Judy Hoffman
Roozbeh Mottaghi
Aniruddha Kembhavi
08 Jun 2021