Semantic Audio-Visual Navigation

Computer Vision and Pattern Recognition (CVPR), 2021
21 December 2020
Changan Chen
Ziad Al-Halah
Kristen Grauman

Papers citing "Semantic Audio-Visual Navigation"

50 / 68 papers shown
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh
Asako Kanezaki
184
2
0
21 Oct 2025
Audio-Guided Visual Perception for Audio-Visual Navigation
Yi Wang
Yinfeng Yu
Fuchun Sun
Liejun Wang
Wendong Zheng
156
0
0
13 Oct 2025
Audio-Guided Dynamic Modality Fusion with Stereo-Aware Attention for Audio-Visual Navigation
Jia Li
Yinfeng Yu
Liejun Wang
Fuchun Sun
Wendong Zheng
290
1
0
21 Sep 2025
Deep Learning for Personalized Binaural Audio Reproduction
Xikun Lu
Yunda Chen
Zehua Chen
Jie Wang
Mingxing Liu
Hongmei Hu
C. Zheng
Stefan Bleeck
Jinqiu Sang
264
2
0
30 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
259
4
0
10 Aug 2025
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Fatima Saad
Ziad Al-Halah
VGen
132
2
0
04 Aug 2025
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury
Mohamed Elmoghany
Yohan Abeysinghe
Mahmoud Ahmed
Sayan Nag
Salman Khan
Mohamed Elhoseiny
Dinesh Manocha
515
7
0
08 Jun 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Computer Vision and Pattern Recognition (CVPR), 2025
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
246
1
0
14 May 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
430
1
0
22 Apr 2025
HomeEmergency -- Using Audio to Find and Respond to Emergencies in the Home
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
James F. Mullen Jr
Dhruva Kumar
Xuewei Qi
R. Madhivanan
Arnie Sen
Dinesh Manocha
Richard Kim
359
0
0
01 Apr 2025
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation
Neural Information Processing Systems (NeurIPS), 2024
Hongcheng Wang
Peiqi Liu
Wenzhe Cai
Mingdong Wu
Zhengyu Qian
Hao Dong
394
7
0
04 Oct 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
310
1
0
16 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
459
5
0
02 Jul 2024
Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Changan Chen
Jordi Ramos
Anshul Tomar
Kristen Grauman
317
11
0
05 May 2024
ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
Arjun Somayazulu
Sagnik Majumder
Changan Chen
Kristen Grauman
269
2
0
24 Apr 2024
Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation
Leyuan Sun
Asako Kanezaki
Guillaume Caron
Yusuke Yoshiyasu
LM&Ro
285
12
0
21 Mar 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
286
16
0
05 Mar 2024
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Neural Information Processing Systems (NeurIPS), 2023
Changsheng Lv
Shuai Zhang
Yapeng Tian
Mengshi Qi
Huadong Ma
CML
351
24
0
30 Oct 2023
Measuring Acoustics with Collaborative Multiple Agents
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yinfeng Yu
Changan Chen
Lele Cao
Fangkai Yang
Gang Hua
392
11
0
09 Oct 2023
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Tohida Rehman
Ronit Mandal
Jimuyang Zhang
Debarshi Kumar Sanyal
SSL
454
27
0
28 Sep 2023
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
Neural Information Processing Systems (NeurIPS), 2023
Hongchen Wang
Andy Guan Hong Chen
Xiaoqi Li
Mingdong Wu
Hao Dong
548
28
0
15 Sep 2023
AdVerb: Visually Guided Audio Dereverberation
IEEE International Conference on Computer Vision (ICCV), 2023
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
278
20
0
23 Aug 2023
Audio-Visual Class-Incremental Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
272
43
0
21 Aug 2023
Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation
IEEE International Conference on Computer Vision (ICCV), 2023
Jinyu Chen
Wenguan Wang
Siying Liu
Jiaming Song
Yi Yang
333
21
0
20 Aug 2023
Multi-goal Audio-visual Navigation using Sound Direction Map
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Haruo Kondoh
Asako Kanezaki
338
10
0
01 Aug 2023
Multi-Spectral Image Stitching via Spatial Graph Reasoning
ACM Multimedia (ACM MM), 2023
Zhiying Jiang
Zengxi Zhang
Jinyuan Liu
Xin-Yue Fan
Risheng Liu
191
12
0
31 Jul 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
371
15
0
01 Jun 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Neural Information Processing Systems (NeurIPS), 2023
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
338
24
0
27 May 2023
Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation
IEEE Robotics and Automation Letters (RA-L), 2023
Hongchen Wang
Yuxuan Wang
Fangwei Zhong
Min-Yu Wu
Jianwei Zhang
Yizhou Wang
Hao Dong
503
10
0
21 Apr 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Chen
Shengyi Qian
Andrew Owens
333
21
0
20 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Computer Vision and Pattern Recognition (CVPR), 2023
Jun Xiong
Gang Wang
Peng Zhang
Wei Huang
Yufei Zha
Guangtao Zhai
190
23
0
11 Mar 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Neural Information Processing Systems (NeurIPS), 2023
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
437
63
0
04 Feb 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
374
12
0
04 Jan 2023
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Shengchao Hu
Li Shen
Ya Zhang
Yixin Chen
Dacheng Tao
OffRL
385
74
0
29 Dec 2022
Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation
Gyan Tatiya
Jonathan M Francis
Luca Bondi
Ingrid Navarro
Eric Nyberg
Jivko Sinapov
Jean Oh
183
10
0
21 Dec 2022
Towards Versatile Embodied Navigation
Neural Information Processing Systems (NeurIPS), 2022
Hongru Wang
Wei Liang
Luc Van Gool
Wenguan Wang
LM&Ro
266
40
0
30 Oct 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
Neural Information Processing Systems (NeurIPS), 2022
Sudipta Paul
Amit K. Roy-Chowdhury
A. Cherian
254
36
0
14 Oct 2022
Learning Active Camera for Multi-Object Navigation
Neural Information Processing Systems (NeurIPS), 2022
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Weiwen Hu
Wenbing Huang
Thomas H. Li
Ming Tan
Chuang Gan
305
36
0
14 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
416
53
0
13 Oct 2022
Pay Self-Attention to Audio-Visual Navigation
British Machine Vision Conference (BMVC), 2022
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
365
17
0
04 Oct 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
332
76
0
20 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Computer Vision and Pattern Recognition (CVPR), 2022
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
358
20
0
07 Jul 2022
Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
Xiangjie Sui
Esa Rahtu
Hang Zhao
MDE
390
8
0
03 Jul 2022
What do navigation agents learn about their environment?
Computer Vision and Pattern Recognition (CVPR), 2022
Kshitij Dwivedi
Gemma Roig
Aniruddha Kembhavi
Roozbeh Mottaghi
214
13
0
17 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
398
123
0
16 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
399
434
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
663
947
0
13 Jun 2022
Imagination-augmented Navigation Based on 2D Laser Sensor Observations
Zhengcheng Shen
Linh Kästner
Magdalena Yordanova
Jens Lambrecht
195
1
0
12 Jun 2022
Human-Following and -guiding in Crowded Environments using Semantic Deep-Reinforcement-Learning for Mobile Service Robots
IEEE International Conference on Robotics and Automation (ICRA), 2022
Linh Kästner
Bassel Fatloun
Zhengcheng Shen
Daniel P Gawrisch
Jens Lambrecht
HAI
183
19
0
12 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
318
74
0
08 Jun 2022