VisualEchoes: Spatial Image Representation Learning through Echolocation

European Conference on Computer Vision (ECCV), 2020
4 May 2020
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE, SSL

Papers citing "VisualEchoes: Spatial Image Representation Learning through Echolocation"

50 / 61 papers shown
Deep Learning for Personalized Binaural Audio Reproduction
Xikun Lu
Yunda Chen
Zehua Chen
Jie Wang
Mingxing Liu
Hongmei Hu
C. Zheng
Stefan Bleeck
Jinqiu Sang
264
2
0
30 Aug 2025
Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2024
Hugo Bohy
M. Tran
Kevin El Haddad
Thierry Dutoit
M. Soleymani
188
2
0
24 Aug 2025
Learning to Highlight Audio by Watching Movies
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
369
5
0
17 May 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin
Ruohan Gao
388
3
0
30 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro, LRM
430
1
0
22 Apr 2025
Hearing Anywhere in Any Environment
Computer Vision and Pattern Recognition (CVPR), 2025
Xiulong Liu
Anurag Kumar
P. Calamia
Sebastia V. Amengual
Calvin Murdock
Ishwarya Ananthabhotla
Philip Robinson
Eli Shlizerman
V. Ithapu
Ruohan Gao
374
10
0
14 Apr 2025
AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation
Xiaohu Liu
Sascha Hornauer
Fabien Moutarde
Jialiang Lu
SSL, MDE
385
1
0
02 Dec 2024
Estimating Indoor Scene Depth Maps from Ultrasonic Echoes
IEEE International Conference on Image Processing (ICIP), 2024
Junpei Honma
Akisato Kimura
Go Irie
MDE
295
1
0
05 Sep 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
European Conference on Computer Vision (ECCV), 2024
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
241
7
0
09 Aug 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
310
1
0
16 Jul 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
625
12
0
28 May 2024
EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics
Jan Steckel
W. Jansen
Nico Huebel
MDE
216
4
0
21 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
489
17
0
20 May 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David Harwath
Kristen Grauman
EgoV, SSL
291
12
0
08 Apr 2024
6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human
Masahiro Yasuda
Shoichiro Saito
Akira Nakayama
Noboru Harada
242
12
0
04 Mar 2024
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia
Miao Liu
Hao Jiang
Ishwarya Ananthabhotla
James M. Rehg
V. Ithapu
Ruohan Gao
EgoV
300
17
0
20 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
287
11
0
14 Dec 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics
Neural Information Processing Systems (NeurIPS), 2023
Mason Wang
Samuel Clarke
Jui-Hsien Wang
Ruohan Gao
Jiajun Wu
312
11
0
06 Nov 2023
Measuring Acoustics with Collaborative Multiple Agents
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yinfeng Yu
Changan Chen
Lele Cao
Fangkai Yang
Gang Hua
392
11
0
09 Oct 2023
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion
International Journal of Computer Vision (IJCV), 2023
Zhiqiang Yan
Xiang Li
Le Hui
Ying Tai
Jun Yu Li
Jian Yang
VLM, 3DV
542
12
0
01 Sep 2023
AdVerb: Visually Guided Audio Dereverberation
IEEE International Conference on Computer Vision (ICCV), 2023
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
278
20
0
23 Aug 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL, EgoV
443
9
0
10 Jul 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
249
13
0
16 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
371
15
0
01 Jun 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Chen
Shengyi Qian
Andrew Owens
333
21
0
20 Mar 2023
The Audio-Visual BatVision Dataset for Research on Sight and Sound
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Amandine Brunetto
Sascha Hornauer
Stella X. Yu
Fabien Moutarde
362
7
0
13 Mar 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
374
12
0
04 Jan 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
British Machine Vision Conference (BMVC), 2022
Yating Xu
Conghui Hu
G. Lee
VGen
433
1
0
09 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2022
Xixi Hu
Ziyang Chen
Andrew Owens
289
67
0
28 Nov 2022
Pay Self-Attention to Audio-Visual Navigation
British Machine Vision Conference (BMVC), 2022
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
365
17
0
04 Oct 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
332
76
0
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
229
10
0
04 Aug 2022
Estimating Visual Information From Audio Through Manifold Learning
Fabrizio Pedersoli
Dryden Wiebe
A. Banitalebi
Yong Zhang
George Tzanetakis
K. M. Yi
SSL
371
9
0
03 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Computer Vision and Pattern Recognition (CVPR), 2022
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
358
20
0
07 Jul 2022
Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
Xiangjie Sui
Esa Rahtu
Hang Zhao
MDE
390
8
0
03 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
398
123
0
16 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
318
74
0
08 Jun 2022
GWA: A Large High-Quality Acoustic Dataset for Audio Processing
International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2022
Zhenyu Tang
R. Aralikatti
Anton Ratnarajah
Tianyi Zhou
433
48
0
04 Apr 2022
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Masahiro Yasuda
Yasunori Ohishi
Shoichiro Saito
315
21
0
18 Feb 2022
Computational bioacoustics with deep learning: a review and roadmap
D. Stowell
271
378
0
13 Dec 2021
Toward Practical Monocular Indoor Depth Estimation
Computer Vision and Pattern Recognition (CVPR), 2021
Cho-Ying Wu
Jialiang Wang
Michael Hall
Ulrich Neumann
Shuochen Su
3DV, MDE
313
91
0
04 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
British Machine Vision Conference (BMVC), 2021
Rishabh Garg
Ruohan Gao
Kristen Grauman
209
33
0
21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
275
33
0
15 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Conference on Robot Learning (CoRL), 2021
Ziyang Chen
Xixi Hu
Andrew Owens
257
31
0
10 Nov 2021
V-SlowFast Network for Efficient Visual Sound Separation
Xiangjie Sui
Esa Rahtu
264
12
0
18 Sep 2021
RigNet: Repetitive Image Guided Network for Depth Completion
European Conference on Computer Vision (ECCV), 2021
Zhiqiang Yan
Kun Wang
Xiang Li
Ying Tai
Jun Li
Jian Yang
3DV, VLM
507
159
0
29 Jul 2021
Learning Audio-Visual Dereverberation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
270
36
0
14 Jun 2021
Move2Hear: Active Audio-Visual Source Separation
IEEE International Conference on Computer Vision (ICCV), 2021
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
286
47
0
15 May 2021
Collision Replay: What Does Bumping Into Things Tell You About Scene Geometry?
British Machine Vision Conference (BMVC), 2021
Alexander Raistrick
Nilesh Kulkarni
David Fouhey
162
1
0
03 May 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Computer Vision and Pattern Recognition (CVPR), 2021
Yapeng Tian
Chenliang Xu
AAML
365
41
0
05 Apr 2021