ResearchTrend.AI
VisualEchoes: Spatial Image Representation Learning through Echolocation

4 May 2020
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE SSL

Papers citing "VisualEchoes: Spatial Image Representation Learning through Echolocation"

50 / 60 papers shown
Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
Hugo Bohy
M. Tran
Kevin El Haddad
Thierry Dutoit
M. Soleymani
4
2
0
24 Aug 2025
Learning to Highlight Audio by Watching Movies
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
133
1
0
17 May 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin
Ruohan Gao
126
0
0
30 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro LRM
137
0
0
22 Apr 2025
Hearing Anywhere in Any Environment
Xiulong Liu
Anurag Kumar
P. Calamia
Sebastia V. Amengual
Calvin Murdock
Ishwarya Ananthabhotla
Philip Robinson
Eli Shlizerman
V. Ithapu
Ruohan Gao
89
1
0
14 Apr 2025
AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation
Xiaohu Liu
Sascha Hornauer
Fabien Moutarde
Jialiang Lu
SSL MDE
141
0
0
02 Dec 2024
Estimating Indoor Scene Depth Maps from Ultrasonic Echoes
Junpei Honma
Akisato Kimura
Go Irie
MDE
90
0
0
05 Sep 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
120
4
0
09 Aug 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
114
1
0
16 Jul 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
197
5
0
28 May 2024
EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics
Jan Steckel
W. Jansen
Nico Huebel
MDE
89
0
0
21 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
208
11
0
20 May 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David Harwath
Kristen Grauman
EgoV SSL
122
8
0
08 Apr 2024
6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human
Masahiro Yasuda
Shoichiro Saito
Akira Nakayama
Noboru Harada
106
5
0
04 Mar 2024
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia
Miao Liu
Hao Jiang
Ishwarya Ananthabhotla
James M. Rehg
V. Ithapu
Ruohan Gao
EgoV
128
11
0
20 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
92
6
0
14 Dec 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics
Mason Wang
Samuel Clarke
Jui-Hsien Wang
Ruohan Gao
Jiajun Wu
104
9
0
06 Nov 2023
Measuring Acoustics with Collaborative Multiple Agents
Yinfeng Yu
Changan Chen
Lele Cao
Fangkai Yang
Gang Hua
102
1
0
09 Oct 2023
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion
Zhiqiang Yan
Xiang Li
Le Hui
Ying Tai
Jun Yu Li
Jian Yang
VLM 3DV
187
8
0
01 Sep 2023
AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
94
16
0
23 Aug 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL EgoV
133
6
0
10 Jul 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
137
10
0
16 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
209
11
0
01 Jun 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen
Shengyi Qian
Andrew Owens
148
14
0
20 Mar 2023
The Audio-Visual BatVision Dataset for Research on Sight and Sound
Amandine Brunetto
Sascha Hornauer
Stella X. Yu
Fabien Moutarde
132
4
0
13 Mar 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
111
7
0
04 Jan 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
152
0
0
09 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
118
61
0
28 Nov 2022
Pay Self-Attention to Audio-Visual Navigation
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
131
5
0
04 Oct 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
184
60
0
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
108
10
0
04 Aug 2022
Estimating Visual Information From Audio Through Manifold Learning
Fabrizio Pedersoli
Dryden Wiebe
A. Banitalebi
Yong Zhang
George Tzanetakis
K. M. Yi
SSL
159
7
0
03 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
179
20
0
07 Jul 2022
Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
Lingyu Zhu
Esa Rahtu
Hang Zhao
MDE
121
6
0
03 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
172
95
0
16 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
136
59
0
08 Jun 2022
GWA: A Large High-Quality Acoustic Dataset for Audio Processing
Zhenyu Tang
R. Aralikatti
Anton Ratnarajah
Tianyi Zhou
161
36
0
04 Apr 2022
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments
Masahiro Yasuda
Yasunori Ohishi
Shoichiro Saito
162
12
0
18 Feb 2022
Computational bioacoustics with deep learning: a review and roadmap
D. Stowell
134
280
0
13 Dec 2021
Toward Practical Monocular Indoor Depth Estimation
Cho-Ying Wu
Jialiang Wang
Michael Hall
Ulrich Neumann
Shuochen Su
3DV MDE
141
72
0
04 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
104
28
0
21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
88
25
0
15 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Ziyang Chen
Xixi Hu
Andrew Owens
125
27
0
10 Nov 2021
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
130
11
0
18 Sep 2021
RigNet: Repetitive Image Guided Network for Depth Completion
Zhiqiang Yan
Kun Wang
Xiang Li
Ying Tai
Jun Li
Jian Yang
3DV VLM
217
132
0
29 Jul 2021
Learning Audio-Visual Dereverberation
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
118
33
0
14 Jun 2021
Move2Hear: Active Audio-Visual Source Separation
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
93
44
0
15 May 2021
Collision Replay: What Does Bumping Into Things Tell You About Scene Geometry?
Alexander Raistrick
Nilesh Kulkarni
David Fouhey
76
1
0
03 May 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
128
39
0
05 Apr 2021
Discriminative Semantic Transitive Consistency for Cross-Modal Learning
Kranti K. Parida
Gaurav Sharma
117
1
0
25 Mar 2021