2005.01616
VisualEchoes: Spatial Image Representation Learning through Echolocation
4 May 2020
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
Papers citing
"VisualEchoes: Spatial Image Representation Learning through Echolocation"
50 / 60 papers shown
Title
Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
Hugo Bohy
M. Tran
Kevin El Haddad
Thierry Dutoit
M. Soleymani
4
2
0
24 Aug 2025
Learning to Highlight Audio by Watching Movies
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
133
1
0
17 May 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin
Ruohan Gao
126
0
0
30 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
137
0
0
22 Apr 2025
Hearing Anywhere in Any Environment
Xiulong Liu
Anurag Kumar
P. Calamia
Sebastia V. Amengual
Calvin Murdock
Ishwarya Ananthabhotla
Philip Robinson
Eli Shlizerman
V. Ithapu
Ruohan Gao
89
1
0
14 Apr 2025
AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation
Xiaohu Liu
Sascha Hornauer
Fabien Moutarde
Jialiang Lu
SSL
MDE
141
0
0
02 Dec 2024
Estimating Indoor Scene Depth Maps from Ultrasonic Echoes
Junpei Honma
Akisato Kimura
Go Irie
MDE
90
0
0
05 Sep 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
120
4
0
09 Aug 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
114
1
0
16 Jul 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
197
5
0
28 May 2024
EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics
Jan Steckel
W. Jansen
Nico Huebel
MDE
89
0
0
21 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
208
11
0
20 May 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David Harwath
Kristen Grauman
EgoV
SSL
122
8
0
08 Apr 2024
6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human
Masahiro Yasuda
Shoichiro Saito
Akira Nakayama
Noboru Harada
106
5
0
04 Mar 2024
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia
Miao Liu
Hao Jiang
Ishwarya Ananthabhotla
James M. Rehg
V. Ithapu
Ruohan Gao
EgoV
128
11
0
20 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
92
6
0
14 Dec 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics
Mason Wang
Samuel Clarke
Jui-Hsien Wang
Ruohan Gao
Jiajun Wu
104
9
0
06 Nov 2023
Measuring Acoustics with Collaborative Multiple Agents
Yinfeng Yu
Changan Chen
Lele Cao
Fangkai Yang
Gang Hua
102
1
0
09 Oct 2023
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion
Zhiqiang Yan
Xiang Li
Le Hui
Ying Tai
Jun Yu Li
Jian Yang
VLM
3DV
187
8
0
01 Sep 2023
AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
94
16
0
23 Aug 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
133
6
0
10 Jul 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
137
10
0
16 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
209
11
0
01 Jun 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen
Shengyi Qian
Andrew Owens
148
14
0
20 Mar 2023
The Audio-Visual BatVision Dataset for Research on Sight and Sound
Amandine Brunetto
Sascha Hornauer
Stella X. Yu
Fabien Moutarde
132
4
0
13 Mar 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
111
7
0
04 Jan 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
152
0
0
09 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
118
61
0
28 Nov 2022
Pay Self-Attention to Audio-Visual Navigation
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
131
5
0
04 Oct 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
184
60
0
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
108
10
0
04 Aug 2022
Estimating Visual Information From Audio Through Manifold Learning
Fabrizio Pedersoli
Dryden Wiebe
A. Banitalebi
Yong Zhang
George Tzanetakis
K. M. Yi
SSL
159
7
0
03 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
179
20
0
07 Jul 2022
Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
Lingyu Zhu
Esa Rahtu
Hang Zhao
MDE
121
6
0
03 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
172
95
0
16 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
136
59
0
08 Jun 2022
GWA: A Large High-Quality Acoustic Dataset for Audio Processing
Zhenyu Tang
R. Aralikatti
Anton Ratnarajah
Tianyi Zhou
161
36
0
04 Apr 2022
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments
Masahiro Yasuda
Yasunori Ohishi
Shoichiro Saito
162
12
0
18 Feb 2022
Computational bioacoustics with deep learning: a review and roadmap
D. Stowell
134
280
0
13 Dec 2021
Toward Practical Monocular Indoor Depth Estimation
Cho-Ying Wu
Jialiang Wang
Michael Hall
Ulrich Neumann
Shuochen Su
3DV
MDE
141
72
0
04 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
104
28
0
21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
88
25
0
15 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Ziyang Chen
Xixi Hu
Andrew Owens
125
27
0
10 Nov 2021
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
130
11
0
18 Sep 2021
RigNet: Repetitive Image Guided Network for Depth Completion
Zhiqiang Yan
Kun Wang
Xiang Li
Ying Tai
Jun Li
Jian Yang
3DV
VLM
217
132
0
29 Jul 2021
Learning Audio-Visual Dereverberation
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
118
33
0
14 Jun 2021
Move2Hear: Active Audio-Visual Source Separation
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
93
44
0
15 May 2021
Collision Replay: What Does Bumping Into Things Tell You About Scene Geometry?
Alexander Raistrick
Nilesh Kulkarni
David Fouhey
76
1
0
03 May 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
128
39
0
05 Apr 2021
Discriminative Semantic Transitive Consistency for Cross-Modal Learning
Kranti K. Parida
Gaurav Sharma
117
1
0
25 Mar 2021