VisualEchoes: Spatial Image Representation Learning through Echolocation

European Conference on Computer Vision (ECCV), 2020

4 May 2020

Papers citing "VisualEchoes: Spatial Image Representation Learning through Echolocation"

50 / 61 papers shown

Deep Learning for Personalized Binaural Audio Reproduction

264

30 Aug 2025

Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice

IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2024

188

24 Aug 2025

Learning to Highlight Audio by Watching Movies

Computer Vision and Pattern Recognition (CVPR), 2025

369

17 May 2025

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin

Ruohan Gao

388

30 Apr 2025

Multimodal Perception for Goal-oriented Navigation: A Survey

I-Tak Ieong

Hao Tang

LM&Ro LRM

430

22 Apr 2025

Hearing Anywhere in Any Environment

Computer Vision and Pattern Recognition (CVPR), 2025

Ishwarya Ananthabhotla

374

14 Apr 2025

AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation

385

02 Dec 2024

Estimating Indoor Scene Depth Maps from Ultrasonic Echoes

IEEE International Conference on Image Processing (ICIP), 2024

295

05 Sep 2024

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

European Conference on Computer Vision (ECCV), 2024

Heeseung Yun

Ruohan Gao

Ishwarya Ananthabhotla

Gunhee Kim

241

09 Aug 2024

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

Chuang Gan

310

16 Jul 2024

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Amandine Brunetto

Sascha Hornauer

Fabien Moutarde

625

28 May 2024

EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics

216

21 May 2024

Images that Sound: Composing Images and Sounds on a Single Canvas

Ziyang Chen

Daniel Geng

Andrew Owens

DiffM

489

20 May 2024

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

291

08 Apr 2024

6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human

242

04 Mar 2024

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

Wenqi Jia

Miao Liu

Hao Jiang

Ishwarya Ananthabhotla

300

20 Dec 2023

Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

AAAI Conference on Artificial Intelligence (AAAI), 2023

Renjie Wu

Hu Wang

Feras Dayoub

Hsiang-Ting Chen

287

14 Dec 2023

SoundCam: A Dataset for Finding Humans Using Room Acoustics

Neural Information Processing Systems (NeurIPS), 2023

Jiajun Wu

312

06 Nov 2023

Measuring Acoustics with Collaborative Multiple Agents

International Joint Conference on Artificial Intelligence (IJCAI), 2023

392

09 Oct 2023

RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion

International Journal of Computer Vision (IJCV), 2023

Zhiqiang Yan

Xiang Li

Le Hui

Ying Tai

Jun Yu Li

Jian Yang

VLM 3DV

542

01 Sep 2023

AdVerb: Visually Guided Audio Dereverberation

IEEE International Conference on Computer Vision (ICCV), 2023

278

23 Aug 2023

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

Computer Vision and Pattern Recognition (CVPR), 2023

443

10 Jul 2023

RealImpact: A Dataset of Impact Sound Fields for Real Objects

Computer Vision and Pattern Recognition (CVPR), 2023

Jiajun Wu

249

16 Jun 2023

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

IEEE International Conference on Robotics and Automation (ICRA), 2023

Silvio Savarese

Li Fei-Fei

Jiajun Wu

371

01 Jun 2023

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

IEEE International Conference on Computer Vision (ICCV), 2023

Ziyang Chen

Shengyi Qian

Andrew Owens

333

20 Mar 2023

The Audio-Visual BatVision Dataset for Research on Sight and Sound

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

362

13 Mar 2023

Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

Computer Vision and Pattern Recognition (CVPR), 2023

374

04 Jan 2023

Motion and Context-Aware Audio-Visual Conditioned Video Prediction

British Machine Vision Conference (BMVC), 2022

433

09 Dec 2022

Mix and Localize: Localizing Sound Sources in Mixtures

Computer Vision and Pattern Recognition (CVPR), 2022

Xixi Hu

Ziyang Chen

Andrew Owens

289

28 Nov 2022

Pay Self-Attention to Audio-Visual Navigation

British Machine Vision Conference (BMVC), 2022

365

04 Oct 2022

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

332

20 Aug 2022

Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

Xufeng Zhao

C. Weber

Muhammad Burhan Hafez

S. Wermter

229

04 Aug 2022

Estimating Visual Information From Audio Through Manifold Learning

Yong Zhang

371

03 Aug 2022

Finding Fallen Objects Via Asynchronous Audio-Visual Integration

Computer Vision and Pattern Recognition (CVPR), 2022

Chuang Gan

Antonio Torralba

358

07 Jul 2022

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision

Xiangjie Sui

Esa Rahtu

Hang Zhao

MDE

390

03 Jul 2022

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

Neural Information Processing Systems (NeurIPS), 2022

398

123

16 Jun 2022

Few-Shot Audio-Visual Learning of Environment Acoustics

Neural Information Processing Systems (NeurIPS), 2022

318

08 Jun 2022

GWA: A Large High-Quality Acoustic Dataset for Audio Processing

International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2022

433

04 Apr 2022

Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Masahiro Yasuda

Yasunori Ohishi

Shoichiro Saito

315

18 Feb 2022

Computational bioacoustics with deep learning: a review and roadmap

D. Stowell

271

378

13 Dec 2021

Toward Practical Monocular Indoor Depth Estimation

Computer Vision and Pattern Recognition (CVPR), 2021

Cho-Ying Wu

Ulrich Neumann

313

04 Dec 2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

British Machine Vision Conference (BMVC), 2021

Rishabh Garg

Ruohan Gao

Kristen Grauman

209

21 Nov 2021

Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021

275

15 Nov 2021

Structure from Silence: Learning Scene Structure from Ambient Sound

Conference on Robot Learning (CoRL), 2021

Ziyang Chen

Xixi Hu

Andrew Owens

257

10 Nov 2021

V-SlowFast Network for Efficient Visual Sound Separation

Xiangjie Sui

Esa Rahtu

264

18 Sep 2021

RigNet: Repetitive Image Guided Network for Depth Completion

European Conference on Computer Vision (ECCV), 2021

Zhiqiang Yan

Kun Wang

Xiang Li

Ying Tai

Jun Li

Jian Yang

3DV VLM

507

159

29 Jul 2021

Learning Audio-Visual Dereverberation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

270

14 Jun 2021

Move2Hear: Active Audio-Visual Source Separation

IEEE International Conference on Computer Vision (ICCV), 2021

Sagnik Majumder

Ziad Al-Halah

Kristen Grauman

286

15 May 2021

Collision Replay: What Does Bumping Into Things Tell You About Scene Geometry?

British Machine Vision Conference (BMVC), 2021

Alexander Raistrick

Nilesh Kulkarni

David Fouhey

162

03 May 2021

Can audio-visual integration strengthen robustness under multimodal attacks?

Computer Vision and Pattern Recognition (CVPR), 2021

Yapeng Tian

Chenliang Xu

AAML

365

05 Apr 2021