1809.02587
Self-Supervised Generation of Spatial Audio for 360 Video
7 September 2018
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
Papers citing
"Self-Supervised Generation of Spatial Audio for 360 Video"
50 / 118 papers shown
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Mengchen Zhang
Qi Chen
Tong Wu
Zihan Liu
Dahua Lin
VGen
147
0
0
02 Dec 2025
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Wenxiang Guo
Changhao Pan
Zhiyuan Zhu
Xintong Hu
Yu Zhang
...
Z. Chen
Yanhao Yu
Qiange Huang
Fei Wu
Zhou Zhao
223
0
0
12 Oct 2025
StereoSync: Spatially-Aware Stereo Audio Generation from Video
Christian Marinoni
R. F. Gramaccioni
Kazuki Shimada
Takashi Shibuya
Yuki Mitsufuji
Danilo Comminiello
DiffM
VGen
106
2
0
07 Oct 2025
Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment
Y. Liu
Shaofan Yang
Kai Li
Xu Li
116
1
0
26 Sep 2025
Lightweight Implicit Neural Network for Binaural Audio Synthesis
Xikun Lu
Fang Liu
Weizhi Shi
Jinqiu Sang
128
0
0
17 Sep 2025
Deep Learning for Personalized Binaural Audio Reproduction
Xikun Lu
Yunda Chen
Zehua Chen
Jie Wang
Mingxing Liu
Hongmei Hu
C. Zheng
Stefan Bleeck
Jinqiu Sang
179
2
0
30 Aug 2025
Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360-Degree Videos
Mert Cokelek
Halit Ozsoy
Nevrez Imamoglu
C. Ozcinar
Inci Ayhan
Erkut Erdem
Aykut Erdem
MDE
180
1
0
27 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu
Yu Zhang
Wenxiang Guo
Changhao Pan
Zhou Zhao
198
1
0
08 Aug 2025
ViSAGe: Video-to-Spatial Audio Generation
International Conference on Learning Representations (ICLR), 2025
Jaeyeon Kim
Heeseung Yun
Gunhee Kim
VGen
217
9
0
13 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
450
1
0
04 Jun 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tianrui Pan
J. Tang
Longxiang Zhang
Jie Tang
Gangshan Wu
169
2
0
01 Jun 2025
Learning to Highlight Audio by Watching Movies
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
257
4
0
17 May 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin
Ruohan Gao
303
2
0
30 Apr 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
451
10
0
21 Apr 2025
Hearing Anywhere in Any Environment
Computer Vision and Pattern Recognition (CVPR), 2025
Xiulong Liu
Anurag Kumar
P. Calamia
Sebastia V. Amengual
Calvin Murdock
Ishwarya Ananthabhotla
Philip Robinson
Eli Shlizerman
V. Ithapu
Ruohan Gao
266
6
0
14 Apr 2025
AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
Hadam Baek
Hannie Shin
Jiyoung Seo
Chanwoo Kim
Saerom Kim
Hyeongbok Kim
Sangpil Kim
198
1
0
17 Mar 2025
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Yibing Song
249
2
0
30 Oct 2024
Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Saksham Singh Kushwaha
Jianbo Ma
Mark R. P. Thomas
Yapeng Tian
Avery Bruni
143
7
0
15 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Neural Information Processing Systems (NeurIPS), 2024
Rory Young
Nicolas Pugeault
AAML
359
20
0
14 Oct 2024
End-to-end multi-channel speaker extraction and binaural speech synthesis
Cheng Chi
Xiaoyu Li
Andong Li
Yuxuan Ke
Yao Ge
Xiaodong Li
C. Zheng
165
0
0
08 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization
European Conference on Computer Vision (ECCV), 2024
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
248
7
0
22 Sep 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
260
3
0
31 Aug 2024
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
IEEE Transactions on Image Processing (TIP), 2024
Yuxin Zhu
Huiyu Duan
Kaiwei Zhang
Yucheng Zhu
Xilei Zhu
Long Teng
Xiongkuo Min
Guangtao Zhai
270
7
0
10 Aug 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
239
7
0
18 Jul 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang
Dejan Marković
Chenliang Xu
Alexander Richard
286
12
0
18 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
227
5
0
04 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
396
5
0
02 Jul 2024
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
Neural Information Processing Systems (NeurIPS), 2024
Ning-Hsu Wang
Yu-Lun Liu
MDE
256
27
0
18 Jun 2024
AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis
Swapnil Bhosale
Haosen Yang
Helen Treharne
Jiankang Deng
Xiatian Zhu
329
10
0
13 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
365
14
0
06 Jun 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
405
15
0
20 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
268
2
0
12 May 2024
MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos
Zheng Ning
Zheng Zhang
Jerrick Ban
Kaiwen Jiang
Ruohong Gan
Yapeng Tian
Tao Li
VGen
125
9
0
23 Apr 2024
Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation
European Signal Processing Conference (EUSIPCO), 2024
Luca Comanducci
Fabio Antonacci
Augusto Sarti
139
1
0
04 Apr 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
191
29
0
08 Mar 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
217
5
0
21 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Pedro Morgado
254
30
0
02 Dec 2023
Weakly-Supervised Audio-Visual Segmentation
Neural Information Processing Systems (NeurIPS), 2023
Shentong Mo
Bhiksha Raj
VOS
279
20
0
25 Nov 2023
Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation
Knowledge-Based Systems (KBS), 2023
Zhaojian Li
Jiangwei Zhong
Yuan Yuan
234
9
0
13 Nov 2023
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Neural Information Processing Systems (NeurIPS), 2023
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
145
8
0
01 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuxin Ye
Wenming Yang
Yapeng Tian
210
12
0
31 Oct 2023
Audio-Visual Instance Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
358
11
0
28 Oct 2023
Measuring Acoustics with Collaborative Multiple Agents
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yinfeng Yu
Changan Chen
Lele Cao
Fangkai Yang
Gang Hua
263
7
0
09 Oct 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
195
31
0
11 Sep 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
ACM Symposium on User Interface Software and Technology (UIST), 2023
Zheng Zhang
Zheng Ning
Chenliang Xu
Yapeng Tian
Toby Jia-Jun Li
231
11
0
27 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
359
8
0
10 Jul 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
210
13
0
16 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
325
14
0
01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
International Conference on Machine Learning (ICML), 2023
Shentong Mo
Pedro Morgado
210
25
0
30 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
136
18
0
22 May 2023