Self-Supervised Generation of Spatial Audio for 360 Video

7 September 2018

Papers citing "Self-Supervised Generation of Spatial Audio for 360 Video"

50 / 117 papers shown

MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations Wenxiang Guo Changhao Pan Zhiyuan Zhu Xintong Hu Yu Zhang ... Z. Chen Yanhao Yu Qiange Huang Fei Wu Zhou Zhao 163 0 0 12 Oct 2025
StereoSync: Spatially-Aware Stereo Audio Generation from Video Christian Marinoni R. F. Gramaccioni Kazuki Shimada Takashi Shibuya Yuki Mitsufuji Danilo Comminiello DiffM VGen 74 2 0 07 Oct 2025
Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment Y. Liu Shaofan Yang Kai Li Xu Li 77 1 0 26 Sep 2025
Lightweight Implicit Neural Network for Binaural Audio Synthesis Xikun Lu Fang Liu Weizhi Shi Jinqiu Sang 92 0 0 17 Sep 2025
Deep Learning for Personalized Binaural Audio Reproduction Xikun Lu Yunda Chen Zehua Chen Jie Wang Mingxing Liu Hongmei Hu C. Zheng Stefan Bleeck Jinqiu Sang 120 2 0 30 Aug 2025
Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360-Degree Videos Mert Cokelek Halit Ozsoy Nevrez Imamoglu C. Ozcinar Inci Ayhan Erkut Erdem Aykut Erdem MDE 116 1 0 27 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research Zhiyuan Zhu Yu Zhang Wenxiang Guo Changhao Pan Zhou Zhao 121 1 0 08 Aug 2025
ViSAGe: Video-to-Spatial Audio Generation. International Conference on Learning Representations (ICLR), 2025 Jaeyeon Kim Heeseung Yun Gunhee Kim VGen 165 9 0 13 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation Theodore Barfoot Luis C. Garcia-Peraza-Herrera Samet Akcay Ben Glocker Tom Vercauteren UQCV 333 0 0 04 Jun 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Tianrui Pan J. Tang Longxiang Zhang Jie Tang Gangshan Wu 141 2 0 01 Jun 2025
Learning to Highlight Audio by Watching Movies. Computer Vision and Pattern Recognition (CVPR), 2025 Chao Huang Ruohan Gao J. M. F. Tsang Jan Kurcius Cagdas Bilen Chenliang Xu Anurag Kumar Sanjeel Parekh VGen 217 3 0 17 May 2025
Differentiable Room Acoustic Rendering with Multi-View Vision Priors Derong Jin Ruohan Gao 247 2 0 30 Apr 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video Huadai Liu Tianyi Luo Qikai Jiang Kaicheng Luo Peiwen Sun ... Xin Li Shiliang Zhang Zhijie Yan Zhou Zhao Wei Xue VGen 372 10 0 21 Apr 2025
Hearing Anywhere in Any Environment. Computer Vision and Pattern Recognition (CVPR), 2025 Xiulong Liu Anurag Kumar P. Calamia Sebastia V. Amengual Calvin Murdock Ishwarya Ananthabhotla Philip Robinson Eli Shlizerman V. Ithapu Ruohan Gao 218 6 0 14 Apr 2025
AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis Hadam Baek Hannie Shin Jiyoung Seo Chanwoo Kim Saerom Kim Hyeongbok Kim Sangpil Kim 167 1 0 17 Mar 2025
Aligning Audio-Visual Joint Representations with an Agentic Workflow. Neural Information Processing Systems (NeurIPS), 2024 Shentong Mo Yibing Song 201 2 0 30 Oct 2024
Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 Saksham Singh Kushwaha Jianbo Ma Mark R. P. Thomas Yapeng Tian Avery Bruni 111 7 0 15 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach. Neural Information Processing Systems (NeurIPS), 2024 Rory Young Nicolas Pugeault AAML 298 20 0 14 Oct 2024
End-to-end multi-channel speaker extraction and binaural speech synthesis Cheng Chi Xiaoyu Li Andong Li Yuxuan Ke Yao Ge Xiaodong Li C. Zheng 117 0 0 08 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization. European Conference on Computer Vision (ECCV), 2024 Tingle Li Renhao Wang Po-Yao Huang Andrew Owens Gopala Anumanchipalli DiffM SSL 219 7 0 22 Sep 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation Shentong Mo Haofan Wang 220 3 0 31 Aug 2024
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model. IEEE Transactions on Image Processing (TIP), 2024 Yuxin Zhu Huiyu Duan Kaiwei Zhang Yucheng Zhu Xilei Zhu Long Teng Xiongkuo Min Guangtao Zhai 203 6 0 10 Aug 2024
Audio-visual Generalized Zero-shot Learning the Easy Way Shentong Mo Pedro Morgado 207 7 0 18 Jul 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives Chao Huang Dejan Marković Chenliang Xu Alexander Richard 214 12 0 18 Jul 2024
Semantic Grouping Network for Audio Source Separation Shentong Mo Yapeng Tian 178 5 0 04 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field Huiyu Gao Jiahao Ma David Ahmedt-Aristizabal Chuong H. Nguyen Miaomiao Liu 340 5 0 02 Jul 2024
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation. Neural Information Processing Systems (NeurIPS), 2024 Ning-Hsu Wang Yu-Lun Liu MDE 212 19 0 18 Jun 2024
AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis Swapnil Bhosale Haosen Yang Helen Treharne Jiankang Deng Xiatian Zhu 283 10 0 13 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound Rishit Dagli Shivesh Prakash Robert Wu H. Khosravani 305 14 0 06 Jun 2024
Images that Sound: Composing Images and Sounds on a Single Canvas Ziyang Chen Daniel Geng Andrew Owens DiffM 349 15 0 20 May 2024
Unified Video-Language Pre-training with Synchronized Audio Shentong Mo Haofan Wang Huaxia Li Xu Tang 228 2 0 12 May 2024
MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos Zheng Ning Zheng Zhang Jerrick Ban Kaiwen Jiang Ruohong Gan Yapeng Tian Tao Li VGen 89 9 0 23 Apr 2024
Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation. European Signal Processing Conference (EUSIPCO), 2024 Luca Comanducci Fabio Antonacci Augusto Sarti 119 1 0 04 Apr 2024
Text-to-Audio Generation Synchronized with Videos Shentong Mo Jing Shi Yapeng Tian DiffM VGen 155 26 0 08 Mar 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization Davide Berghi Philip J. B. Jackson 183 5 0 21 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling. Computer Vision and Pattern Recognition (CVPR), 2023 Shentong Mo Pedro Morgado 206 27 0 02 Dec 2023
Weakly-Supervised Audio-Visual Segmentation. Neural Information Processing Systems (NeurIPS), 2023 Shentong Mo Bhiksha Raj VOS 231 18 0 25 Nov 2023
Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation. Knowledge-Based Systems (KBS), 2023 Zhaojian Li Jiangwei Zhong Yuan Yuan 182 9 0 13 Nov 2023
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio. Neural Information Processing Systems (NeurIPS), 2023 Xudong Xu Dejan Marković Jacob Sandakly Todd Keebler Steven Krenn Alexander Richard 113 8 0 01 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023 Yuxin Ye Wenming Yang Yapeng Tian 169 12 0 31 Oct 2023
Audio-Visual Instance Segmentation. Computer Vision and Pattern Recognition (CVPR), 2023 Ruohao Guo Yaru Chen Yanyu Qi Wenzhen Yue Dantong Niu ... Wenzhen Yue Ji Shi Qixun Wang Peiliang Zhang Buwen Liang VLM VOS 285 11 0 28 Oct 2023
Measuring Acoustics with Collaborative Multiple Agents. International Joint Conference on Artificial Intelligence (IJCAI), 2023 Yinfeng Yu Changan Chen Lele Cao Fangkai Yang Gang Hua 198 7 0 09 Oct 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning. IEEE International Conference on Computer Vision (ICCV), 2023 Shentong Mo Weiguo Pian Yapeng Tian CLL VLM 144 31 0 11 Sep 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data. ACM Symposium on User Interface Software and Technology (UIST), 2023 Zheng Zhang Zheng Ning Chenliang Xu Yapeng Tian Toby Jia-Jun Li 211 11 0 27 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos. Computer Vision and Pattern Recognition (CVPR), 2023 Sagnik Majumder Ziad Al-Halah Kristen Grauman SSL EgoV 273 7 0 10 Jul 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects. Computer Vision and Pattern Recognition (CVPR), 2023 Samuel Clarke Ruohan Gao Mason Wang M. Rau Julia Xu Jui-Hsien Wang Doug L. James Jiajun Wu 193 13 0 16 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear. IEEE International Conference on Robotics and Automation (ICRA), 2023 Ruohan Gao Hao Li Gokul Dharan Zhuzhu Wang Chengshu Li Fei Xia Silvio Savarese Li Fei-Fei Jiajun Wu 289 14 0 01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition. International Conference on Machine Learning (ICML), 2023 Shentong Mo Pedro Morgado 162 25 0 30 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment Shentong Mo Jing Shi Yapeng Tian 100 17 0 22 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond Chunhui Zhang Li Liu Yawen Cui Guanjie Huang Weilin Lin Yiqian Yang Yuehong Hu VLM 316 127 0 14 May 2023