Co-Separating Sounds of Visual Objects

16 April 2019

Papers citing "Co-Separating Sounds of Visual Objects"

41 / 41 papers shown

Title
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation Akam Rahimi Triantafyllos Afouras Andrew Zisserman 40 28 0 02 Jan 2025
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio Xavier Juanola Gloria Haro Magdalena Fuentes 31 2 0 01 Oct 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation Yuanhong Chen Chong Wang Yuyuan Liu Hu Wang Gustavo Carneiro 40 2 0 07 Jul 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models David Kurzendörfer Otniel-Bogdan Mercea A. Sophia Koepke Zeynep Akata VLM CLIP 26 2 0 09 Apr 2024
Robust Active Speaker Detection in Noisy Environments Siva Sai Nagender Vasireddy Chenxu Zhang Xiaohu Guo Yapeng Tian 32 0 0 27 Mar 2024
Sound Source Localization is All about Cross-Modal Alignment Arda Senocak H. Ryu Junsik Kim Tae-Hyun Oh Hanspeter Pfister Joon Son Chung 19 18 0 19 Sep 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition Samir Sadok Simon Leglaive Renaud Séguier SSL 79 6 0 05 May 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos Hanlin Wang Yilu Wu Sheng Guo Limin Wang VGen DiffM 63 30 0 26 Mar 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction Yating Xu Conghui Hu G. Lee VGen 35 0 0 09 Dec 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization Shentong Mo Pedro Morgado 79 64 0 30 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation Efthymios Tzinis Scott Wisdom Tal Remez J. Hershey 31 29 0 20 Jul 2022
Audio-Visual Segmentation Jinxing Zhou Jianyuan Wang J. Zhang Weixuan Sun Jing Zhang Stan Birchfield Dan Guo Lingpeng Kong Meng Wang Yiran Zhong VOS 28 110 0 11 Jul 2022
Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation Jiangyu Han Yanhua Long 20 6 0 23 Apr 2022
Text-Driven Separation of Arbitrary Sounds Kevin Kilgour Beat Gfeller Qingqing Huang A. Jansen Scott Wisdom Marco Tagliasacchi 22 30 0 12 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning Marc Delcroix Jorge Bennasar Vázquez Tsubasa Ochiai K. Kinoshita Yasunori Ohishi S. Araki VLM 22 31 0 08 Apr 2022
The Sound of Bounding-Boxes Takashi Oya Shohei Iwase Shigeo Morishima 16 2 0 30 Mar 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer Juan F. Montesinos V. S. Kadandale G. Haro ViT 23 19 0 08 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 6 25 0 13 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources Sagnik Majumder Kristen Grauman 19 21 0 02 Feb 2022
Soundify: Matching Sound Effects to Video David Chuan-En Lin Anastasis Germanidis Cristobal Valenzuela Yining Shi Nikolas Martelaro 25 16 0 17 Dec 2021
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound Zhijian Yang Xiaoran Fan Volkan Isler H. Park 3DH 16 6 0 01 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video Rishabh Garg Ruohan Gao Kristen Grauman 13 27 0 21 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 21 8 0 26 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,018 0 13 Oct 2021
A cappella: Audio-visual Singing Voice Separation Juan F. Montesinos V. S. Kadandale G. Haro 38 16 0 20 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios Xudong Xu Hang Zhou Ziwei Liu Bo Dai Xiaogang Wang Dahua Lin DiffM 13 53 0 13 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning Yan-Bo Lin Hung-Yu Tseng Hsin-Ying Lee Yen-Yu Lin Ming-Hsuan Yang SSL 19 34 0 01 Apr 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 26 37 0 15 Mar 2021
Music source separation conditioned on 3D point clouds Francesc Lluís V. Chatziioannou A. Hofmann 3DPC 24 5 0 03 Feb 2021
Learning Representations from Audio-Visual Spatial Alignment Pedro Morgado Yi Li Nuno Vasconcelos SSL 13 121 0 03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds Efthymios Tzinis Scott Wisdom A. Jansen Shawn Hershey Tal Remez D. Ellis J. Hershey 26 68 0 02 Nov 2020
Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision Yun-Ning Hung G. Wichern Jonathan Le Roux 17 12 0 22 Oct 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention Bin Duan Hao Tang Wei Wang Ziliang Zong Guowei Yang Yan Yan 25 59 0 14 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video Triantafyllos Afouras Andrew Owens Joon Son Chung Andrew Zisserman SSL 17 250 0 10 Aug 2020
Multiple Sound Sources Localization from Coarse to Fine Rui Qian Di Hu Heinrich Dinkel Mengyue Wu N. Xu Weiyao Lin 23 153 0 13 Jul 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David F. Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 22 141 0 16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound Karren D. Yang Bryan C. Russell Justin Salamon SSL 16 75 0 11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network Lingyu Zhu Esa Rahtu 14 23 0 04 Jun 2020
Conditioned Source Separation for Music Instrument Performances Olga Slizovskaia G. Haro E. Gómez 22 38 0 08 Apr 2020
The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation S. Kasaei J. Melsen Floris van Beers Christiaan Steenkist K. Vončina 11 12 0 18 Mar 2020
Listen to Look: Action Recognition by Previewing Audio Ruohan Gao Tae-Hyun Oh Kristen Grauman Lorenzo Torresani VLM 27 251 0 10 Dec 2019