Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

50 / 137 papers shown

Title
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Edson Araujo Andrew Rouditchenko Yuan Gong Saurabhchand Bhati Samuel Thomas Brian Kingsbury Leonid Karlinsky Rogerio Feris James Glass Hilde Kuehne 44 0 0 02 May 2025
The Sound of Water: Inferring Physical Properties from Pouring Liquids Piyush Bagad Makarand Tapaswi Cees G. M. Snoek Andrew Zisserman 45 0 0 18 Nov 2024
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data Hugo Thimonier José Lucas De Melo Costa Fabrice Popineau Arpad Rimmel Bich-Liên Doan 53 1 0 07 Oct 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment Arda Senocak H. Ryu Junsik Kim Tae-Hyun Oh Hanspeter Pfister Joon Son Chung 38 3 0 18 Jul 2024
Sequential Contrastive Audio-Visual Learning Ioannis Tsiamas Santiago Pascual Chunghsin Yeh Joan Serrà 47 2 0 08 Jul 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation Yuxuan Wang Feng Dong Jinchao Zhu Shuyue Zhu VOS 56 0 0 04 Jun 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models David Kurzendörfer Otniel-Bogdan Mercea A. Sophia Koepke Zeynep Akata VLM CLIP 33 2 0 09 Apr 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition Jie Zhang Zhifan Wan Lanqing Hu Stephen Lin Shuzhe Wu Shiguang Shan TTA 67 1 0 15 Jan 2024
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA Asmar Nadeem Adrian Hilton R. Dawes Graham A. Thomas A. Mustafa 33 9 0 25 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment Arda Senocak H. Ryu Junsik Kim Tae-Hyun Oh Hanspeter Pfister Joon Son Chung 36 18 0 19 Sep 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition Shentong Mo Pedro Morgado 38 21 0 30 May 2023
How does Contrastive Learning Organize Images? Yunzhe Zhang Yao Lu Qi Xuan SSL 32 0 0 17 May 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning Nikhil Singh Chih-Wei Wu Iroro Orife Mahdi M. Kalayeh 25 2 0 12 Apr 2023
Egocentric Auditory Attention Localization in Conversations Fiona Ryan Hao Jiang Abhinav Shukla James M. Rehg V. Ithapu EgoV 29 16 0 28 Mar 2023
DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images K. Yu Li Sun Junxiang Chen Maxwell Reynolds Tigmanshu Chaudhary Kayhan Batmanghelich 25 1 0 21 Feb 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision Simon Jenni Alexander Black John Collomosse SSL 31 15 0 15 Feb 2023
Confidence-Aware Calibration and Scoring Functions for Curriculum Learning Shuang Ao Stefan Rueger Advaith Siddharthan UQCV 34 0 0 29 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman 29 4 0 05 Jan 2023
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding Shuhan Tan Tushar Nagarajan Kristen Grauman 26 21 0 05 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition Hasan Hammoud Shuming Liu Mohammad Alkhrashi Fahad Albalawi Guohao Li AAML 34 8 0 03 Jan 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos Hao-Wen Dong Naoya Takahashi Yuki Mitsufuji Julian McAuley Taylor Berg-Kirkpatrick VLM CLIP 31 25 0 14 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data A. Haliassos Pingchuan Ma Rodrigo Mira Stavros Petridis M. Pantic SSL 45 49 0 12 Dec 2022
Audiovisual Masked Autoencoders Mariana-Iuliana Georgescu Eduardo Fonseca Radu Tudor Ionescu Mario Lucic Cordelia Schmid Anurag Arnab SSL 39 43 0 09 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction Yating Xu Conghui Hu G. Lee VGen 48 0 0 09 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight Yunhua Zhang Hazel Doughty Cees G. M. Snoek VLM 50 0 0 05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures Xixi Hu Ziyang Chen Andrew Owens 33 51 0 28 Nov 2022
MarginNCE: Robust Sound Localization with a Negative Margin Sooyoung Park Arda Senocak Joon Son Chung SSL 27 13 0 03 Nov 2022
ViFiCon: Vision and Wireless Association Via Self-Supervised Contrastive Learning Nicholas Meegan Hansi Liu Bryan Bo Cao Abrar Alali Kristin J. Dana Marco Gruteser Shubham Jain A. Ashok 10 1 0 11 Oct 2022
HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining Chunhui Zhang Yixiong Chen Li Liu Qiong Liu Xiaoping Zhou VLM 45 8 0 10 Oct 2022
Contrastive Audio-Visual Masked Autoencoder Yuan Gong Andrew Rouditchenko Alexander H. Liu David Harwath Leonid Karlinsky Hilde Kuehne James R. Glass 39 120 0 02 Oct 2022
TVLT: Textless Vision-Language Transformer Zineng Tang Jaemin Cho Yixin Nie Joey Tianyi Zhou VLM 51 28 0 28 Sep 2022
Learning State-Aware Visual Representations from Audible Interactions Himangi Mittal Pedro Morgado Unnat Jain Abhinav Gupta 78 23 0 27 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization Shentong Mo Pedro Morgado 83 64 0 30 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey Yanbei Chen Massimiliano Mancini Xiatian Zhu Zeynep Akata 45 114 0 24 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective Yake Wei Di Hu Yapeng Tian Xuelong Li 46 55 0 20 Aug 2022
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization Zdravko Marinov Alina Roitberg David Schneider Rainer Stiefelhagen 24 4 0 19 Aug 2022
PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation Hongyuan Lu W. Lam 16 9 0 17 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation Efthymios Tzinis Scott Wisdom Tal Remez J. Hershey 41 30 0 20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning Otniel-Bogdan Mercea Thomas Hummel A. Sophia Koepke Zeynep Akata 38 25 0 20 Jul 2022
LAVA: Language Audio Vision Alignment for Contrastive Video Pre-Training Sumanth Gurram An Fang David M. Chan John F. Canny VLM AI4TS 38 1 0 16 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection Jiashuo Yu Jin-Yuan Liu Ying Cheng Rui Feng Yuejie Zhang 27 35 0 12 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization Jiashuo Yu Junfu Pu Ying Cheng Rui Feng Ying Shan 21 5 0 07 Jul 2022
Self-Supervised Learning for Videos: A Survey Madeline Chantry Schiappa Yogesh S Rawat M. Shah SSL 36 131 0 18 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning Changan Chen Carl Schissler Sanchit Garg Philip Kobernik Alexander Clegg P. Calamia Dhruv Batra Philip Robinson Kristen Grauman 3DGS 39 80 0 16 Jun 2022
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model Xinya Ji Hang Zhou Kaisiyuan Wang Qianyi Wu Wayne Wu Feng Xu Xun Cao CVBM 63 157 0 30 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches Anirudh S. Sundar Larry Heck 43 29 0 13 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition Haodong Duan Nanxuan Zhao Kai-xiang Chen Dahua Lin ViT AI4TS 33 19 0 04 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies Mahdi M. Kalayeh Shervin Ardeshir Lingyi Liu Nagendra Kamath Ashok Chandrashekar SSL 35 3 0 29 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation Ziyang Chen David Fouhey Andrew Owens SSL 27 19 0 26 Apr 2022
Probabilistic Representations for Video Contrastive Learning Jungin Park Jiyoung Lee Ig-Jae Kim Kwanghoon Sohn SSL 33 43 0 08 Apr 2022