Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Neural Information Processing Systems (NeurIPS), 2019
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
    SSL

Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"

50 / 280 papers shown
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
Huanjin Yao
Yi Jiang
Xiatian Zhu
Zehuan Yuan
123
28
0
09 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Neural Information Processing Systems (NeurIPS), 2022
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
182
28
0
27 Sep 2022
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
Neural Information Processing Systems (NeurIPS), 2022
Yiren Jian
Chongyang Gao
Soroush Vosoughi
SSL
221
16
0
20 Sep 2022
ImageArg: A Multi-modal Tweet Dataset for Image Persuasiveness Mining
Workshop on Argument Mining (ArgMining), 2022
Zhexiong Liu
M. Guo
Y. Dai
Diane Litman
119
19
0
14 Sep 2022
Modality Mixer for Multi-modal Action Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Sumin Lee
Sangmin Woo
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
141
12
0
24 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yanbei Chen
Goran Frehse
Xiatian Zhu
Zeynep Akata
275
160
0
24 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
248
66
0
20 Aug 2022
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Zdravko Marinov
Alina Roitberg
David Schneider
Rainer Stiefelhagen
176
6
0
19 Aug 2022
COCOA: Cross Modality Contrastive Learning for Sensor Data
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2022
Shohreh Deldari
Hao Xue
Aaqib Saeed
Daniel V. Smith
Flora D. Salim
SSL
143
49
0
31 Jul 2022
LocVTP: Video-Text Pre-training for Temporal Localization
European Conference on Computer Vision (ECCV), 2022
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
169
69
0
21 Jul 2022
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
European Conference on Computer Vision (ECCV), 2022
Huseyin Coskun
Alireza Zareian
Joshua L. Moore
F. Tombari
Chen Wang
SSL
161
3
0
20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
183
32
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
IEEE International Conference on Multimedia Big Data (ICMBD), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
189
5
0
16 Jul 2022
Visually-aware Acoustic Event Detection using Heterogeneous Graphs
Interspeech (Interspeech), 2022
A. Shirian
Krishna Somandepalli
Victor Sanchez
T. Guha
138
5
0
16 Jul 2022
Semi-Supervised Temporal Action Detection with Proposal-Free Masking
European Conference on Computer Vision (ECCV), 2022
Sauradip Nag
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
119
20
0
14 Jul 2022
Dual Contrastive Learning for Spatio-temporal Representation
ACM Multimedia (ACM MM), 2022
Shuangrui Ding
Rui Qian
H. Xiong
AI4TS
SSL
114
25
0
12 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
214
7
0
07 Jul 2022
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
Computer Vision and Pattern Recognition (CVPR), 2022
S. H. Khorasgani
Yuxuan Chen
Florian Shkurti
SSL
166
29
0
25 Jun 2022
ProtoCLIP: Prototypical Contrastive Language Image Pretraining
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Delong Chen
Zhao Wu
Fan Liu
Zaiquan Yang
Huaxi Huang
Ying Tan
Erjin Zhou
VLM
CLIP
202
28
0
22 Jun 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
International Journal of Computer Vision (IJCV), 2022
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
190
8
0
21 Jun 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
188
34
0
20 Jun 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
394
163
0
18 Jun 2022
iBoot: Image-bootstrapped Self-Supervised Video Representation Learning
F. Saleh
Fuwen Tan
Adrian Bulat
Georgios Tzimiropoulos
Brais Martínez
SSL
215
1
0
16 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
260
113
0
16 Jun 2022
A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions
ACM Computing Surveys (ACM CSUR), 2022
Sheng Zhou
Hongjia Xu
Zhuonan Zheng
Jiawei Chen
Zhao Li
Jiajun Bu
Jia Wu
Xin Eric Wang
Wenwu Zhu
Martin Ester
204
145
0
15 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Computer Vision and Pattern Recognition (CVPR), 2022
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
122
42
0
14 Jun 2022
Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
Computer Vision and Pattern Recognition (CVPR), 2022
Mohammed Alloulah
Maximilian Arnold
SSL
276
2
0
13 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
191
43
0
06 Jun 2022
Noise-Tolerant Learning for Audio-Visual Action Recognition
IEEE Transactions on Multimedia (IEEE TMM), 2022
Haocheng Han
Qinghua Zheng
Minnan Luo
Kaiyao Miao
Feng Tian
Yuanchun Chen
NoLa
254
15
0
16 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
146
32
0
13 May 2022
AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
AAAI Conference on Artificial Intelligence (AAAI), 2022
Pritam Sarkar
A. Posen
Ali Etemad
210
15
0
13 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Haodong Duan
Nanxuan Zhao
Kai-xiang Chen
Dahua Lin
ViT
AI4TS
158
25
0
04 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
115
3
0
29 Apr 2022
Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Boqing Zhu
Kele Xu
Changjian Wang
Zheng Qin
Tao Sun
Huaimin Wang
Yuxing Peng
SSL
164
22
0
28 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
133
47
0
26 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
150
26
0
26 Apr 2022
Adversarial Contrastive Learning by Permuting Cluster Assignments
Muntasir Wahed
Afrina Tabassum
Ismini Lourentzou
SSL
108
6
0
21 Apr 2022
A Survey of Video-based Action Quality Assessment
Shunli Wang
Dingkang Yang
Peng Zhai
Qing Yu
Tao Suo
Zhan Sun
Ka Li
Lihua Zhang
125
20
0
20 Apr 2022
Less than Few: Self-Shot Video Instance Segmentation
European Conference on Computer Vision (ECCV), 2022
Pengwan Yang
Yuki M. Asano
Pascal Mettes
Cees G. M. Snoek
SSL
140
2
0
19 Apr 2022
Rumor Detection with Self-supervised Learning on Texts and Social Graph
Yuan Gao
Xiang Wang
Xiangnan He
Huamin Feng
Yongdong Zhang
SSL
90
57
0
19 Apr 2022
SETTI: A Self-supervised Adversarial Malware Detection Architecture in an IoT Environment
Marjan Golmaryami
R. Taheri
Zahra Pooranian
Mohammad Shojafar
Pei Xiao
140
18
0
16 Apr 2022
How to Listen? Rethinking Visual Sound Localization
Interspeech (Interspeech), 2022
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
90
5
0
11 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
187
5
0
08 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
European Conference on Computer Vision (ECCV), 2022
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
262
52
0
06 Apr 2022
Controllable Augmentations for Video Representation Learning
Rui Qian
Weiyao Lin
John See
Dian Li
SSL
AI4TS
168
14
0
30 Mar 2022
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
233
322
0
29 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
European Conference on Computer Vision (ECCV), 2022
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
252
21
0
27 Mar 2022
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?
European Conference on Computer Vision (ECCV), 2022
Fida Mohammad Thoker
Hazel Doughty
Piyush Bagad
Cees G. M. Snoek
SSL
170
21
0
27 Mar 2022
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Zengjie Song
Yuxi Wang
Junsong Fan
Tieniu Tan
Zhaoxiang Zhang
SSL
143
47
0
25 Mar 2022
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Rogerio Bonatti
LM&Ro
197
58
0
25 Mar 2022