ResearchTrend.AI
Self-Supervised Learning by Cross-Modal Audio-Video Clustering

28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
    SSL

Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"

41 / 91 papers shown
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge J. Belongie
Ming-Hsuan Yang
Hartwig Adam
Yin Cui
AI4TS
41
6
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David F. Harwath
James R. Glass
Hilde Kuehne
ViT
23
129
0
08 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL
AI4TS
135
60
0
07 Dec 2021
TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning
Yang Liu
Keze Wang
Lingbo Liu
Hao Lan
Liang Lin
SSL
AI4TS
48
113
0
07 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
F. Khan
Michael S. Ryoo
ViT
26
84
0
02 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
19
23
0
02 Dec 2021
Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation
Jinghao Zhang
Yanqiao Zhu
Qiang Liu
Mengqi Zhang
Shu Wu
Liang Wang
22
34
0
01 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
31
267
0
21 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson
H. Gouk
Chen Change Loy
Timothy M. Hospedales
SSL
OOD
AI4TS
27
270
0
18 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
27
0
0
13 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
16
36
0
11 Oct 2021
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding
Maomao Li
Tianyu Yang
Rui Qian
Haohang Xu
Qingyi Chen
Jue Wang
Hongkai Xiong
SSL
18
49
0
30 Sep 2021
Multi-level Feature Learning for Contrastive Multi-view Clustering
Jie Xu
Huayi Tang
Yazhou Ren
Liang Peng
Xiao-lan Zhu
Lifang He
24
160
0
21 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
25
11
0
18 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
16
53
0
16 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
38
9
0
05 Jun 2021
WiCluster: Passive Indoor 2D/3D Positioning using WiFi without Precise Labels
I. Karmanov
F. G. Zanjani
S. Merlin
I. Kadampot
Daniel Dijkman
11
14
0
31 May 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
51
96
0
17 May 2021
CoCon: Cooperative-Contrastive Learning
Nishant Rai
Ehsan Adeli
Kuan-Hui Lee
Adrien Gaidon
Juan Carlos Niebles
SSL
18
18
0
30 Apr 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
23
257
0
29 Apr 2021
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
147
57
0
26 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
53
0
13 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
19
49
0
01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
19
34
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
23
127
0
30 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
Mengmeng Xu
Juan-Manuel Perez-Rua
Xiatian Zhu
Bernard Ghanem
Brais Martinez
15
27
0
28 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
16
33
0
18 Mar 2021
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
30
184
0
11 Dec 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Bernard Ghanem
28
123
0
23 Nov 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
Zehua Zhang
David J. Crandall
AI4TS
SSL
23
23
0
23 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
13
121
0
03 Nov 2020
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags
Xavier Favory
K. Drossos
Tuomas Virtanen
Xavier Serra
18
15
0
27 Oct 2020
Hard Negative Mixing for Contrastive Learning
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
27
628
0
02 Oct 2020
Understanding Self-supervised Learning with Dual Deep Networks
Yuandong Tian
Lantao Yu
Xinlei Chen
Surya Ganguli
SSL
13
78
0
01 Oct 2020
Delving into Inter-Image Invariance for Unsupervised Visual Representations
Jiahao Xie
Xiaohang Zhan
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
SSL
VLM
13
58
0
26 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
14
106
0
13 Aug 2020
What Should Not Be Contrastive in Contrastive Learning
Tete Xiao
Xiaolong Wang
Alexei A. Efros
Trevor Darrell
SSL
DRL
8
298
0
13 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
24
48
0
29 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
14
23
0
04 Jun 2020