ResearchTrend.AI
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018
Bruno Korbar
Du Tran
Lorenzo Torresani
arXiv: 1807.00230

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

50 / 137 papers shown
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
54
39
0
06 Apr 2022
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
V. S. Kadandale
Juan F. Montesinos
G. Haro
27
23
0
05 Apr 2022
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
19
2
0
30 Mar 2022
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
26
78
0
17 Mar 2022
Object discovery and representation networks
Olivier J. Hénaff
Skanda Koppula
Evan Shelhamer
Daniel Zoran
Andrew Jaegle
Andrew Zisserman
João Carreira
Relja Arandjelović
44
87
0
16 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
40
106
0
02 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
41
10
0
15 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
18
25
0
13 Feb 2022
Real-time Emergency Vehicle Event Detection Using Audio Data
Zubayer Islam
Mohamed Abdel-Aty
16
5
0
03 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
M. Pantic
CVBM
40
126
0
18 Jan 2022
SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data
Minh-Khoi Tran
Loi Ly
Binh-Son Hua
Ngan Le
3DPC
MedIm
30
17
0
15 Jan 2022
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
R. Devon Hjelm
Xin Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Yale Song
NoLa
16
68
0
12 Jan 2022
Progressive Video Summarization via Multimodal Self-supervised Learning
Haopeng Li
Qiuhong Ke
Mingming Gong
Tom Drummond
AI4TS
39
18
0
07 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
52
306
0
05 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
38
6
0
04 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
38
23
0
09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Yin Cui
AI4TS
61
6
0
08 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL
AI4TS
143
60
0
07 Dec 2021
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup
Siyuan Li
Zicheng Liu
Zedong Wang
Di Wu
Zihan Liu
Stan Z. Li
37
26
0
30 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
36
20
0
15 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
29
8
0
26 Oct 2021
Constrained Mean Shift for Representation Learning
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
SSL
45
0
0
19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
34
0
0
13 Oct 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Zixu Zhao
Yueming Jin
Pheng-Ann Heng
SSL
42
21
0
28 Sep 2021
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
Wei Feng
Yuanjiang Wang
Lihua Ma
Ye Yuan
Chi Zhang
SSL
21
13
0
24 Aug 2021
Learning to Cut by Watching Movies
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Bernard Ghanem
VGen
58
20
0
09 Aug 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krähenbühl
VLM
ViT
54
166
0
21 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
24
53
0
16 Jun 2021
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
Rameswar Panda
Chun-Fu Chen
Quanfu Fan
Ximeng Sun
Kate Saenko
A. Oliva
Rogerio Feris
36
47
0
11 May 2021
Contrastive Attraction and Contrastive Repulsion for Representation Learning
Huangjie Zheng
Xu Chen
Jiangchao Yao
Hongxia Yang
Chunyuan Li
Ya Zhang
Hao Zhang
Ivor Tsang
Jingren Zhou
Mingyuan Zhou
SSL
42
12
0
08 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
Lingyu Zhu
Esa Rahtu
26
25
0
17 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
55
0
13 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
Cross-Modal learning for Audio-Visual Video Parsing
Jatin Lamba
Abhishek
Jayaprakash Akula
Rishabh Dabral
P. Jyothi
Ganesh Ramakrishnan
13
7
0
03 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
22
49
0
01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
27
34
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
Robust Audio-Visual Instance Discrimination
Pedro Morgado
Ishan Misra
Nuno Vasconcelos
SSL
22
110
0
29 Mar 2021
Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
A. Bhunia
Pinaki Nath Chowdhury
Yongxin Yang
Timothy M. Hospedales
Tao Xiang
Yi-Zhe Song
SSL
20
59
0
25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations
Xiang Gao
Wei Hu
Guo-Jun Qi
32
6
0
01 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
44
20
0
05 Feb 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
71
45
0
26 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
34
5
0
11 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,434
0
04 Jan 2021
Semantic Audio-Visual Navigation
Changan Chen
Ziad Al-Halah
Kristen Grauman
50
104
0
21 Dec 2020
InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees
Nghi D. Q. Bui
Yijun Yu
Lingxiao Jiang
SSL
44
104
0
13 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020