Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018
Andrew Owens
Alexei A. Efros
    SSL
arXiv: 1804.03641

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 491 papers shown
Conditioned Source Separation for Music Instrument Performances
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Olga Slizovskaia
G. Haro
E. Gómez
244
43
0
08 Apr 2020
Deep Multimodal Feature Encoding for Video Ordering
Vivek Sharma
Makarand Tapaswi
Rainer Stiefelhagen
171
11
0
05 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
163
59
0
30 Mar 2020
A Metric Learning Reality Check
European Conference on Computer Vision (ECCV), 2020
Kevin Musgrave
Serge J. Belongie
Ser-Nam Lim
435
504
0
18 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos
Daniel Gordon
Kiana Ehsani
Dieter Fox
Ali Farhadi
SSL
AI4TS
183
92
0
18 Mar 2020
Cross modal video representations for weakly supervised active speaker localization
IEEE Transactions on Multimedia (TMM), 2020
Rahul Sharma
Krishna Somandepalli
Shrikanth Narayanan
175
8
0
09 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
IEEE International Conference on Computer Vision (ICCV), 2020
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
236
53
0
09 Mar 2020
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
European Conference on Computer Vision (ECCV), 2020
A. Vasudevan
Dengxin Dai
Luc Van Gool
ObjD
206
50
0
09 Mar 2020
Evolving Losses for Unsupervised Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2020
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
217
145
0
26 Feb 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
IEEE Transactions on Multimedia (TMM), 2020
Sanchita Ghose
John J. Prevost
VGen
171
50
0
21 Feb 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Jianren Wang
Zhaoyuan Fang
Hang Zhao
148
42
0
12 Feb 2020
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
M. Planamente
A. Bottino
Barbara Caputo
EgoV
139
3
0
10 Feb 2020
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Jonathan Munro
Dima Damen
EgoV
267
227
0
27 Jan 2020
Curriculum Audiovisual Learning
Di Hu
Zechuan Wang
Haoyi Xiong
Dong Wang
Feiping Nie
Dejing Dou
SSL
129
33
0
26 Jan 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
593
230
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
International Journal of Automation and Computing (IJAC), 2020
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
204
177
0
14 Jan 2020
Unsupervised Audiovisual Synthesis via Exemplar Autoencoders
International Conference on Learning Representations (ICLR), 2020
Kangle Deng
Aayush Bansal
Deva Ramanan
SSL
VGen
163
17
0
13 Jan 2020
Visually Guided Self Supervised Learning of Speech Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Abhinav Shukla
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
166
30
0
13 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
Computer Vision and Pattern Recognition (CVPR), 2020
A. Tsiami
Petros Koutras
Petros Maragos
229
81
0
09 Jan 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
IEEE International Conference on Robotics and Automation (ICRA), 2019
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
214
150
0
25 Dec 2019
SoundSpaces: Audio-Visual Navigation in 3D Environments
Changan Chen
Unnat Jain
Carl Schissler
S. V. A. Garí
Ziad Al-Halah
V. Ithapu
Philip Robinson
Kristen Grauman
268
28
0
24 Dec 2019
Multimodal Self-Supervised Learning for Medical Image Analysis
Information Processing in Medical Imaging (IPMI), 2019
Aiham Taleb
Christoph Lippert
T. Klein
Moin Nabi
SSL
344
122
0
11 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Computer Vision and Pattern Recognition (CVPR), 2019
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
322
284
0
10 Dec 2019
Self-Supervised Learning of Pretext-Invariant Representations
Computer Vision and Pattern Recognition (CVPR), 2019
Ishan Misra
Laurens van der Maaten
SSL
VLM
343
1,561
0
04 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Neural Information Processing Systems (NeurIPS), 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
493
461
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
166
62
0
20 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Computer Vision and Pattern Recognition (CVPR), 2019
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
400
344
0
20 Nov 2019
Dancing to Music
Neural Information Processing Systems (NeurIPS), 2019
Hsin-Ying Lee
Xiaodong Yang
Xuan Li
Ting-Chun Wang
Yu-Ding Lu
Ming-Hsuan Yang
Jan Kautz
195
15
0
05 Nov 2019
DEPA: Self-Supervised Audio Embedding for Depression Detection
ACM Multimedia (ACM MM), 2019
Pingyue Zhang
Mengyue Wu
Heinrich Dinkel
Kai Yu
209
74
0
29 Oct 2019
PRNet: Self-Supervised Learning for Partial-to-Partial Registration
Neural Information Processing Systems (NeurIPS), 2019
Yue Wang
Justin Solomon
SSL
3DPC
261
434
0
27 Oct 2019
Self-supervised Moving Vehicle Tracking with Stereo Sound
IEEE International Conference on Computer Vision (ICCV), 2019
Chuang Gan
Hang Zhao
Peihao Chen
David D. Cox
Antonio Torralba
165
156
0
25 Oct 2019
Controllable Attention for Structured Layered Video Decomposition
IEEE International Conference on Computer Vision (ICCV), 2019
Jean-Baptiste Alayrac
João Carreira
Relja Arandjelović
Andrew Zisserman
102
10
0
24 Oct 2019
Vision-Infused Deep Audio Inpainting
IEEE International Conference on Computer Vision (ICCV), 2019
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
298
92
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
151
48
0
19 Oct 2019
Learning to Generalize One Sample at a Time with Self-Supervision
A. D'Innocente
S. Bucci
Barbara Caputo
Tatiana Tommasi
SSL
OOD
199
4
0
09 Oct 2019
Learning to Have an Ear for Face Super-Resolution
Computer Vision and Pattern Recognition (CVPR), 2019
Givi Meishvili
Simon Jenni
Paolo Favaro
SupR
CVBM
206
24
0
27 Sep 2019
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
Information Fusion (Inf. Fusion), 2019
M. Gogate
K. Dashtipour
Ahsan Adeel
Amir Hussain
150
58
0
23 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
IEEE International Conference on Computer Vision (ICCV), 2019
Tanzila Rahman
Bicheng Xu
Leonid Sigal
193
86
0
22 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
IEEE International Conference on Computer Vision (ICCV), 2019
Xudong Xu
Bo Dai
Dahua Lin
245
93
0
30 Aug 2019
Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model
International Workshop on Machine Learning for Signal Processing (MLSP), 2019
Yoshiaki Bando
Y. Sasaki
Kazuyoshi Yoshii
BDL
106
9
0
29 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2019
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
181
379
0
22 Aug 2019
Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
A. Rana
C. Ozcinar
A. Smolic
107
31
0
16 Aug 2019
Charting the Right Manifold: Manifold Mixup for Few-shot Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Puneet Mangla
M. Singh
Abhishek Sinha
Nupur Kumari
V. Balasubramanian
Balaji Krishnamurthy
SSL
377
363
0
28 Jul 2019
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
IEEE Transactions on Robotics (TRO), 2019
Michelle A. Lee
Yuke Zhu
Peter Zachares
Matthew Tan
K. Srinivasan
Silvio Savarese
Fei-Fei Li
Animesh Garg
Jeannette Bohg
SSL
246
247
0
28 Jul 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2019
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
297
312
0
27 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
IEEE Access (IEEE Access), 2019
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
218
1
0
23 Jul 2019
My lips are concealed: Audio-visual speech enhancement through obstructions
Interspeech (Interspeech), 2019
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
167
98
0
11 Jul 2019
LPaintB: Learning to Paint from Self-Supervision
Pacific Conference on Computer Graphics and Applications (PG), 2019
Biao Jia
Jonathan Brandt
R. Měch
Byungmoon Kim
Tianyi Zhou
SSL
122
12
0
17 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard?
Computer Vision and Pattern Recognition (CVPR), 2019
Weiyao Wang
Du Tran
Matt Feiszli
571
566
0
29 May 2019
Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect
Speech Communication (Speech Commun.), 2019
Daniel Michelsanti
Zheng-Hua Tan
S. Sigurðsson
Jesper Jensen
161
42
0
29 May 2019