Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018
Andrew Owens
Alexei A. Efros
    SSL

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

41 / 491 papers shown
Self-supervised audio representation learning for mobile devices
Marco Tagliasacchi
Beat Gfeller
Félix de Chaumont Quitry
Dominik Roblek
SSL, AI4TS
157
47
0
24 May 2019
Speech2Face: Learning the Face Behind a Voice
Computer Vision and Pattern Recognition (CVPR), 2019
Tae-Hyun Oh
Tali Dekel
Changil Kim
Inbar Mosseri
William T. Freeman
Michael Rubinstein
Wojciech Matusik
SSL, CVBM
197
173
0
23 May 2019
Synthetic Defocus and Look-Ahead Autofocus for Casual Videography
ACM Transactions on Graphics (TOG), 2019
X. Zhang
Kevin Blackburn-Matzen
Vivien Nguyen
Dillon Yao
You Zhang
Ren Ng
VGen
160
48
0
15 May 2019
Self-supervised Audio Spatialization with Correspondence Classifier
IEEE International Conference on Image Processing (ICIP), 2019
Yu-Ding Lu
Hsin-Ying Lee
Hung-Yu Tseng
Ming-Hsuan Yang
124
26
0
14 May 2019
Machine learning in acoustics: theory and applications
Journal of the Acoustical Society of America (JASA), 2019
Michael J. Bianco
Peter Gerstoft
James Traer
Emma Ozanich
M. Roch
Sharon Gannot
Charles-Alban Deledalle
AI4CE
305
437
0
11 May 2019
S4L: Self-Supervised Semi-Supervised Learning
IEEE International Conference on Computer Vision (ICCV), 2019
Xiaohua Zhai
Avital Oliver
Alexander Kolesnikov
Lucas Beyer
SSL, VLM
313
844
0
09 May 2019
Latent Variable Algorithms for Multimodal Learning and Sensor Fusion
Lijiang Guo
DRL
85
1
0
23 Apr 2019
Self-Supervised Audio-Visual Co-Segmentation
Andrew Rouditchenko
Hang Zhao
Chuang Gan
Josh H. McDermott
Antonio Torralba
VLM, SSL
120
107
0
18 Apr 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
155
30
0
16 Apr 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
309
220
0
16 Apr 2019
An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR
Luca Pasa
Giovanni Morrone
Leonardo Badino
129
3
0
16 Apr 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
162
268
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
Alex Schwing
Tamir Hazan
200
79
0
11 Apr 2019
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition
Bruno Korbar
Du Tran
Lorenzo Torresani
187
249
0
08 Apr 2019
Learning Affective Correspondence between Music and Image
Gaurav Verma
Eeshan Gunesh Dhekane
T. Guha
CVBM
251
26
0
30 Mar 2019
Consistent Dialogue Generation with Self-supervised Feature Learning
Yizhe Zhang
Yantao Du
Sungjin Lee
Chris Brockett
Michel Galley
Jianfeng Gao
W. Dolan
247
28
0
13 Mar 2019
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Longlong Jing
Yingli Tian
SSL
416
1,906
0
16 Feb 2019
Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov
Xiaohua Zhai
Lucas Beyer
SSL
462
747
0
25 Jan 2019
Class Activation Map Generation by Representative Class Selection and Multi-Layer Feature Fusion
Fanman Meng
Kaixu Huang
Hongliang Li
Qi Wu
VLM, WSOL
96
10
0
23 Jan 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
551
164
0
05 Jan 2019
On Attention Modules for Audio-Visual Synchronization
Naji Khosravan
Shervin Ardeshir
R. Puri
92
24
0
14 Dec 2018
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
284
143
0
11 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
168
21
0
07 Dec 2018
Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference
Mahesh Subedar
R. Krishnan
P. López-Meyer
Omesh Tickoo
Jonathan Huang
BDL, EDL, UQCV
182
0
0
27 Nov 2018
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018
Yufei Wang
Luca Pasa
Lantao Yu
Rohit Singh
Luciano Fadiga
L. Joppa
CVBM
192
64
0
06 Nov 2018
Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018
Prem Seetharaman
Gordon Wichern
Jonathan Le Roux
Bryan Pardo
129
38
0
06 Nov 2018
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
Michelle A. Lee
Yuke Zhu
K. Srinivasan
Parth Shah
Silvio Savarese
Li Fei-Fei
Animesh Garg
Jeannette Bohg
SSL
256
403
0
24 Oct 2018
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
Soo-Whan Chung
Joon Son Chung
Hong-Goo Kang
198
129
0
21 Sep 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
173
191
0
07 Sep 2018
Single-Microphone Speech Enhancement and Separation Using Deep Learning
Morten Kolbaek
180
7
0
31 Aug 2018
Dynamic Temporal Alignment of Speech to Lips
Tavi Halperin
Ariel Ephrat
Shmuel Peleg
124
43
0
19 Aug 2018
Deep Multimodal Clustering for Unsupervised Audiovisual Learning
Di Hu
Feiping Nie
Xuelong Li
SSL
173
8
0
09 Jul 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Neural Information Processing Systems (NeurIPS), 2018
Bruno Korbar
Du Tran
Lorenzo Torresani
366
499
0
30 Jun 2018
Fast forwarding Egocentric Videos by Listening and Watching
V. Furlan
R. Bajcsy
Erickson R. Nascimento
EgoV
128
7
0
12 Jun 2018
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Nayyer Aafaq
Lin Wang
Wen Liu
Syed Zulqarnain Gilani
Mubarak Shah
478
100
0
01 Jun 2018
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
261
387
0
11 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
413
575
0
09 Apr 2018
Learning to Separate Object Sounds by Watching Unlabeled Video
Ruohan Gao
Rogerio Feris
Kristen Grauman
SSL
226
296
0
05 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
358
532
0
23 Mar 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD, VOS
331
554
0
18 Dec 2017
Visual to Sound: Generating Natural Sound for Videos in the Wild
Yipin Zhou
Zhaowen Wang
Chen Fang
Trung Bui
Tamara L. Berg
VGen
210
226
0
04 Dec 2017