Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

41 / 491 papers shown

Self-supervised audio representation learning for mobile devices

Marco Tagliasacchi

Beat Gfeller

Félix de Chaumont Quitry

Dominik Roblek

SSL AI4TS

157

24 May 2019

Speech2Face: Learning the Face Behind a Voice

Computer Vision and Pattern Recognition (CVPR), 2019

William T. Freeman

Michael Rubinstein

Wojciech Matusik

SSL CVBM

197

173

23 May 2019

Synthetic Defocus and Look-Ahead Autofocus for Casual Videography

ACM Transactions on Graphics (TOG), 2019

X. Zhang

Kevin Blackburn-Matzen

You Zhang

160

15 May 2019

Self-supervised Audio Spatialization with Correspondence Classifier

IEEE International Conference on Image Processing (ICIP), 2019

Yu-Ding Lu

Hsin-Ying Lee

Hung-Yu Tseng

Ming-Hsuan Yang

124

14 May 2019

Machine learning in acoustics: theory and applications

Journal of the Acoustical Society of America (JASA), 2019

Charles-Alban Deledalle

AI4CE

305

437

11 May 2019

S4L: Self-Supervised Semi-Supervised Learning

IEEE International Conference on Computer Vision (ICCV), 2019

313

844

09 May 2019

Latent Variable Algorithms for Multimodal Learning and Sensor Fusion

Lijiang Guo

DRL

23 Apr 2019

Self-Supervised Audio-Visual Co-Segmentation

Andrew Rouditchenko

Hang Zhao

Chuang Gan

Josh H. McDermott

Antonio Torralba

VLM SSL

120

107

18 Apr 2019

Audio-Visual Model Distillation Using Acoustic Images

155

16 Apr 2019

Co-Separating Sounds of Visual Objects

Ruohan Gao

Kristen Grauman

309

220

16 Apr 2019

An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR

Luca Pasa

Giovanni Morrone

Leonardo Badino

129

16 Apr 2019

The Sound of Motions

Hang Zhao

Chuang Gan

Wei-Chiu Ma

Antonio Torralba

162

268

11 Apr 2019

A Simple Baseline for Audio-Visual Scene-Aware Dialog

Idan Schwartz

Alex Schwing

Tamir Hazan

200

11 Apr 2019

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition

Bruno Korbar

Du Tran

Lorenzo Torresani

187

249

08 Apr 2019

Learning Affective Correspondence between Music and Image

Gaurav Verma

Eeshan Gunesh Dhekane

T. Guha

CVBM

251

30 Mar 2019

Consistent Dialogue Generation with Self-supervised Feature Learning

247

13 Mar 2019

Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

Longlong Jing

Yingli Tian

SSL

416

1,906

16 Feb 2019

Revisiting Self-Supervised Visual Representation Learning

462

747

25 Jan 2019

Class Activation Map Generation by Representative Class Selection and Multi-Layer Feature Fusion

Fanman Meng

23 Jan 2019

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

...

Arkadiusz Stopczynski

Cordelia Schmid

Zhonghua Xi

C. Pantofaru

551

164

05 Jan 2019

On Attention Modules for Audio-Visual Synchronization

14 Dec 2018

284

143

11 Dec 2018

An Attempt towards Interpretable Audio-Visual Video Captioning

168

07 Dec 2018

Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference

182

27 Nov 2018

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018

192

06 Nov 2018

Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018

129

06 Nov 2018

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

Silvio Savarese

Li Fei-Fei

Animesh Garg

Jeannette Bohg

SSL

256

403

24 Oct 2018

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

Soo-Whan Chung

Joon Son Chung

Hong-Goo Kang

198

129

21 Sep 2018

Self-Supervised Generation of Spatial Audio for 360 Video

173

191

07 Sep 2018

Single-Microphone Speech Enhancement and Separation Using Deep Learning

Morten Kolbaek

180

31 Aug 2018

Dynamic Temporal Alignment of Speech to Lips

Tavi Halperin

Ariel Ephrat

Shmuel Peleg

124

19 Aug 2018

Deep Multimodal Clustering for Unsupervised Audiovisual Learning

Di Hu

Feiping Nie

Xuelong Li

SSL

173

09 Jul 2018

Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

Neural Information Processing Systems (NeurIPS), 2018

Bruno Korbar

Du Tran

Lorenzo Torresani

366

499

30 Jun 2018

Fast forwarding Egocentric Videos by Listening and Watching

V. Furlan

R. Bajcsy

Erickson R. Nascimento

EgoV

128

12 Jun 2018

Video Description: A Survey of Methods, Datasets and Evaluation Metrics

Nayyer Aafaq

Lin Wang

Wen Liu

Syed Zulqarnain Gilani

Mubarak Shah

478

100

01 Jun 2018

The Conversation: Deep Audio-Visual Speech Enhancement

Triantafyllos Afouras

Joon Son Chung

Andrew Zisserman

261

387

11 Apr 2018

The Sound of Pixels

Hang Zhao

Chuang Gan

Andrew Rouditchenko

Carl Vondrick

Josh H. McDermott

Antonio Torralba

VLM

413

575

09 Apr 2018

Learning to Separate Object Sounds by Watching Unlabeled Video

226

296

05 Apr 2018

Audio-Visual Event Localization in Unconstrained Videos

Yapeng Tian

Jing Shi

Bochen Li

Zhiyao Duan

Chenliang Xu

358

532

23 Mar 2018

331

554

18 Dec 2017

Visual to Sound: Generating Natural Sound for Videos in the Wild

210

226

04 Dec 2017