
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 491 papers shown

Conditioned Source Separation for Music Instrument Performances

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020

Olga Slizovskaia

G. Haro

E. Gómez

244

08 Apr 2020

Deep Multimodal Feature Encoding for Video Ordering

Vivek Sharma

Makarand Tapaswi

Rainer Stiefelhagen

171

05 Apr 2020

Speech2Action: Cross-modal Supervision for Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2020

163

30 Mar 2020

A Metric Learning Reality Check

European Conference on Computer Vision (ECCV), 2020

Kevin Musgrave

Serge J. Belongie

Ser-Nam Lim

435

504

18 Mar 2020

Watching the World Go By: Representation Learning from Unlabeled Videos

183

18 Mar 2020

Cross modal video representations for weakly supervised active speaker localization

IEEE Transactions on Multimedia (TMM), 2020

Rahul Sharma

Krishna Somandepalli

Shrikanth Narayanan

175

09 Mar 2020

On Compositions of Transformations in Contrastive Self-Supervised Learning

IEEE International Conference on Computer Vision (ICCV), 2020

João F. Henriques

Andrea Vedaldi

236

09 Mar 2020

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

European Conference on Computer Vision (ECCV), 2020

A. Vasudevan

Dengxin Dai

Luc Van Gool

ObjD

206

09 Mar 2020

Evolving Losses for Unsupervised Video Representation Learning

Computer Vision and Pattern Recognition (CVPR), 2020

217

145

26 Feb 2020

AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning

IEEE Transactions on Multimedia (TMM), 2020

Sanchita Ghose

John J. Prevost

VGen

171

21 Feb 2020

AlignNet: A Unifying Approach to Audio-Visual Alignment

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020

Jianren Wang

Zhaoyuan Fang

Hang Zhao

148

12 Feb 2020

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

139

10 Feb 2020

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2020

Jonathan Munro

Dima Damen

EgoV

267

227

27 Jan 2020

Curriculum Audiovisual Learning

Di Hu

Zechuan Wang

Haoyi Xiong

Dong Wang

Feiping Nie

Dejing Dou

SSL

129

26 Jan 2020

Audiovisual SlowFast Networks for Video Recognition

Christoph Feichtenhofer

593

230

23 Jan 2020

Deep Audio-Visual Learning: A Survey

International Journal of Automation and Computing (IJAC), 2020

204

177

14 Jan 2020

Unsupervised Audiovisual Synthesis via Exemplar Autoencoders

International Conference on Learning Representations (ICLR), 2020

163

13 Jan 2020

Visually Guided Self Supervised Learning of Speech Representations

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

Abhinav Shukla

Konstantinos Vougioukas

166

13 Jan 2020

STAViS: Spatio-Temporal AudioVisual Saliency Network

Computer Vision and Pattern Recognition (CVPR), 2020

A. Tsiami

Petros Koutras

Petros Maragos

229

09 Jan 2020

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

IEEE International Conference on Robotics and Automation (ICRA), 2019

Chuang Gan

Yiwei Zhang

Jiajun Wu

Boqing Gong

J. Tenenbaum

214

150

25 Dec 2019

SoundSpaces: Audio-Visual Navigation in 3D Environments

268

24 Dec 2019

Multimodal Self-Supervised Learning for Medical Image Analysis

Information Processing in Medical Imaging (IPMI), 2019

344

122

11 Dec 2019

Listen to Look: Action Recognition by Previewing Audio

Computer Vision and Pattern Recognition (CVPR), 2019

322

284

10 Dec 2019

Self-Supervised Learning of Pretext-Invariant Representations

Computer Vision and Pattern Recognition (CVPR), 2019

Ishan Misra

Laurens van der Maaten

SSL VLM

343

1,561

04 Dec 2019

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Neural Information Processing Systems (NeurIPS), 2019

493

461

28 Nov 2019

Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019

Arda Senocak

Tae-Hyun Oh

Junsik Kim

Ming-Hsuan Yang

In So Kweon

SSL

166

20 Nov 2019

MMTM: Multimodal Transfer Module for CNN Fusion

Computer Vision and Pattern Recognition (CVPR), 2019

Hamid Reza Vaezi Joze

Amirreza Shaban

Michael L. Iuzzolino

K. Koishida

400

344

20 Nov 2019

Dancing to Music

Neural Information Processing Systems (NeurIPS), 2019

Hsin-Ying Lee

Ming-Hsuan Yang

195

05 Nov 2019

DEPA: Self-Supervised Audio Embedding for Depression Detection

ACM Multimedia (ACM MM), 2019

209

29 Oct 2019

PRNet: Self-Supervised Learning for Partial-to-Partial Registration

Neural Information Processing Systems (NeurIPS), 2019

Yue Wang

Justin Solomon

SSL 3DPC

261

434

27 Oct 2019

Self-supervised Moving Vehicle Tracking with Stereo Sound

IEEE International Conference on Computer Vision (ICCV), 2019

Chuang Gan

Hang Zhao

Peihao Chen

David D. Cox

Antonio Torralba

165

156

25 Oct 2019

Controllable Attention for Structured Layered Video Decomposition

IEEE International Conference on Computer Vision (ICCV), 2019

Jean-Baptiste Alayrac

João Carreira

Relja Arandjelović

Andrew Zisserman

102

24 Oct 2019

Vision-Infused Deep Audio Inpainting

IEEE International Conference on Computer Vision (ICCV), 2019

298

24 Oct 2019

Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019

151

19 Oct 2019

Learning to Generalize One Sample at a Time with Self-Supervision

199

09 Oct 2019

Learning to Have an Ear for Face Super-Resolution

Computer Vision and Pattern Recognition (CVPR), 2019

206

27 Sep 2019

CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement

Information Fusion (Inf. Fusion), 2019

M. Gogate

K. Dashtipour

Ahsan Adeel

Amir Hussain

150

23 Sep 2019

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

IEEE International Conference on Computer Vision (ICCV), 2019

Tanzila Rahman

Bicheng Xu

Leonid Sigal

193

22 Sep 2019

Recursive Visual Sound Separation Using Minus-Plus Net

IEEE International Conference on Computer Vision (ICCV), 2019

Xudong Xu

Bo Dai

Dahua Lin

245

30 Aug 2019

Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

International Workshop on Machine Learning for Signal Processing (MLSP), 2019

106

29 Aug 2019

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition

IEEE International Conference on Computer Vision (ICCV), 2019

Dima Damen

181

379

22 Aug 2019

Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

A. Rana

C. Ozcinar

A. Smolic

107

16 Aug 2019

Charting the Right Manifold: Manifold Mixup for Few-shot Learning

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019

377

363

28 Jul 2019

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

IEEE Transactions on Robotics (TRO), 2019

Silvio Savarese

246

247

28 Jul 2019

Multi-task Self-Supervised Learning for Human Activity Detection

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2019

297

312

27 Jul 2019

Adaptive Regularization via Residual Smoothing in Deep Learning Optimization

IEEE Access, 2019

Jung-Kyun Cho

Junseok Kwon

Byung-Woo Hong

218

23 Jul 2019

My lips are concealed: Audio-visual speech enhancement through obstructions

Interspeech, 2019

Triantafyllos Afouras

Joon Son Chung

Andrew Zisserman

167

11 Jul 2019

LPaintB: Learning to Paint from Self-Supervision

Pacific Conference on Computer Graphics and Applications (PG), 2019

122

17 Jun 2019

What Makes Training Multi-Modal Classification Networks Hard?

Computer Vision and Pattern Recognition (CVPR), 2019

Weiyao Wang

Du Tran

Matt Feiszli

571

566

29 May 2019

Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect

Speech Communication (Speech Commun.), 2019

161

29 May 2019