Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1610.09001
Cited By
SoundNet: Learning Sound Representations from Unlabeled Video
27 October 2016
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SoundNet: Learning Sound Representations from Unlabeled Video"
50 / 120 papers shown
Title
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
13
27
0
21 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
31
267
0
21 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
19
0
0
13 Oct 2021
Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Prateek Verma
AI4TS
21
2
0
07 Oct 2021
Understanding and Improving Usability of Data Dashboards for Simplified Privacy Control of Voice Assistant Data (Extended Version)
Vandit Sharma
Mainack Mondal
9
3
0
06 Oct 2021
Parsing Birdsong with Deep Audio Embeddings
Irina Tolkova
Brian Chu
Marcel Hedman
Stefan Kahl
Holger Klinck
28
10
0
20 Aug 2021
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
Xiaoyang Guo
Shaoshuai Shi
Xiaogang Wang
Hongsheng Li
3DPC
23
106
0
18 Aug 2021
Cross-modal Spectrum Transformation Network For Acoustic Scene classification
Yang Liu
A. Neophytou
Sunando Sengupta
Eric Sommerlade
11
9
0
13 Aug 2021
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs
J. Nistal
Stefan Lattner
G. Richard
19
8
0
03 Aug 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
25
539
0
30 Jun 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
19
34
0
01 Apr 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
8
66
0
05 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
13
8
0
02 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
Environment Transfer for Distributed Systems
Chunheng Jiang
Jae-wook Ahn
N. Desai
26
1
0
06 Jan 2021
Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio
João P. Ferreira
Thiago M. Coutinho
Thiago L. Gomes
J. F. Neto
Rafael Azevedo
Renato Martins
Erickson R. Nascimento
GAN
28
68
0
25 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
11
121
0
03 Nov 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
15
32
0
22 Oct 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
19
48
0
29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
19
101
0
28 Jul 2020
Rethinking CNN Models for Audio Classification
Kamalesh Palanisamy
Dipika Singhania
Angela Yao
SSL
17
144
0
22 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
23
153
0
13 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
14
23
0
04 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
24
12
0
01 Jun 2020
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu
C. Weber
S. Wermter
CVBM
19
19
0
17 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
19
25
0
28 Apr 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
22
38
0
08 Apr 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
11
88
0
20 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
192
205
0
23 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
16
73
0
09 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
27
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
17
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
17
52
0
20 Nov 2019
Deep Long Audio Inpainting
Ya-Liang Chang
Kuan-Ying Lee
Po-Yu Wu
Hung-yi Lee
Winston H. Hsu
17
33
0
15 Nov 2019
DEPA: Self-Supervised Audio Embedding for Depression Detection
Pingyue Zhang
Mengyue Wu
Heinrich Dinkel
Kai Yu
11
51
0
29 Oct 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
21
88
0
24 Oct 2019
Contrastive Representation Distillation
Yonglong Tian
Dilip Krishnan
Phillip Isola
24
1,029
0
23 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
14
41
0
19 Oct 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
17
77
0
22 Sep 2019
Multimodal Deep Models for Predicting Affective Responses Evoked by Movies
Ha Thi Phuong Thao
Dorien Herremans
Gemma Roig
21
16
0
16 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
13
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
14
0
0
22 Aug 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
6
268
0
27 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
21
1
0
23 Jul 2019
Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction
Mohammed Senoussaoui
P. Cardinal
Alessandro Lameiras Koerich
14
2
0
06 Jul 2019
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
8
133
0
13 Jun 2019
Learning Individual Styles of Conversational Gesture
Shiry Ginosar
Amir Bar
Gefen Kohavi
Caroline Chan
Andrew Owens
Jitendra Malik
SLR
12
326
0
10 Jun 2019
End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network
Sajjad Abdoli
P. Cardinal
Alessandro Lameiras Koerich
34
269
0
18 Apr 2019
Previous
1
2
3
Next