Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.07271
Cited By
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
20 December 2017
Andrew Owens
Jiajun Wu
Josh H. McDermott
William T. Freeman
Antonio Torralba
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning"
43 / 43 papers shown
Title
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM
CLIP
26
2
0
09 Apr 2024
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
23
51
0
28 Nov 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
27
119
0
02 Oct 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Mohit Bansal
VLM
49
28
0
28 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
79
64
0
30 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
Yanbei Chen
Massimiliano Mancini
Xiatian Zhu
Zeynep Akata
33
113
0
24 Aug 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
30
25
0
20 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
40
19
0
07 Jul 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
19
19
0
26 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
33
39
0
06 Apr 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
26
106
0
02 Mar 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
19
7
0
02 Feb 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
103
0
16 Jan 2022
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
24
0
0
13 Oct 2021
Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Prateek Verma
AI4TS
24
2
0
07 Oct 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
16
26
0
20 Jul 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
16
53
0
16 Jun 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
19
34
0
01 Apr 2021
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
15
32
0
22 Oct 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
14
106
0
13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
17
250
0
10 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
21
48
0
29 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
23
153
0
13 Jul 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
14
23
0
04 Jun 2020
S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation
Yizhe Zhu
Martin Renqiang Min
Asim Kadav
H. Graf
CoGe
DRL
6
95
0
23 May 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
20
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
17
52
0
20 Nov 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
0
0
22 Aug 2019
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
13
133
0
13 Jun 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
17
20
0
07 Dec 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
22
270
0
16 Aug 2018
Unsupervised learning of foreground object detection
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
OCL
16
48
0
14 Aug 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
16
743
0
10 Apr 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
24
528
0
18 Dec 2017
Interpreting Deep Visual Representations via Network Dissection
Bolei Zhou
David Bau
A. Oliva
Antonio Torralba
FAtt
MILM
29
323
0
15 Nov 2017
Unsupervised Representation Learning by Sorting Sequences
Hsin-Ying Lee
Jia-Bin Huang
Maneesh Kumar Singh
Ming-Hsuan Yang
SSL
DRL
17
533
0
03 Aug 2017
Active Decision Boundary Annotation with Deep Generative Models
Miriam W. Huijser
J. C. V. Gemert
12
46
0
20 Mar 2017
Colorization as a Proxy Task for Visual Understanding
Gustav Larsson
Michael Maire
Gregory Shakhnarovich
SSL
19
493
0
11 Mar 2017
Learning Features by Watching Objects Move
Deepak Pathak
Ross B. Girshick
Piotr Dollár
Trevor Darrell
Bharath Hariharan
SSL
VOS
OCL
16
521
0
19 Dec 2016
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
SSL
DRL
17
665
0
29 Nov 2016
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
69
1,460
0
08 Sep 2016
1