1804.03160
The Sound of Pixels
9 April 2018
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
Papers citing
"The Sound of Pixels"
43 / 93 papers shown
Title
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
19
34
0
01 Apr 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
26
37
0
15 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
11
8
0
02 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
Music source separation conditioned on 3D point clouds
Francesc Lluís
V. Chatziioannou
A. Hofmann
3DPC
24
5
0
03 Feb 2021
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
11
121
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
26
68
0
02 Nov 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
15
32
0
22 Oct 2020
Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision
Yun-Ning Hung
G. Wichern
Jonathan Le Roux
17
12
0
22 Oct 2020
Speaker Separation Using Speaker Inventories and Estimated Speech
Peidong Wang
Zhuo Chen
DeLiang Wang
Jinyu Li
Y. Gong
30
11
0
20 Oct 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
25
59
0
14 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
17
250
0
10 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
19
48
0
29 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
23
153
0
13 Jul 2020
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
Chuang Gan
Jeremy Schwartz
S. Alter
Damian Mrowca
Martin Schrimpf
...
Antonio Torralba
J. DiCarlo
J. Tenenbaum
Josh H. McDermott
Daniel L. K. Yamins
VGen
28
303
0
09 Jul 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
11
75
0
11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
14
23
0
04 Jun 2020
On the Role of Visual Cues in Audiovisual Speech Enhancement
Zakaria Aldeneh
Anushree Prasanna Kumar
B. Theobald
Erik Marchi
S. Kajarekar
Devang Naik
Ahmed Hussen Abdelaziz
20
6
0
25 Apr 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
22
38
0
08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
20
54
0
30 Mar 2020
Multi-channel U-Net for Music Source Separation
V. S. Kadandale
Juan F. Montesinos
G. Haro
Emilia Gómez
22
8
0
23 Mar 2020
The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation
S. Kasaei
J. Melsen
Floris van Beers
Christiaan Steenkist
K. Vončina
9
12
0
18 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos
Daniel Gordon
Kiana Ehsani
D. Fox
Ali Farhadi
SSL
AI4TS
8
87
0
18 Mar 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
192
205
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
R. He
24
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
16
73
0
09 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
25
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
14
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
17
52
0
20 Nov 2019
DEPA: Self-Supervised Audio Embedding for Depression Detection
Pingyue Zhang
Mengyue Wu
Heinrich Dinkel
Kai Yu
11
51
0
29 Oct 2019
PRNet: Self-Supervised Learning for Partial-to-Partial Registration
Yue Wang
Justin Solomon
SSL
3DPC
14
379
0
27 Oct 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
19
88
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
14
41
0
19 Oct 2019
Learning to Have an Ear for Face Super-Resolution
Givi Meishvili
Simon Jenni
Paolo Favaro
SupR
CVBM
28
23
0
27 Sep 2019
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
M. Gogate
K. Dashtipour
Ahsan Adeel
Amir Hussain
13
53
0
23 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
13
91
0
30 Aug 2019
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
8
133
0
13 Jun 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
14
205
0
16 Apr 2019
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
14
743
0
10 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
14
422
0
23 Mar 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
19
528
0
18 Dec 2017
Creating A Multi-track Classical Musical Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications
Bochen Li
Xinzhao Liu
K. Dinesh
Z. Duan
Gaurav Sharma
21
148
0
27 Dec 2016