Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1804.03160
Cited By
The Sound of Pixels
9 April 2018
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Sound of Pixels"
50 / 93 papers shown
Title
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
32
0
0
02 May 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
65
1
0
01 Apr 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
65
5
0
10 Feb 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
37
28
0
02 Jan 2025
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
57
3
0
14 Oct 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
34
2
0
19 Sep 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Ju Liu
VLM
69
1
0
14 Sep 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serra
33
2
0
08 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
29
2
0
02 Jul 2024
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Yu-Fen Huang
Nikki Moran
Simon Coleman
Jon Kelly
Shun-Hwa Wei
...
Chih-Hsuan Li
Da-Yu Huang
Hsuan-Kai Kao
Ting-Wei Lin
Li Su
24
1
0
10 Jun 2024
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
Kyuyeon Kim
Junsik Jung
Woo Jae Kim
Sung-eui Yoon
SSL
23
1
0
11 Oct 2023
Multi-Task Hypergraphs for Semi-supervised Learning using Earth Observations
Mihai Cristian Pîrvu
Alina Marcu
A. Dobrescu
N. Belbachir
Marius Leordeanu
16
6
0
21 Aug 2023
Video-to-Music Recommendation using Temporal Alignment of Segments
Laure Prétet
G. Richard
Clement Souchier
Geoffroy Peeters
AI4TS
23
13
0
12 Jun 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
17
1
0
07 Feb 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
21
24
0
14 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
35
0
0
09 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
35
8
0
01 Dec 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
Yuanyuan Jiang
Jianqin Yin
Yonghao Dang
35
4
0
11 Oct 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
79
64
0
30 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
18
8
0
04 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
31
29
0
20 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
30
5
0
12 Jul 2022
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
J. Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
28
110
0
11 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
40
19
0
07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
14
5
0
07 Jul 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
19
19
0
26 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix
Jorge Bennasar Vázquez
Tsubasa Ochiai
K. Kinoshita
Yasunori Ohishi
S. Araki
VLM
22
31
0
08 Apr 2022
Learning Neural Acoustic Fields
Andrew F. Luo
Yilun Du
Michael J. Tarr
J. Tenenbaum
Antonio Torralba
Chuang Gan
AI4CE
20
76
0
04 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
14
32
0
31 Mar 2022
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
11
2
0
30 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
29
135
0
26 Mar 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
Juan F. Montesinos
V. S. Kadandale
G. Haro
ViT
23
19
0
08 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
22
106
0
02 Mar 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
16
55
0
14 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder
Kristen Grauman
19
21
0
02 Feb 2022
Self-Supervised Beat Tracking in Musical Signals with Polyphonic Contrastive Learning
Dorian Desblancs
SSL
21
2
0
05 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
25
6
0
04 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
19
83
0
23 Dec 2021
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
25
16
0
17 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
13
27
0
21 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
21
8
0
26 Oct 2021
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
28
120
0
17 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
19
0
0
13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
224
1,018
0
13 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360
∘
^\circ
∘
Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
15
78
0
11 Oct 2021
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
Rameswar Panda
Chun-Fu Chen
Quanfu Fan
Ximeng Sun
Kate Saenko
A. Oliva
Rogerio Feris
28
47
0
11 May 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Hang Zhou
Yasheng Sun
Wayne Wu
Chen Change Loy
Xiaogang Wang
Ziwei Liu
CVBM
26
360
0
22 Apr 2021
A cappella: Audio-visual Singing Voice Separation
Juan F. Montesinos
V. S. Kadandale
G. Haro
38
16
0
20 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
53
0
13 Apr 2021
1
2
Next