Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.06651
Cited By
Objects that Sound
18 December 2017
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Objects that Sound"
50 / 114 papers shown
Title
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
32
0
0
02 May 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
The Sound of Water: Inferring Physical Properties from Pouring Liquids
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
Andrew Zisserman
40
0
0
18 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
D. Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
M. Wang
VLM
46
4
0
18 Nov 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
31
2
0
01 Oct 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serra
33
2
0
08 Jul 2024
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
Se-eun Yoon
Hyunsik Jeon
Julian McAuley
38
0
0
23 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
48
9
0
20 May 2024
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Charig Yang
Weidi Xie
Andrew Zisserman
34
1
0
25 Apr 2024
Understanding Hyperbolic Metric Learning through Hard Negative Sampling
Yun Yue
Fangzhou Lin
Guanyi Mou
Ziming Zhang
SSL
30
1
0
23 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
11
11
0
29 Jan 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
23
17
0
27 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
21
64
0
07 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
35
34
0
12 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
Kyuyeon Kim
Junsik Jung
Woo Jae Kim
Sung-eui Yoon
SSL
23
1
0
11 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
25
3
0
10 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
21
18
0
19 Sep 2023
A Multimodal Prototypical Approach for Unsupervised Sound Classification
Saksham Singh Kushwaha
Magdalena Fuentes
22
8
0
21 Jun 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
29
1
0
12 May 2023
Noisy Correspondence Learning with Meta Similarity Correction
Haocheng Han
Kaiyao Miao
Qinghua Zheng
Minnan Luo
19
28
0
13 Apr 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
24
16
0
28 Mar 2023
LipLearner: Customizable Silent Speech Interactions on Mobile Devices
Zixiong Su
Shitao Fang
Jun Rekimoto
16
26
0
12 Feb 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
29
8
0
03 Jan 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
21
24
0
14 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
35
0
0
09 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
35
8
0
01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
23
51
0
28 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya . Dutta Roy
Ashish Shah
Ser-Nam Lim
11
0
0
20 Nov 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
Yuanyuan Jiang
Jianqin Yin
Yonghao Dang
35
5
0
11 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
27
119
0
02 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
66
22
0
27 Sep 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
18
8
0
04 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
31
29
0
20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
31
26
0
20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
30
25
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
Madeline Chantry Schiappa
Y. S. Rawat
17
4
0
16 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
13
268
0
13 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
Jiashuo Yu
Jin-Yuan Liu
Ying Cheng
Rui Feng
Yuejie Zhang
14
34
0
12 Jul 2022
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
J. Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
28
110
0
11 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
40
19
0
07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
14
5
0
07 Jul 2022
Visual-Assisted Sound Source Depth Estimation in the Wild
Wei Sun
L. Qiu
MDE
13
0
0
07 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
34
131
0
18 Jun 2022
Weakly-Supervised Action Detection Guided by Audio Narration
Keren Ye
Adriana Kovashka
22
0
0
12 May 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
Yang Liu
Y. Tan
Haoyu Lan
SSL
36
5
0
28 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
19
19
0
26 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
31
39
0
06 Apr 2022
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
16
2
0
30 Mar 2022
1
2
3
Next