Objects that Sound

18 December 2017

Papers citing "Objects that Sound"

50 / 114 papers shown

Title
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Edson Araujo Andrew Rouditchenko Yuan Gong Saurabhchand Bhati Samuel Thomas Brian Kingsbury Leonid Karlinsky Rogerio Feris James Glass 32 0 0 02 May 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio Inho Kim Youngkil Song Jicheol Park Won Hwa Kim Suha Kwak 22 0 0 21 Apr 2025
The Sound of Water: Inferring Physical Properties from Pouring Liquids Piyush Bagad Makarand Tapaswi Cees G. M. Snoek Andrew Zisserman 40 0 0 18 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization Jinxing Zhou D. Guo Ruohao Guo Yuxin Mao Jingjing Hu Yiran Zhong Xiaojun Chang M. Wang VLM 46 4 0 18 Nov 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio Xavier Juanola Gloria Haro Magdalena Fuentes 31 2 0 01 Oct 2024
Sequential Contrastive Audio-Visual Learning Ioannis Tsiamas Santiago Pascual Chunghsin Yeh Joan Serra 33 2 0 08 Jul 2024
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation Se-eun Yoon Hyunsik Jeon Julian McAuley 38 0 0 23 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas Ziyang Chen Daniel Geng Andrew Owens DiffM 48 9 0 20 May 2024
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering Charig Yang Weidi Xie Andrew Zisserman 34 1 0 25 Apr 2024
Understanding Hyperbolic Metric Learning through Hard Negative Sampling Yun Yue Fangzhou Lin Guanyi Mou Ziming Zhang SSL 30 1 0 23 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners Yan-Bo Lin Gedas Bertasius 37 5 0 28 Mar 2024
Synchformer: Efficient Synchronization from Sparse Cues Vladimir E. Iashin Weidi Xie Esa Rahtu Andrew Zisserman 11 11 0 29 Jan 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen Zhengcong Fei Mingyuan Fan Junshi Huang 23 17 0 27 Nov 2023
OmniVec: Learning robust representations with cross modal sharing Siddharth Srivastava Gaurav Sharma SSL 21 64 0 07 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA Asmar Nadeem Adrian Hilton R. Dawes Graham A. Thomas A. Mustafa 21 9 0 25 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation Yuxin Mao Jing Zhang Mochu Xiang Yiran Zhong Yuchao Dai 35 34 0 12 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision Kyuyeon Kim Junsik Jung Woo Jae Kim Sung-eui Yoon SSL 23 1 0 11 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation Zhaofeng Shi Qingbo Wu Fanman Meng Linfeng Xu Hongliang Li VOS 25 3 0 10 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment Arda Senocak H. Ryu Junsik Kim Tae-Hyun Oh Hanspeter Pfister Joon Son Chung 21 18 0 19 Sep 2023
A Multimodal Prototypical Approach for Unsupervised Sound Classification Saksham Singh Kushwaha Magdalena Fuentes 22 8 0 21 Jun 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer Yuhang Ling Yuxi Li Zhenye Gan Jiangning Zhang M. Chi Yabiao Wang VOS ViT 29 1 0 12 May 2023
Noisy Correspondence Learning with Meta Similarity Correction Haocheng Han Kaiyao Miao Qinghua Zheng Minnan Luo 19 28 0 13 Apr 2023
Egocentric Auditory Attention Localization in Conversations Fiona Ryan Hao Jiang Abhinav Shukla James M. Rehg V. Ithapu EgoV 24 16 0 28 Mar 2023
LipLearner: Customizable Silent Speech Interactions on Mobile Devices Zixiong Su Shitao Fang Jun Rekimoto 16 26 0 12 Feb 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition Hasan Hammoud Shuming Liu Mohammad Alkhrashi Fahad Albalawi Bernard Ghanem AAML 29 8 0 03 Jan 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos Hao-Wen Dong Naoya Takahashi Yuki Mitsufuji Julian McAuley Taylor Berg-Kirkpatrick VLM CLIP 21 24 0 14 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction Yating Xu Conghui Hu G. Lee VGen 35 0 0 09 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection Rahul Sharma Shrikanth Narayanan 35 8 0 01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures Xixi Hu Ziyang Chen Andrew Owens 23 51 0 28 Nov 2022
Unifying Tracking and Image-Video Object Detection Peirong Liu Rui Wang Pengchuan Zhang Omid Poursaeed Yipin Zhou Xuefei Cao Sreya . Dutta Roy Ashish Shah Ser-Nam Lim 11 0 0 20 Nov 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization Yuanyuan Jiang Jianqin Yin Yonghao Dang 35 5 0 11 Oct 2022
Contrastive Audio-Visual Masked Autoencoder Yuan Gong Andrew Rouditchenko Alexander H. Liu David F. Harwath Leonid Karlinsky Hilde Kuehne James R. Glass 27 119 0 02 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions Himangi Mittal Pedro Morgado Unnat Jain Abhinav Gupta 66 22 0 27 Sep 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations Xufeng Zhao C. Weber Muhammad Burhan Hafez S. Wermter 18 8 0 04 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation Efthymios Tzinis Scott Wisdom Tal Remez J. Hershey 31 29 0 20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer? Chuhan Zhang Ankush Gupta Andrew Zisserman ViT 31 26 0 20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning Otniel-Bogdan Mercea Thomas Hummel A. Sophia Koepke Zeynep Akata 30 25 0 20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos Madeline Chantry Schiappa Y. S. Rawat 17 4 0 16 Jul 2022
Masked Autoencoders that Listen Po-Yao (Bernie) Huang Hu Xu Juncheng Billy Li Alexei Baevski Michael Auli Wojciech Galuba Florian Metze Christoph Feichtenhofer 13 268 0 13 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection Jiashuo Yu Jin-Yuan Liu Ying Cheng Rui Feng Yuejie Zhang 14 34 0 12 Jul 2022
Audio-Visual Segmentation Jinxing Zhou Jianyuan Wang J. Zhang Weixuan Sun Jing Zhang Stan Birchfield Dan Guo Lingpeng Kong Meng Wang Yiran Zhong VOS 28 110 0 11 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration Chuang Gan Yi Gu Siyuan Zhou Jeremy Schwartz S. Alter James Traer Dan Gutfreund J. Tenenbaum Josh H. McDermott Antonio Torralba 40 19 0 07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization Jiashuo Yu Junfu Pu Ying Cheng Rui Feng Ying Shan 14 5 0 07 Jul 2022
Visual-Assisted Sound Source Depth Estimation in the Wild Wei Sun L. Qiu MDE 13 0 0 07 Jul 2022
Self-Supervised Learning for Videos: A Survey Madeline Chantry Schiappa Y. S. Rawat M. Shah SSL 34 131 0 18 Jun 2022
Weakly-Supervised Action Detection Guided by Audio Narration Keren Ye Adriana Kovashka 22 0 0 12 May 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition Yang Liu Y. Tan Haoyu Lan SSL 36 5 0 28 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation Ziyang Chen David Fouhey Andrew Owens SSL 19 19 0 26 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound Yan-Bo Lin Jie Lei Mohit Bansal Gedas Bertasius 31 39 0 06 Apr 2022
The Sound of Bounding-Boxes Takashi Oya Shohei Iwase Shigeo Morishima 16 2 0 30 Mar 2022