ResearchTrend.AI
Objects that Sound

18 December 2017
Relja Arandjelović
Andrew Zisserman
    ObjD
    VOS

Papers citing "Objects that Sound"

50 / 114 papers shown
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
32
0
0
02 May 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
The Sound of Water: Inferring Physical Properties from Pouring Liquids
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
Andrew Zisserman
40
0
0
18 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
D. Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
M. Wang
VLM
46
4
0
18 Nov 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
31
2
0
01 Oct 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serra
33
2
0
08 Jul 2024
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
Se-eun Yoon
Hyunsik Jeon
Julian McAuley
38
0
0
23 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
48
9
0
20 May 2024
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Charig Yang
Weidi Xie
Andrew Zisserman
34
1
0
25 Apr 2024
Understanding Hyperbolic Metric Learning through Hard Negative Sampling
Yun Yue
Fangzhou Lin
Guanyi Mou
Ziming Zhang
SSL
30
1
0
23 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
11
11
0
29 Jan 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
23
17
0
27 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
21
64
0
07 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
35
34
0
12 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
Kyuyeon Kim
Junsik Jung
Woo Jae Kim
Sung-eui Yoon
SSL
23
1
0
11 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
25
3
0
10 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
21
18
0
19 Sep 2023
A Multimodal Prototypical Approach for Unsupervised Sound Classification
Saksham Singh Kushwaha
Magdalena Fuentes
22
8
0
21 Jun 2023
TransAVS: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
29
1
0
12 May 2023
Noisy Correspondence Learning with Meta Similarity Correction
Haocheng Han
Kaiyao Miao
Qinghua Zheng
Minnan Luo
19
28
0
13 Apr 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
24
16
0
28 Mar 2023
LipLearner: Customizable Silent Speech Interactions on Mobile Devices
Zixiong Su
Shitao Fang
Jun Rekimoto
16
26
0
12 Feb 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
29
8
0
03 Jan 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
21
24
0
14 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
35
0
0
09 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
35
8
0
01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
23
51
0
28 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
11
0
0
20 Nov 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
Yuanyuan Jiang
Jianqin Yin
Yonghao Dang
35
5
0
11 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
27
119
0
02 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
66
22
0
27 Sep 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
18
8
0
04 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
31
29
0
20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
31
26
0
20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
30
25
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
Madeline Chantry Schiappa
Y. S. Rawat
17
4
0
16 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
13
268
0
13 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
Jiashuo Yu
Jin-Yuan Liu
Ying Cheng
Rui Feng
Yuejie Zhang
14
34
0
12 Jul 2022
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
J. Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
28
110
0
11 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
40
19
0
07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
14
5
0
07 Jul 2022
Visual-Assisted Sound Source Depth Estimation in the Wild
Wei Sun
L. Qiu
MDE
13
0
0
07 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
34
131
0
18 Jun 2022
Weakly-Supervised Action Detection Guided by Audio Narration
Keren Ye
Adriana Kovashka
22
0
0
12 May 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
Yang Liu
Y. Tan
Haoyu Lan
SSL
36
5
0
28 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
19
19
0
26 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
31
39
0
06 Apr 2022
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
16
2
0
30 Mar 2022