2207.05042
Audio-Visual Segmentation
European Conference on Computer Vision (ECCV), 2022
11 July 2022
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
Papers citing "Audio-Visual Segmentation"
50 / 111 papers shown
Learning Visual Affordance from Audio
Lidong Lu
Guo Chen
Zhu Wei
Yicheng Liu
Tong Lu
220
0
0
01 Dec 2025
Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering
Z. Fu
Changsheng Lv
Mengshi Qi
Huadong Ma
204
0
0
28 Nov 2025
Layover or Direct Flight: Rethinking Audio-Guided Image Segmentation
Joel Alberto Santos
Zongwei Wu
Xavier Alameda-Pineda
Radu Timofte
128
0
0
27 Nov 2025
MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning
Kyeongha Rho
Hyeongkeun Lee
Jae-Won Cho
Joon Son Chung
100
1
0
27 Nov 2025
Referring Video Object Segmentation with Cross-Modality Proxy Queries
IEEE Transactions on Multimedia (TMM), 2025
Baoli Sun
Xinzhu Ma
Ning Wang
Zhihui Wang
Zhiyong Wang
VOS
519
0
0
26 Nov 2025
Decoupled Audio-Visual Dataset Distillation
Wenyuan Li
Guang Li
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
218
3
0
22 Nov 2025
Segmenting Collision Sound Sources in Egocentric Videos
Kranti Parida
Omar Emara
Hazel Doughty
Dima Damen
VOS
334
0
0
17 Nov 2025
Complementary and Contrastive Learning for Audio-Visual Segmentation
IEEE Transactions on Multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Pingping Zhang
Huchuan Lu
VOS
333
7
0
11 Oct 2025
SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation
Zhenjie Mao
Yuhuan Yang
Chaofan Ma
Dongsheng Jiang
Jiangchao Yao
Ya Zhang
Yanfeng Wang
157
1
0
11 Oct 2025
Video Object Segmentation-Aware Audio Generation
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
DiffM
VOS
VGen
235
1
0
30 Sep 2025
Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation
Jinbae Seo
Hyeongjun Kwon
Kwonyoung Kim
Jiyoung Lee
Kwanghoon Sohn
VOS
312
1
0
26 Sep 2025
SimToken: A Simple Baseline for Referring Audio-Visual Segmentation
Dian Jin
Yanghao Zhou
Jinxing Zhou
Jiaqi Ma
Ruohao Guo
Dan Guo
VOS
320
4
0
22 Sep 2025
Agentic Design Review System
Sayan Nag
K. J. Joseph
Koustava Goswami
Vlad I. Morariu
Balaji Vasan Srinivasan
196
1
0
14 Aug 2025
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Yogesh Kulkarni
Pooyan Fazli
OffRL
VGen
LRM
385
6
0
05 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
520
16
0
01 Aug 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying
Henghui Ding
Guangquan Jie
Yu Jiang
VOS
416
7
0
30 Jul 2025
From Waveforms to Pixels: A Survey on Audio-Visual Segmentation
Jia Li
Yapeng Tian
VOS
263
3
0
29 Jul 2025
Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning
Jiong Yin
Liang-Sheng Li
Jiehua Zhang
Yuhan Gao
Chenggang Yan
Xichun Sheng
CLL
288
1
0
29 Jul 2025
Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha
Tianyu Li
G. Wang
Peng Wang
Yangyang Wu
Yang Yang
Heng Tao Shen
VOS
CML
207
1
0
28 Jul 2025
DFR: A Decompose-Fuse-Reconstruct Framework for Multi-Modal Few-Shot Segmentation
Shuai Chen
Fanman Meng
Xiwei Zhang
Haoran Wei
Haoran Wei
Qingbo Wu
Hongliang Li
174
0
0
22 Jul 2025
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Computer Vision and Pattern Recognition (CVPR), 2025
Yuji Wang
Haoran Xu
Yong-Jin Liu
Jiaze Li
Yansong Tang
269
15
0
02 Jun 2025
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Yuyuan Liu
Yuanhong Chen
Chong Wang
Junlin Han
Junde Wu
Can Peng
Jingkun Chen
Yu Tian
Gustavo Carneiro
VLM
395
2
0
01 Jun 2025
Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation
IEEE International Conference on Image Processing (ICIP), 2025
Nagito Saito
Shintaro Ito
Koichi Ito
T. Aoki
VLM
MedIm
484
0
0
26 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Computer Vision and Pattern Recognition (CVPR), 2025
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
246
1
0
14 May 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
322
1
0
30 Apr 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Computer Vision and Pattern Recognition (CVPR), 2025
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
458
2
0
21 Apr 2025
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues
Xiwen Li
Ross T. Whitaker
Tolga Tasdizen
422
0
0
15 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2025
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLM
AuLLM
406
9
0
02 Apr 2025
Visual Acoustic Fields
Yuelei Li
Hyunjin Kim
Fangneng Zhan
Ri-Zhao Qiu
Mazeyu Ji
Xiaojun Shan
Xueyan Zou
Paul Liang
Hanspeter Pfister
Xiaolong Wang
352
0
0
31 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
564
7
0
29 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Computer Vision and Pattern Recognition (CVPR), 2025
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
302
16
0
17 Mar 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
287
4
0
17 Mar 2025
Audio Visual Segmentation Through Text Embeddings
IEEE International Conference on Image Processing (ICIP), 2025
Kyungbok Lee
You Zhang
Z. Duan
395
0
0
22 Feb 2025
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation
IEEE Transactions on Multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Yifan Wang
Pingping Zhang
Lijun Wang
Huchuan Lu
Mamba
VOS
198
19
0
14 Jan 2025
Gotta Hear Them All: Towards Sound Source Aware Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
655
6
0
23 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Computer Vision and Pattern Recognition (CVPR), 2025
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Meng Wang
VLM
616
28
0
18 Nov 2024
3D Audio-Visual Segmentation
Artem Sokolov
Swapnil Bhosale
Xiatian Zhu
VOS
331
3
0
04 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Yibing Song
304
4
0
30 Oct 2024
Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies
Xiwen Li
Rehman Mohammed
Tristalee Mangin
Surojit Saha
Ross T. Whitaker
Kerry E Kelly
Tolga Tasdizen
266
14
0
28 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
309
3
0
31 Aug 2024
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
Zili Wang
Qi Yang
Linsu Shi
Jiazhong Yu
M. Tanveer
Fei Li
Shiming Xiang
VOS
283
5
0
03 Aug 2024
Segment Anything for Videos: A Systematic Survey
Chunhui Zhang
Yawen Cui
Weilin Lin
Guanjie Huang
Yan Rong
Li Liu
Shiguang Shan
VLM
273
11
0
31 Jul 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
455
10
0
18 Jul 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
VOS
379
26
0
16 Jul 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang
Peiwen Sun
Dongzhan Zhou
Guangyao Li
Honggang Zhang
Di Hu
VOS
371
31
0
15 Jul 2024
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang
Peiwen Sun
Yuanchao Li
Honggang Zhang
Di Hu
421
14
0
15 Jul 2024
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou
Dan Guo
Yuxin Mao
Yiran Zhong
Xiaojun Chang
Meng Wang
285
35
0
11 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
356
11
0
07 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
453
5
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
446
29
0
01 Jul 2024