Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.10558
Cited By
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
21 July 2020
Yapeng Tian
Dingzeyu Li
Chenliang Xu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing"
50 / 129 papers shown
Title
Learning to Highlight Audio by Watching Movies
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
0
0
0
17 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
26
0
0
14 May 2025
Audio-visual Event Localization on Portrait Mode Short Videos
Wuyang Liu
Yi Chai
Yongpeng Yan
Yanzhen Ren
32
0
0
09 Apr 2025
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
Peng Wu
Wanshun Su
Guansong Pang
Yujia Sun
Qingsen Yan
Peng Wang
Yinger Zhang
VLM
55
0
0
06 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLM
AuLLM
62
2
0
02 Apr 2025
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
54
0
0
01 Apr 2025
Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
Sashuai Zhou
Hai Huang
Yan Xia
MoMe
MoE
77
0
0
26 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
59
0
0
17 Mar 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
41
0
0
17 Mar 2025
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
63
1
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
55
0
0
12 Mar 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
73
7
0
10 Feb 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing
Inpyo Hong
Youngwan Jo
Hyojeong Lee
Sunghyun Ahn
Sanghyun Park
MQ
49
2
0
26 Dec 2024
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
Pengcheng Zhao
Jinxing Zhou
Yang Zhao
Dan Guo
Yanxiang Chen
90
2
0
15 Dec 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Hao Wu
VLM
58
4
0
18 Nov 2024
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Xuzhi Zhang
39
0
0
07 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Shentong Mo
Yibing Song
25
0
0
30 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
33
2
0
31 Aug 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
29
4
0
30 Jul 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
33
5
0
18 Jul 2024
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou
Dan Guo
Yuxin Mao
Yiran Zhong
Xiaojun Chang
Meng Wang
41
12
0
11 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
36
4
0
04 Jul 2024
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
Khanh-Binh Nguyen
Chae Jung Park
VLM
VOS
42
1
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
41
9
0
01 Jul 2024
Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation
Juhyeong Seon
Woobin Im
Sebin Lee
Jumin Lee
Sung-eui Yoon
40
1
0
10 Jun 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
34
4
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
39
3
0
06 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
102
16
0
06 Jun 2024
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
61
18
0
03 Jun 2024
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
Faegheh Sardari
A. Mustafa
Philip J. B. Jackson
Adrian Hilton
26
3
0
17 May 2024
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
45
1
0
13 May 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
40
9
0
08 Apr 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
29
2
0
04 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
VGen
42
17
0
24 Mar 2024
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Shijian Deng
Erin E. Kosloski
Siddhi Patel
Zeke A. Barnett
Yiyang Nan
...
William T. Doan
Matthew Wang
Harsh Singh
P. Rollins
Yapeng Tian
39
4
0
22 Mar 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya-Qin Zhang
Yanfeng Wang
39
10
0
17 Mar 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
38
1
0
11 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
39
17
0
08 Mar 2024
Enhancing Multimodal Unified Representations for Cross Modal Generalization
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Minghui Fang
Jieming Zhu
Zhenhua Dong
Sashuai Zhou
Zhou Zhao
31
6
0
08 Mar 2024
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Zheng Ning
Brianna L Wimer
Kaiwen Jiang
Keyi Chen
Jerrick Ban
Yapeng Tian
Yuhang Zhao
T. Li
39
15
0
11 Feb 2024
Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics
Pengcheng Zhao
Yanxiang Chen
Yang Zhao
Wei Jia
Zhao Zhang
Ronggang Wang
Richang Hong
DiffM
22
1
0
24 Jan 2024
Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
Yukun Zuo
Hantao Yao
Liansheng Zhuang
Changsheng Xu
15
2
0
11 Jan 2024
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
21
13
0
02 Dec 2023
Weakly-Supervised Audio-Visual Segmentation
Shentong Mo
Bhiksha Raj
VOS
43
12
0
25 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Yating Xu
Conghui Hu
Gim Hee Lee
17
2
0
14 Nov 2023
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
21
17
0
09 Nov 2023
1
2
3
Next