Audio-Visual Event Localization in Unconstrained Videos


23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
arXiv: 1803.08842

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 296 papers shown
Distilling Cross-Modal Knowledge via Feature Disentanglement
Junhong Liu
Yuan Zhang
Tao Huang
Wenchao Xu
Renyu Yang
81
0
0
25 Nov 2025
Decoupled Audio-Visual Dataset Distillation
Wenyuan Li
Guang Li
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
90
0
0
22 Nov 2025
R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios
Lu Zhu
Tiantian Geng
Yangye Chen
Teng Wang
Ping Lu
Feng Zheng
AI4TS
197
0
0
21 Nov 2025
Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty
Victor Croisfelt
João Henrique Inacio de Souza
Shashi Raj Pandey
B. Soret
P. Popovski
149
0
0
20 Nov 2025
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Cheng Yang
Haiyuan Wan
Yiran Peng
Xin Cheng
Zhaoyang Yu
...
Junchi Yu
Xinlei Yu
Xiawu Zheng
D. Zhou
Chenglin Wu
ReLM, LRM
258
0
0
19 Nov 2025
PrAda-GAN: A Private Adaptive Generative Adversarial Network with Bayes Network Structure
Ke Jia
Yuheng Ma
Yang Li
Feifei Wang
92
2
0
11 Nov 2025
Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation
Evelyn Chee
Wynne Hsu
Mong-Li Lee
CLL, KELM
272
0
0
10 Nov 2025
Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization
Heshan Devaka Fernando
Parikshit Ram
Yi Zhou
Soham Dan
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
149
0
0
10 Nov 2025
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Ziyu Guo
Xinyan Chen
Renrui Zhang
Ruichuan An
Yu Qi
Dongzhi Jiang
Xiangtai Li
M. Zhang
Jiaming Song
Pheng-Ann Heng
VGen, LRM
120
8
0
30 Oct 2025
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein R. Nowdeh
Jie Ji
Xiaolong Ma
Fatemeh Afghah
76
0
0
28 Oct 2025
Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding
Minseok Kang
M. Lee
Minjung Kim
Donghyeong Kim
Sangyoun Lee
77
0
0
23 Oct 2025
AV-Master: Dual-Path Comprehensive Perception Makes Better Audio-Visual Question Answering
Jiayu Zhang
Qilang Ye
Shuo Ye
Xun Lin
Zihan Song
Zitong Yu
88
0
0
21 Oct 2025
Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning
Zhaocheng Liu
Zhiwen Yu
Xiaoqing Liu
136
0
0
20 Oct 2025
Not in Sync: Unveiling Temporal Bias in Audio Chat Models
Jiayu Yao
Shenghua Liu
Yiwei Wang
Rundong Cheng
Lingrui Mei
Baolong Bi
Zhen Xiong
Xueqi Cheng
80
0
0
14 Oct 2025
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
Haomiao Chen
K. Jamison
M. Sabuncu
Amy Kuceyeski
108
0
0
07 Oct 2025
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
Jinghan Xu
Yuyang Zhang
Qixuan Cai
Jiancheng Chen
Keqiu Li
MU
68
0
0
28 Sep 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
107
0
0
26 Sep 2025
Shaping Initial State Prevents Modality Competition in Multi-modal Fusion: A Two-stage Scheduling Framework via Fast Partial Information Decomposition
Jiaqi Tang
Yinsong Xu
Yang Liu
Qingchao Chen
111
0
0
25 Sep 2025
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola
G. Morais
Magdalena Fuentes
Gloria Haro
SSL
132
0
0
29 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
187
3
0
11 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu
Yu Zhang
Wenxiang Guo
Changhao Pan
Zhou Zhao
125
1
0
08 Aug 2025
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization
Jinxing Zhou
Ziheng Zhou
Yanghao Zhou
Yuxin Mao
Zhangling Duan
Dan Guo
96
1
0
06 Aug 2025
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
Yuyang Liu
Qiuhe Hong
Linlan Huang
Alexandra Gomez-Villa
Dipam Goswami
Xialei Liu
Joost van de Weijer
Yonghong Tian
CLL, KELM, VLM
169
0
0
06 Aug 2025
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
Jinxing Zhou
Yanghao Zhou
Mingfei Han
Tong Wang
Xiaojun Chang
Hisham Cholakkal
Rao Muhammad Anwer
VOS, LRM
118
1
0
06 Aug 2025
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Yogesh Kulkarni
Pooyan Fazli
OffRL, LRM
196
4
0
05 Aug 2025
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Fatima Saad
Ziad Al-Halah
VGen
60
1
0
04 Aug 2025
Hybrid Hypergraph Networks for Multimodal Sequence Data Classification
Feng Xu
Hui Wang
Yuting Huang
Danwei Zhang
Zizhu Fan
86
0
0
30 Jul 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying
Henghui Ding
Guangquan Jie
Yu Jiang
VOS
253
5
0
30 Jul 2025
DAMS:Dual-Branch Adaptive Multiscale Spatiotemporal Framework for Video Anomaly Detection
Dezhi An
Wenqiang Liu
Kefan Wang
Zening chen
Jun Lu
Shengcai Zhang
89
0
0
28 Jul 2025
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei
Chunbo Luo
Yang Luo
141
2
0
14 Jul 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
131
0
0
14 Jul 2025
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang
Bingke Zhu
Yingying Chen
Yiyuan Zhang
Ming Tang
Jinqiao Wang
VLM
260
1
0
02 Jul 2025
Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment
Yue Zhang
Jilei Sun
Yunhui Guo
Vibhav Gogate
LRM
168
1
0
27 Jun 2025
Action Dubber: Timing Audible Actions via Inflectional Flow
Wenlong Wan
Weiying Zheng
Tianyi Xiang
Guiqing Li
Shengfeng He
145
0
0
16 Jun 2025
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
Runhao Zeng
Qi Deng
Ronghao Zhang
Shuaicheng Niu
Jian Chen
Xiping Hu
Victor C. M. Leung
TTA
113
0
0
14 Jun 2025
MokA: Multimodal Low-Rank Adaptation for MLLMs
Yake Wei
Yu Miao
Dongzhan Zhou
Di Hu
189
0
0
05 Jun 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
Lidong Lu
Guo Chen
Ruoyao Xiao
Yicheng Liu
Tong Lu
VLM, LRM
251
6
0
05 Jun 2025
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
X. Yu
Yan Fang
Xiaojie Jin
Yao Zhao
Yunchao Wei
219
1
0
29 May 2025
ZeroSep: Separate Anything in Audio with Zero Training
Chao Huang
Yuesheng Ma
J. Huang
Susan Liang
Yunlong Tang
Jing Bi
Wenqiang Liu
Nima Mesgarani
Chenliang Xu
DiffM, VLM
190
3
0
29 May 2025
Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning
Jiangrong Shen
Yulin Xie
Qi Xu
Gang Pan
Huajin Tang
Badong Chen
172
3
0
20 May 2025
Learning to Highlight Audio by Watching Movies
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
217
3
0
17 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Computer Vision and Pattern Recognition (CVPR), 2025
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
194
1
0
14 May 2025
Audio-visual Event Localization on Portrait Mode Short Videos
Wuyang Liu
Yi Chai
Yongpeng Yan
Yanzhen Ren
233
1
0
09 Apr 2025
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
Peng Wu
Wanshun Su
Guansong Pang
Yujia Sun
Qingsen Yan
Peng Wang
Yujiao Shi
VLM
236
3
0
06 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2025
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLM, AuLLM
275
6
0
02 Apr 2025
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
239
1
0
01 Apr 2025
Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
Sashuai Zhou
Hai Huang
Yan Xia
MoMe, MoE
232
2
0
26 Mar 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
179
1
0
17 Mar 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Computer Vision and Pattern Recognition (CVPR), 2025
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
263
1
0
17 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
377
2
0
12 Mar 2025