Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 301 papers shown
Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering
Z. Fu
Changsheng Lv
Mengshi Qi
Huadong Ma
160
0
0
28 Nov 2025
MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning
Kyeongha Rho
Hyeongkeun Lee
Jae-Won Cho
Joon Son Chung
45
0
0
27 Nov 2025
Distilling Cross-Modal Knowledge via Feature Disentanglement
Junhong Liu
Yuan Zhang
Tao Huang
Wenchao Xu
Renyu Yang
142
0
0
25 Nov 2025
Decoupled Audio-Visual Dataset Distillation
Wenyuan Li
Guang Li
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
134
1
0
22 Nov 2025
R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios
Lu Zhu
Tiantian Geng
Yangye Chen
Teng Wang
Ping Lu
Feng Zheng
AI4TS
261
0
0
21 Nov 2025
Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty
Victor Croisfelt
João Henrique Inacio de Souza
Shashi Raj Pandey
B. Soret
P. Popovski
161
0
0
20 Nov 2025
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Cheng Yang
Haiyuan Wan
Yiran Peng
Xin Cheng
Quan Shi
...
Junchi Yu
Xinlei Yu
Xiawu Zheng
D. Zhou
Chenglin Wu
ReLM LRM
311
0
0
19 Nov 2025
PrAda-GAN: A Private Adaptive Generative Adversarial Network with Bayes Network Structure
Ke Jia
Yuheng Ma
Yang Li
Feifei Wang
124
2
0
11 Nov 2025
Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization
Heshan Devaka Fernando
Parikshit Ram
Yi Zhou
Soham Dan
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
226
0
0
10 Nov 2025
Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation
Evelyn Chee
Wynne Hsu
Mong-Li Lee
CLL KELM
341
0
0
10 Nov 2025
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Ziyu Guo
Xinyan Chen
Renrui Zhang
Ruichuan An
Yu Qi
Dongzhi Jiang
Xiangtai Li
M. Zhang
Jiaming Song
Pheng-Ann Heng
VGen LRM
190
12
0
30 Oct 2025
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein R. Nowdeh
Jie Ji
Xiaolong Ma
Fatemeh Afghah
139
0
0
28 Oct 2025
Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding
Minseok Kang
M. Lee
Minjung Kim
Donghyeong Kim
Sangyoun Lee
118
0
0
23 Oct 2025
AV-Master: Dual-Path Comprehensive Perception Makes Better Audio-Visual Question Answering
Jiayu Zhang
Qilang Ye
Shuo Ye
Xun Lin
Zihan Song
Zitong Yu
112
0
0
21 Oct 2025
Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning
Zhaocheng Liu
Zhiwen Yu
Xiaoqing Liu
200
0
0
20 Oct 2025
Not in Sync: Unveiling Temporal Bias in Audio Chat Models
Jiayu Yao
Shenghua Liu
Yiwei Wang
Rundong Cheng
Lingrui Mei
Baolong Bi
Zhen Xiong
Xueqi Cheng
116
0
0
14 Oct 2025
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
Haomiao Chen
K. Jamison
M. Sabuncu
Amy Kuceyeski
143
1
0
07 Oct 2025
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
Jinghan Xu
Yuyang Zhang
Qixuan Cai
Jiancheng Chen
Keqiu Li
MU
97
0
0
28 Sep 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
147
0
0
26 Sep 2025
Shaping Initial State Prevents Modality Competition in Multi-modal Fusion: A Two-stage Scheduling Framework via Fast Partial Information Decomposition
Jiaqi Tang
Yinsong Xu
Yang Liu
Qingchao Chen
138
0
0
25 Sep 2025
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola
G. Morais
Magdalena Fuentes
Gloria Haro
SSL
164
0
0
29 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
231
4
0
11 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu
Yu Zhang
Wenxiang Guo
Changhao Pan
Zhou Zhao
198
1
0
08 Aug 2025
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization
Jinxing Zhou
Ziheng Zhou
Yanghao Zhou
Yuxin Mao
Zhangling Duan
Dan Guo
136
2
0
06 Aug 2025
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
Yuyang Liu
Qiuhe Hong
Linlan Huang
Alexandra Gomez-Villa
Dipam Goswami
Xialei Liu
Joost van de Weijer
Yonghong Tian
CLL KELM VLM
212
0
0
06 Aug 2025
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
Jinxing Zhou
Yanghao Zhou
Mingfei Han
Tong Wang
Xiaojun Chang
Hisham Cholakkal
Rao Muhammad Anwer
VOS LRM
182
1
0
06 Aug 2025
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Yogesh Kulkarni
Pooyan Fazli
OffRL LRM
280
4
0
05 Aug 2025
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Fatima Saad
Ziad Al-Halah
VGen
85
1
0
04 Aug 2025
Hybrid Hypergraph Networks for Multimodal Sequence Data Classification
Feng Xu
Hui Wang
Yuting Huang
Danwei Zhang
Zizhu Fan
110
0
0
30 Jul 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying
Henghui Ding
Guangquan Jie
Yu Jiang
VOS
321
5
0
30 Jul 2025
Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning
Jiong Yin
Liang-Sheng Li
Jiehua Zhang
Yuhan Gao
Chenggang Yan
Xichun Sheng
CLL
215
1
0
29 Jul 2025
DAMS: Dual-Branch Adaptive Multiscale Spatiotemporal Framework for Video Anomaly Detection
Dezhi An
Wenqiang Liu
Kefan Wang
Zening Chen
Jun Lu
Shengcai Zhang
107
0
0
28 Jul 2025
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei
Chunbo Luo
Yang Luo
203
4
0
14 Jul 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
183
0
0
14 Jul 2025
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang
Bingke Zhu
Yingying Chen
Yiyuan Zhang
Ming Tang
Jinqiao Wang
VLM
321
1
0
02 Jul 2025
Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment
Yue Zhang
Jilei Sun
Yunhui Guo
Vibhav Gogate
LRM
204
1
0
27 Jun 2025
Action Dubber: Timing Audible Actions via Inflectional Flow
Wenlong Wan
Weiying Zheng
Tianyi Xiang
Guiqing Li
Shengfeng He
174
0
0
16 Jun 2025
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
Runhao Zeng
Qi Deng
Ronghao Zhang
Shuaicheng Niu
Jian Chen
Xiping Hu
Victor C. M. Leung
TTA
134
0
0
14 Jun 2025
MokA: Multimodal Low-Rank Adaptation for MLLMs
Yake Wei
Yu Miao
Dongzhan Zhou
Di Hu
257
0
0
05 Jun 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
Lidong Lu
Guo Chen
Ruoyao Xiao
Yicheng Liu
Tong Lu
VLM LRM
339
7
0
05 Jun 2025
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
X. Yu
Yan Fang
Xiaojie Jin
Yao Zhao
Yunchao Wei
284
1
0
29 May 2025
ZeroSep: Separate Anything in Audio with Zero Training
Chao Huang
Yuesheng Ma
J. Huang
Susan Liang
Yunlong Tang
Jing Bi
Wenqiang Liu
Nima Mesgarani
Chenliang Xu
DiffM VLM
249
3
0
29 May 2025
Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning
Jiangrong Shen
Yulin Xie
Qi Xu
Gang Pan
Huajin Tang
Badong Chen
216
6
0
20 May 2025
Learning to Highlight Audio by Watching Movies
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
257
4
0
17 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Computer Vision and Pattern Recognition (CVPR), 2025
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
222
1
0
14 May 2025
Audio-visual Event Localization on Portrait Mode Short Videos
Wuyang Liu
Yi Chai
Yongpeng Yan
Yanzhen Ren
303
1
0
09 Apr 2025
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
Peng Wu
Wanshun Su
Guansong Pang
Yujia Sun
Qingsen Yan
Peng Wang
Yujiao Shi
VLM
311
5
0
06 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2025
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLM AuLLM
323
8
0
02 Apr 2025
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
275
1
0
01 Apr 2025
Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
Sashuai Zhou
Hai Huang
Yan Xia
MoMe MoE
285
3
0
26 Mar 2025