Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
arXiv: 1803.08842

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 296 papers shown
Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Donghuo Zeng
Kazushi Ikeda
SSL
177
0
0
17 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
398
6
0
10 Jan 2025
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yuhang He
Sangyun Shin
Anoop Cherian
Niki Trigoni
Andrew Markham
392
0
0
31 Dec 2024
Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing
ACM Symposium on Applied Computing (SAC), 2024
Inpyo Hong
Youngwan Jo
Hyojeong Lee
Sunghyun Ahn
Sanghyun Park
MQ
313
6
0
26 Dec 2024
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pengcheng Zhao
Jinxing Zhou
Yang Zhao
Dan Guo
Yanxiang Chen
277
13
0
15 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Zhiyong Yang
Xiangyu Yue
MLLM, AuLLM, VLM
235
25
0
03 Dec 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Vinan
418
2
0
24 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Computer Vision and Pattern Recognition (CVPR), 2024
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Meng Wang
VLM
431
19
0
18 Nov 2024
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Wei Wei
236
1
0
07 Nov 2024
Continual Audio-Visual Sound Separation
Neural Information Processing Systems (NeurIPS), 2024
Weiguo Pian
Yiyang Nan
Shijian Deng
Shentong Mo
Yunhui Guo
Yapeng Tian
VLM, CLL
312
3
0
05 Nov 2024
Scaling Concept With Text-Guided Diffusion Models
Chao Huang
Susan Liang
Yunlong Tang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
150
10
0
31 Oct 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Yibing Song
201
2
0
30 Oct 2024
Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies
Xiwen Li
Rehman Mohammed
Tristalee Mangin
Surojit Saha
Ross T. Whitaker
Kerry E Kelly
Tolga Tasdizen
197
6
0
28 Oct 2024
On-the-fly Modulation for Balanced Multimodal Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yake Wei
D. Hu
Henghui Du
Ji-Rong Wen
209
28
0
15 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Neural Information Processing Systems (NeurIPS), 2024
Rory Young
Nicolas Pugeault
AAML
302
20
0
14 Oct 2024
STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking
IEEE Transactions on Multimedia (IEEE TMM), 2024
Yidi Li
Hong Liu
Bing Yang
297
7
0
08 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Luan Tuyen Chau
Yao-Yi Chiang
Alfred Hero
333
3
0
02 Oct 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
446
8
0
12 Sep 2024
VyAnG-Net: A Novel Multi-Modal Sarcasm Recognition Model by Uncovering Visual, Acoustic and Glossary Features
Intelligent Data Analysis (IDA), 2024
Ananya Pandey
Dinesh Kumar Vishwakarma
146
2
0
05 Aug 2024
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
ACM Multimedia (MM), 2024
Xiang He
Xiangxi Liu
Yang Li
Dongcheng Zhao
Guobin Shen
Qingqun Kong
Xin Yang
Yi Zeng
208
11
0
04 Aug 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
158
14
0
30 Jul 2024
Detached and Interactive Multimodal Learning
ACM Multimedia (MM), 2024
Yunfeng Fan
Wenchao Xu
Yining Qi
Junhong Liu
Song Guo
297
9
0
28 Jul 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
281
9
0
18 Jul 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
207
7
0
18 Jul 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang
Peiwen Sun
Dongzhan Zhou
Guangyao Li
Honggang Zhang
Di Hu
VOS
226
21
0
15 Jul 2024
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou
Dan Guo
Yuxin Mao
Yiran Zhong
Xiaojun Chang
Meng Wang
166
30
0
11 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
190
5
0
04 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
340
5
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM, MLLM
331
20
0
01 Jul 2024
Localizing Events in Videos with Multimodal Queries
Computer Vision and Pattern Recognition (CVPR), 2024
Gengyuan Zhang
Mang Ling Ada Fok
Yan Xia
Yansong Tang
Zorah Lähner
Juil Sock
Volker Tresp
Jindong Gu
288
4
0
14 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
International Conference on Learning Representations (ICLR), 2024
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
383
8
0
10 Jun 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
142
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
309
14
0
06 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
325
1
0
04 Jun 2024
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
215
33
0
03 Jun 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
277
26
0
22 May 2024
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
European Conference on Computer Vision (ECCV), 2024
Faegheh Sardari
A. Mustafa
Philip J. B. Jackson
Adrian Hilton
327
10
0
17 May 2024
ReconBoost: Boosting Can Achieve Modality Reconcilement
International Conference on Machine Learning (ICML), 2024
Cong Hua
Qianqian Xu
Shilong Bao
Zhiyong Yang
Qingming Huang
172
37
0
15 May 2024
Improving Multimodal Learning with Multi-Loss Gradient Modulation
British Machine Vision Conference (BMVC), 2024
Konstantinos Kontras
Christos Chatzichristos
Matthew Blaschko
M. D. Vos
182
9
0
13 May 2024
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
263
2
0
13 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
232
2
0
12 May 2024
Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis
Akash Awasthi
Safwan Ahmad
Bryant Le
Hien Nguyen
93
2
0
29 Apr 2024
Anchor-aware Deep Metric Learning for Audio-visual Retrieval
Donghuo Zeng
Yanan Wang
Kazushi Ikeda
Yi Yu
154
3
0
21 Apr 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM, CLIP
142
3
0
09 Apr 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
258
24
0
08 Apr 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Tiantian Geng
Teng Wang
Jinming Duan
Yanfu Zhang
Weili Guan
Feng Zheng
Ling Shao
216
2
0
04 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
203
10
0
28 Mar 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
140
15
0
26 Mar 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
205
4
0
11 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
138
7
0
10 Mar 2024