Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 296 papers shown
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Neural Information Processing Systems (NeurIPS), 2023
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
200
22
0
27 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
100
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Neural Information Processing Systems (NeurIPS), 2023
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
153
40
0
22 May 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuanyuan Jiang
Jianqin Yin
143
8
0
21 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
316
127
0
14 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOSViT
137
7
0
12 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
European Conference on Computer Vision (ECCV), 2023
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
243
16
0
06 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
177
57
0
03 May 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Yapeng Tian
117
63
0
29 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGenAI4CE
211
27
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
151
44
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Computer Vision and Pattern Recognition (CVPR), 2023
Tiantian Geng
Teng Wang
Yanfu Zhang
Runmin Cong
Feng Zheng
161
58
0
22 Mar 2023
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Weixuan Sun
Jiayi Zhang
Jianyuan Wang
Zheyuan Liu
Yiran Zhong
Tianpeng Feng
Yandong Guo
Yanhao Zhang
Nick Barnes
SSL
222
64
0
20 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
179
21
0
04 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Interspeech (Interspeech), 2023
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
132
12
0
28 Feb 2023
Context Understanding in Computer Vision: A Survey
Computer Vision and Image Understanding (CVIU), 2023
Xuan Wang
Zhigang Zhu
204
64
0
10 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
181
1
0
07 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Neural Information Processing Systems (NeurIPS), 2023
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
321
58
0
04 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
265
55
0
01 Feb 2023
Audio-Visual Segmentation with Semantics
International Journal of Computer Vision (IJCV), 2023
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
144
71
0
30 Jan 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
192
106
0
15 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
258
55
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
247
39
0
07 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
International Journal of Computer Vision (IJCV), 2022
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
199
2
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2022
Xixi Hu
Ziyang Chen
Andrew Owens
165
65
0
28 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
127
3
0
21 Nov 2022
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jinxing Zhou
Dan Guo
Meng Wang
198
84
0
18 Nov 2022
The Lean Data Scientist: Recent Advances towards Overcoming the Data Bottleneck
Communications of the ACM (CACM), 2022
Chen Shani
Jonathan Zarecki
Dafna Shahaf
105
7
0
15 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yunfeng Fan
Wenchao Xu
Yining Qi
Junxiao Wang
Song Guo
1.4K
140
0
14 Nov 2022
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
IEEE International Symposium on Multimedia (ISM), 2022
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
166
5
0
07 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Neural Information Processing Systems (NeurIPS), 2022
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
176
15
0
29 Oct 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
143
4
0
26 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tanvir Mahmud
Diana Marculescu
CLIP
151
39
0
11 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
290
35
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
234
153
0
07 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
248
66
0
20 Aug 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
European Conference on Computer Vision (ECCV), 2022
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
177
14
0
21 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
183
32
0
20 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
174
5
0
12 Jul 2022
Audio-Visual Segmentation
European Conference on Computer Vision (ECCV), 2022
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
247
161
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
188
34
0
20 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
International Conference on Learning Representations (ICLR), 2022
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
342
56
0
15 Jun 2022
Past and Future Motion Guided Network for Audio Visual Event Localization
Ting-Yen Chen
Jianqin Yin
Jin Tang
99
3
0
08 May 2022
How to Listen? Rethinking Visual Sound Localization
Interspeech (Interspeech), 2022
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
90
5
0
11 Apr 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Computer Vision and Pattern Recognition (CVPR), 2022
Ruohan Gao
Zilin Si
Yen-Yu Chang
Samuel Clarke
Jeannette Bohg
Li Fei-Fei
Wenzhen Yuan
Jiajun Wu
159
103
0
05 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
European Conference on Computer Vision (ECCV), 2022
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
196
55
0
01 Apr 2022
Investigating Modality Bias in Audio Visual Video Parsing
Piyush Singh Pasi
Shubham Nemani
Preethi Jyothi
Ganesh Ramakrishnan
214
4
0
31 Mar 2022
The Sound of Bounding-Boxes
International Conference on Pattern Recognition (ICPR), 2022
Takashi Oya
Shohei Iwase
Shigeo Morishima
116
2
0
30 Mar 2022
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
233
322
0
29 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Computer Vision and Pattern Recognition (CVPR), 2022
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
158
48
0
27 Mar 2022