1803.08842
Audio-Visual Event Localization in Unconstrained Videos
23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
Papers citing
"Audio-Visual Event Localization in Unconstrained Videos"
50 / 296 papers shown
Title
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Neural Information Processing Systems (NeurIPS), 2023
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
200
22
0
27 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
100
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Neural Information Processing Systems (NeurIPS), 2023
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
153
40
0
22 May 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuanyuan Jiang
Jianqin Yin
143
8
0
21 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
316
127
0
14 May 2023
TransAVS: End-To-End Audio-Visual Segmentation With Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
137
7
0
12 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
European Conference on Computer Vision (ECCV), 2023
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
243
16
0
06 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
177
57
0
03 May 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Yapeng Tian
117
63
0
29 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
211
27
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
151
44
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Computer Vision and Pattern Recognition (CVPR), 2023
Tiantian Geng
Teng Wang
Yanfu Zhang
Runmin Cong
Feng Zheng
161
58
0
22 Mar 2023
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Weixuan Sun
Jiayi Zhang
Jianyuan Wang
Zheyuan Liu
Yiran Zhong
Tianpeng Feng
Yandong Guo
Yanhao Zhang
Nick Barnes
SSL
222
64
0
20 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
179
21
0
04 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Interspeech (Interspeech), 2023
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
132
12
0
28 Feb 2023
Context Understanding in Computer Vision: A Survey
Computer Vision and Image Understanding (CVIU), 2023
Xuan Wang
Zhigang Zhu
204
64
0
10 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
181
1
0
07 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Neural Information Processing Systems (NeurIPS), 2023
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
321
58
0
04 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
265
55
0
01 Feb 2023
Audio-Visual Segmentation with Semantics
International Journal of Computer Vision (IJCV), 2023
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
144
71
0
30 Jan 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
192
106
0
15 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
258
55
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
247
39
0
07 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
International Journal of Computer Vision (IJCV), 2022
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
199
2
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2022
Xixi Hu
Ziyang Chen
Andrew Owens
165
65
0
28 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
127
3
0
21 Nov 2022
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jinxing Zhou
Dan Guo
Meng Wang
198
84
0
18 Nov 2022
The Lean Data Scientist: Recent Advances towards Overcoming the Data Bottleneck
Communications of the ACM (CACM), 2022
Chen Shani
Jonathan Zarecki
Dafna Shahaf
105
7
0
15 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yunfeng Fan
Wenchao Xu
Yining Qi
Junxiao Wang
Song Guo
1.4K
140
0
14 Nov 2022
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
IEEE International Symposium on Multimedia (ISM), 2022
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
166
5
0
07 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Neural Information Processing Systems (NeurIPS), 2022
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
176
15
0
29 Oct 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
143
4
0
26 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tanvir Mahmud
Diana Marculescu
CLIP
151
39
0
11 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
290
35
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
234
153
0
07 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
248
66
0
20 Aug 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
European Conference on Computer Vision (ECCV), 2022
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
177
14
0
21 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
183
32
0
20 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
174
5
0
12 Jul 2022
Audio-Visual Segmentation
European Conference on Computer Vision (ECCV), 2022
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
247
161
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
188
34
0
20 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
International Conference on Learning Representations (ICLR), 2022
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
342
56
0
15 Jun 2022
Past and Future Motion Guided Network for Audio Visual Event Localization
Ting-Yen Chen
Jianqin Yin
Jin Tang
99
3
0
08 May 2022
How to Listen? Rethinking Visual Sound Localization
Interspeech (Interspeech), 2022
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
90
5
0
11 Apr 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Computer Vision and Pattern Recognition (CVPR), 2022
Ruohan Gao
Zilin Si
Yen-Yu Chang
Samuel Clarke
Jeannette Bohg
Li Fei-Fei
Wenzhen Yuan
Jiajun Wu
159
103
0
05 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
European Conference on Computer Vision (ECCV), 2022
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
196
55
0
01 Apr 2022
Investigating Modality Bias in Audio Visual Video Parsing
Piyush Singh Pasi
Shubham Nemani
Preethi Jyothi
Ganesh Ramakrishnan
214
4
0
31 Mar 2022
The Sound of Bounding-Boxes
International Conference on Pattern Recognition (ICPR), 2022
Takashi Oya
Shohei Iwase
Shigeo Morishima
116
2
0
30 Mar 2022
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
233
322
0
29 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Computer Vision and Pattern Recognition (CVPR), 2022
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
158
48
0
27 Mar 2022