1803.08842
Cited By
Audio-Visual Event Localization in Unconstrained Videos
23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
Papers citing "Audio-Visual Event Localization in Unconstrained Videos"
50 / 298 papers shown
Title
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
273
324
0
29 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Computer Vision and Pattern Recognition (CVPR), 2022
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
166
48
0
27 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Computer Vision and Pattern Recognition (CVPR), 2022
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
251
200
0
26 Mar 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
203
133
0
24 Mar 2022
Towards Inadequately Pre-trained Models in Transfer Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Andong Deng
Xingjian Li
Di Hu
Tianyang Wang
Haoyi Xiong
Chengzhong Xu
123
7
0
09 Mar 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
134
54
0
07 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
147
30
0
13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak
Junsik Kim
Tae-Hyun Oh
H. Ryu
Dingzeyu Li
In So Kweon
109
1
0
12 Feb 2022
OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos
Merey Ramazanova
Victor Escorcia
Fabian Caba Heilbron
Chen Zhao
Guohao Li
179
4
0
10 Feb 2022
Learning Sound Localization Better From Semantically Similar Samples
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Arda Senocak
H. Ryu
Junsik Kim
In So Kweon
SSL
117
37
0
07 Feb 2022
Multimodal data matters: language model pre-training over structured and unstructured electronic health records
IEEE journal of biomedical and health informatics (IEEE JBHI), 2022
Sicen Liu
Xiaolong Wang
Yongshuai Hou
Ge Li
Hui Wang
Huiqin Xu
Yang Xiang
Buzhou Tang
303
41
0
25 Jan 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
330
49
0
20 Jan 2022
Weakly Supervised Visual-Auditory Fixation Prediction with Multigranularity Perception
Guotao Wang
Chenglizhao Chen
Deng-Ping Fan
Aimin Hao
Hong Qin
284
2
0
27 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
148
47
0
22 Dec 2021
Decompose the Sounds and Pixels, Recompose the Events
AAAI Conference on Artificial Intelligence (AAAI), 2021
Varshanth R. Rao
Md Ibrahim Khalil
Haoda Li
Peng Dai
Juwei Lu
121
5
0
21 Dec 2021
Soundify: Matching Sound Effects to Video
ACM Symposium on User Interface Software and Technology (UIST), 2021
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
266
19
0
17 Dec 2021
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Jiaqi Tang
Zhaoyang Liu
Chao Qian
Wayne Wu
Limin Wang
187
23
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
179
49
0
08 Dec 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
189
80
0
24 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
British Machine Vision Conference (BMVC), 2021
Rishabh Garg
Ruohan Gao
Kristen Grauman
148
31
0
21 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Conference on Robot Learning (CoRL), 2021
Ziyang Chen
Xixi Hu
Andrew Owens
149
30
0
10 Nov 2021
Space-Time Memory Network for Sounding Object Localization in Videos
British Machine Vision Conference (BMVC), 2021
Sizhe Li
Yapeng Tian
Chenliang Xu
115
12
0
10 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
110
5
0
05 Nov 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
229
48
0
19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
139
0
0
13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
836
1,431
0
13 Oct 2021
V-SlowFast Network for Efficient Visual Sound Separation
Xiangjie Sui
Esa Rahtu
214
12
0
18 Sep 2021
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
Hailong Ning
Bin Zhao
Zhanxuan Hu
Lang He
Ercheng Pei
216
12
0
17 Sep 2021
Audio-Visual Transformer Based Crowd Counting
Usman Sajid
Xiangyu Chen
Hasan Sajid
Taejoon Kim
Guanghui Wang
ViT
218
24
0
04 Sep 2021
Binaural Audio Generation via Multi-task Learning
Sijia Li
Shiguang Liu
Tianyi Zhou
95
16
0
02 Sep 2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
Neural Information Processing Systems (NeurIPS), 2021
Nikita Dvornik
Isma Hadji
Konstantinos G. Derpanis
Animesh Garg
Allan D. Jepson
143
62
0
26 Aug 2021
Multi-Modulation Network for Audio-Visual Event Localization
Hao Wang
Zhengjun Zha
Liang Li
Xuejin Chen
Jiebo Luo
119
2
0
26 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
IEEE International Conference on Computer Vision (ICCV), 2021
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
190
38
0
06 Aug 2021
Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
VISIGRAPP, 2021
Anurag Bagchi
Jazib Mahmood
Dolton Fernandes
Ravi Kiran Sarvadevabhatla
335
32
0
27 Jun 2021
Saying the Unseen: Video Descriptions via Dialog Agents
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
166
8
0
26 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
250
11
0
12 Jun 2021
Dual Normalization Multitasking for Audio-Visual Sounding Object Localization
Tokuhiro Nishikawa
Daiki Shimada
Jerry Jun Yokono
78
0
0
01 Jun 2021
Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing
Jianning Wu
Zhuqing Jiang
S. Wen
Aidong Men
Haiying Wang
166
1
0
30 May 2021
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Xinyuan Qian
Maulik C. Madhavi
Zexu Pan
Jiadong Wang
Haizhou Li
116
49
0
13 May 2021
Where and When: Space-Time Attention for Audio-Visual Explanations
Yanbei Chen
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
119
4
0
04 May 2021
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Yan-Bo Lin
Y. Wang
192
23
0
03 May 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
256
95
0
22 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Yue Yu
Yu-Gang Jiang
228
5
0
20 Apr 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Xiangjie Sui
Esa Rahtu
140
30
0
17 Apr 2021
Self-supervised object detection from audio-visual correspondence
Computer Vision and Pattern Recognition (CVPR), 2021
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
303
54
0
13 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
87
67
0
13 Apr 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization
IEEE International Conference on Multimedia and Expo (ICME), 2021
Jiashuo Yu
Ying Cheng
Rui Feng
175
20
0
07 Apr 2021
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
144
7
0
07 Apr 2021
Localizing Visual Sounds the Hard Way
Computer Vision and Pattern Recognition (CVPR), 2021
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
201
226
0
06 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2021
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
171
91
0
05 Apr 2021