1803.08842
Cited By
Audio-Visual Event Localization in Unconstrained Videos
23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
Papers citing
"Audio-Visual Event Localization in Unconstrained Videos"
50 / 298 papers shown
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
209
4
0
11 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
142
8
0
10 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
167
26
0
08 Mar 2024
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
International Conference on Human Factors in Computing Systems (CHI), 2024
Zheng Ning
Brianna L Wimer
Kaiwen Jiang
Keyi Chen
Jerrick Ban
Yapeng Tian
Yuhang Zhao
Tao Li
188
31
0
11 Feb 2024
Multimodal Action Quality Assessment
Ling-an Zeng
Wei-Shi Zheng
445
30
0
31 Jan 2024
Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics
International Conference on Neural Information Processing (ICONIP), 2024
Pengcheng Zhao
Yanxiang Chen
Yang Zhao
Wei Jia
Zhao Zhang
Ronggang Wang
Richang Hong
DiffM
128
1
0
24 Jan 2024
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
184
12
0
18 Jan 2024
Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yukun Zuo
Hantao Yao
Liansheng Zhuang
Changsheng Xu
252
5
0
11 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
International Journal of Computer Vision (IJCV), 2024
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
319
9
0
08 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
239
7
0
08 Jan 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
191
5
0
21 Dec 2023
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering
Zhangbin Li
Dan Guo
Jinxing Zhou
Jing Zhang
Meng Wang
175
24
0
20 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
144
9
0
14 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Pedro Morgado
230
29
0
02 Dec 2023
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Hanyuan Wang
Majid Mirmehdi
Dima Damen
Toby Perrett
171
3
0
28 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yating Xu
Conghui Hu
Gim Hee Lee
147
7
0
14 Nov 2023
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
262
38
0
09 Nov 2023
Can CLIP Help Sound Source Localization?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sooyoung Park
Arda Senocak
Joon Son Chung
153
15
0
07 Nov 2023
Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems
Network and Distributed System Security Symposium (NDSS), 2023
Jung-Woo Chang
Ke Sun
Nasimeh Heydaribeni
Seira Hidano
Xinyu Zhang
F. Koushanfar
AAML
207
2
0
01 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuxin Ye
Wenming Yang
Yapeng Tian
181
12
0
31 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
256
13
0
25 Oct 2023
Extending Multi-modal Contrastive Representations
Neural Information Processing Systems (NeurIPS), 2023
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
156
18
0
13 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
135
53
0
12 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
International Conference on Machine Learning (ICML), 2023
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
239
1
0
12 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kyuyeon Kim
Junsik Jung
Woo Jae Kim
Sung-eui Yoon
SSL
145
1
0
11 Oct 2023
CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yaru Chen
Ruohao Guo
Xubo Liu
Peipei Wu
Guangyao Li
Zhenbo Li
Wenwu Wang
209
11
0
11 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
171
6
0
10 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Xiulong Liu
Zhikang Dong
Peng Zhang
172
32
0
10 Oct 2023
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
Edward Fish
Jon Weinbren
Andrew Gilbert
124
1
0
05 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
International Conference on Learning Representations (ICLR), 2023
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
556
328
0
03 Oct 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
IEEE Communications Surveys and Tutorials (COMST), 2023
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
239
35
0
27 Sep 2023
Sound Source Localization is All about Cross-Modal Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
175
30
0
19 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
ACM Multimedia (ACM MM), 2023
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
198
79
0
18 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
180
31
0
11 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
170
3
0
07 Sep 2023
Audio-Visual Class-Incremental Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
186
33
0
21 Aug 2023
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
PLoS ONE (PLoS ONE), 2023
Michael Joannou
P. Rotshtein
U. Noppeney
141
1
0
18 Aug 2023
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
108
6
0
17 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
IEEE International Conference on Computer Vision (ICCV), 2023
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
211
63
0
15 Aug 2023
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
ACM Multimedia (ACM MM), 2023
Guangyao Li
Wenxuan Hou
Di Hu
204
41
0
10 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
ACM Multimedia (ACM MM), 2023
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
103
4
0
09 Aug 2023
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
European Workshop on Visual Information Processing (EUVIP), 2023
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
281
7
0
01 Aug 2023
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Asian Conference on Computer Vision (ACCV), 2023
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
158
4
0
31 Jul 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
ACM Symposium on User Interface Software and Technology (UIST), 2023
Zheng Zhang
Zheng Ning
Chenliang Xu
Yapeng Tian
Toby Jia-Jun Li
215
11
0
27 Jul 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
IEEE Transactions on Image Processing (IEEE TIP), 2023
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
362
45
0
24 Jul 2023
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization
K. Ramakrishnan
110
0
0
12 Jul 2023
FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction
Gang Wang
Peng Zhang
Jun Xiong
Fei Yang
Wei Huang
Yufei Zha
CVBM
177
1
0
08 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
227
17
0
05 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
316
77
0
03 Jul 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
289
14
0
01 Jun 2023