ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.08842
  4. Cited By
Audio-Visual Event Localization in Unconstrained Videos

Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
ArXivPDFHTML

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 252 papers shown
Title
TIM: A Time Interval Machine for Audio-Visual Action Recognition
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
33
9
0
08 Apr 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
21
0
0
04 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior
  Source Knowledge
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
33
4
0
26 Mar 2024
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Shijian Deng
Erin E. Kosloski
Siddhi Patel
Zeke A. Barnett
Yiyang Nan
...
William T. Doan
Matthew Wang
Harsh Singh
P. Rollins
Yapeng Tian
31
4
0
22 Mar 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual
  Clues
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
33
1
0
11 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
31
3
0
10 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
37
17
0
08 Mar 2024
Unlocking the Potential of Multimodal Unified Discrete Representation
  through Training-Free Codebook Optimization and Hierarchical Alignment
Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Jieming Zhu
Zhenhua Dong
Zhou Zhao
27
6
0
08 Mar 2024
SPICA: Interactive Video Content Exploration through Augmented Audio
  Descriptions for Blind or Low-Vision Viewers
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Zheng Ning
Brianna L Wimer
Kaiwen Jiang
Keyi Chen
Jerrick Ban
Yapeng Tian
Yuhang Zhao
T. Li
34
15
0
11 Feb 2024
Multimodal Action Quality Assessment
Multimodal Action Quality Assessment
Ling-an Zeng
Wei-Shi Zheng
43
13
0
31 Jan 2024
Audio-Infused Automatic Image Colorization by Exploiting Audio Scene
  Semantics
Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics
Pengcheng Zhao
Yanxiang Chen
Yang Zhao
Wei Jia
Zhao Zhang
Ronggang Wang
Richang Hong
DiffM
22
1
0
24 Jan 2024
On the Audio Hallucinations in Large Audio-Video Language Models
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
25
5
0
18 Jan 2024
Hierarchical Augmentation and Distillation for Class Incremental
  Audio-Visual Video Recognition
Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
Yukun Zuo
Hantao Yao
Liansheng Zhuang
Changsheng Xu
15
2
0
11 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
38
6
0
08 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video
  Classification
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
25
5
0
08 Jan 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection
  and Localization
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
35
5
0
21 Dec 2023
Object-aware Adaptive-Positivity Learning for Audio-Visual Question
  Answering
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering
Zhangbin Li
Dan Guo
Jinxing Zhou
Jing Zhang
Meng Wang
24
11
0
20 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for
  Audio-Visual Semantic Segmentation
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
17
5
0
14 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense
  Interactions through Masked Modeling
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
19
13
0
02 Dec 2023
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Hanyuan Wang
Majid Mirmehdi
Dima Damen
Toby Perrett
28
2
0
28 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video
  Parsing
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Yating Xu
Conghui Hu
Gim Hee Lee
17
2
0
14 Nov 2023
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual
  Downstream Tasks
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
19
17
0
09 Nov 2023
Can CLIP Help Sound Source Localization?
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
22
6
0
07 Nov 2023
Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based
  Wireless Communication Systems
Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems
Jung-Woo Chang
Ke Sun
Nasimeh Heydaribeni
Seira Hidano
Xinyu Zhang
F. Koushanfar
AAML
17
1
0
01 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
Yuxin Ye
Wenming Yang
Yapeng Tian
26
10
0
31 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
Extending Multi-modal Contrastive Representations
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
35
34
0
12 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal
  Localized Alignment
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
14
1
0
12 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
Kyuyeon Kim
Junsik Jung
Woo Jae Kim
Sung-eui Yoon
SSL
23
1
0
11 Oct 2023
CM-PIE: Cross-modal perception for interactive-enhanced audio-visual
  video parsing
CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing
Yaru Chen
Ruohao Guo
Xubo Liu
Peipei Wu
Guangyao Li
Zhenbo Li
Wenwu Wang
32
7
0
11 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing
  Modalities?
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
19
4
0
10 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for
  Unbiased Question-Answering
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
Xiulong Liu
Zhikang Dong
Peng Zhang
22
21
0
10 Oct 2023
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action
  Localization
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
Edward Fish
Jon Weinbren
Andrew Gilbert
31
0
0
05 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by
  Language-based Semantic Alignment
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
27
202
0
03 Oct 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A
  survey
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
23
22
0
27 Sep 2023
Sound Source Localization is All about Cross-Modal Alignment
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
21
18
0
19 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for
  Audio-Visual Video Segmentation
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
37
51
0
18 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
35
23
0
11 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
27
2
0
07 Sep 2023
Audio-Visual Class-Incremental Learning
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
20
28
0
21 Aug 2023
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of
  Audiovisual Actions
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
Michael Joannou
P. Rotshtein
U. Noppeney
13
0
0
18 Aug 2023
Bridging High-Quality Audio and Video via Language for Sound Effects
  Retrieval from Visual Queries
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
14
5
0
17 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
28
20
0
15 Aug 2023
Progressive Spatio-temporal Perception for Audio-Visual Question
  Answering
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Guangyao Li
Wenxuan Hou
Di Hu
29
26
0
10 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for
  Self-Supervised Sound Source Localization
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
19
2
0
09 Aug 2023
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using
  Transformers
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
17
5
0
01 Aug 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Zheng Zhang
Zheng Ning
Chenliang Xu
Yapeng Tian
Toby Jia-Jun Li
59
6
0
27 Jul 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New
  Benchmarks and Model
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
35
29
0
24 Jul 2023
Previous
123456
Next