ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.08842
  4. Cited By
Audio-Visual Event Localization in Unconstrained Videos

Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
ArXivPDFHTML

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 252 papers shown
Title
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event
  Localization
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization
K. Ramakrishnan
15
0
0
12 Jul 2023
FTFDNet: Learning to Detect Talking Face Video Manipulation with
  Tri-Modality Interaction
FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction
Gang Wang
Peng Zhang
Jun Xiong
Fei Yang
Wei Huang
Yufei Zha
CVBM
9
1
0
08 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised
  Audio-Visual Video Parsing
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
26
6
0
05 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
29
46
0
03 Jul 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household
  Agents that See and Hear
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
34
11
0
01 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language
  Perspective
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
26
8
0
01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation,
  and Recognition
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
30
21
0
30 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event
  Parser
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
18
10
0
27 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
18
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
17
22
0
22 May 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in
  Dynamics Audio-Visual Scenarios
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
Yuanyuan Jiang
Jianqin Yin
19
7
0
21 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
32
90
0
14 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
29
1
0
12 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze
  Anticipation
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
19
8
0
06 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and
  Segmentation
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
82
49
0
03 May 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
37
42
0
29 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
35
20
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Egocentric Audio-Visual Object Localization
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
29
28
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale
  Benchmark and Baseline
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng
Teng Wang
Jinming Duan
Runmin Cong
Feng Zheng
25
28
0
22 Mar 2023
Learning Audio-Visual Source Localization via False Negative Aware
  Contrastive Learning
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Weixuan Sun
Jiayi Zhang
Jianyuan Wang
Zheyuan Liu
Yiran Zhong
Tianpeng Feng
Yandong Guo
Yanhao Zhang
Nick Barnes
SSL
19
43
0
20 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
31
13
0
04 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram
  Transformers
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
23
8
0
28 Feb 2023
Context Understanding in Computer Vision: A Survey
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
11
45
0
10 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
19
1
0
07 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene
  Synthesis
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
32
27
0
04 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
13
41
0
01 Feb 2023
Audio-Visual Segmentation with Semantics
Audio-Visual Segmentation with Semantics
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
38
37
0
30 Jan 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Mohit Bansal
Gedas Bertasius
26
73
0
15 Dec 2022
Audiovisual Masked Autoencoders
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
32
43
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
16
26
0
07 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
35
0
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
23
51
0
28 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural
  Representation
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
22
3
0
21 Nov 2022
Contrastive Positive Sample Propagation along the Audio-Visual Event
  Line
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Dan Guo
Meng Wang
24
54
0
18 Nov 2022
The Lean Data Scientist: Recent Advances towards Overcoming the Data
  Bottleneck
The Lean Data Scientist: Recent Advances towards Overcoming the Data Bottleneck
Chen Shani
Jonathan Zarecki
Dafna Shahaf
21
6
0
15 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Haozhao Wang
Junxiao Wang
Song Guo
23
60
0
14 Nov 2022
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal
  Retrieval
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
19
4
0
07 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source
  Separation
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Moitreya Chatterjee
N. Ahuja
A. Cherian
33
11
0
29 Oct 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using
  Permutation-Free Loss Function
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
14
4
0
26 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio
  Visual Event Localization
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
Tanvir Mahmud
Diana Marculescu
CLIP
11
31
0
11 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles,
  Challenges, and Open Questions
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
16
60
0
07 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New
  Perspective
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
44
55
0
20 Aug 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge J. Belongie
19
10
0
21 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
30
25
0
20 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
35
5
0
12 Jul 2022
Audio-Visual Segmentation
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
J. Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
28
110
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory
  Information: the Audio-visual Consistency Perceptual is the Key!
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
33
25
0
20 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image
  Generation
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
20
47
0
15 Jun 2022
Past and Future Motion Guided Network for Audio Visual Event
  Localization
Past and Future Motion Guided Network for Audio Visual Event Localization
Ting-Yen Chen
Jianqin Yin
Jin Tang
16
2
0
08 May 2022
Previous
123456
Next