Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.10558
Cited By
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
21 July 2020
Yapeng Tian
Dingzeyu Li
Chenliang Xu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing"
50 / 129 papers shown
Title
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
20
2
0
01 Nov 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
40
34
0
12 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
16
1
0
12 Oct 2023
CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing
Yaru Chen
Ruohao Guo
Xubo Liu
Peipei Wu
Guangyao Li
Zhenbo Li
Wenwu Wang
34
7
0
11 Oct 2023
RegBN: Batch Normalization of Multimodal Data with Regularization
Morteza Ghahremani
Christian Wachinger
28
6
0
01 Oct 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
39
51
0
18 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
37
23
0
11 Sep 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
33
28
0
21 Aug 2023
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
Michael Joannou
P. Rotshtein
U. Noppeney
21
0
0
18 Aug 2023
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao
Yuxin Mao
Bowen He
Xiaodong Han
Yuchao Dai
Yiran Zhong
VOS
VGen
36
30
0
16 Aug 2023
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Guangyao Li
Wenxuan Hou
Di Hu
31
26
0
10 Aug 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Zheng Zhang
Zheng Ning
Chenliang Xu
Yapeng Tian
Toby Jia-Jun Li
59
6
0
27 Jul 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
48
30
0
24 Jul 2023
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization
K. Ramakrishnan
15
0
0
12 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
31
6
0
05 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
34
46
0
03 Jul 2023
Learning Unseen Modality Interaction
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
27
3
0
22 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
34
8
0
01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
23
10
0
27 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
20
17
0
22 May 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
Yuanyuan Jiang
Jianqin Yin
19
7
0
21 May 2023
Annotation-free Audio-Visual Segmentation
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
39
28
0
18 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
43
90
0
14 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
37
1
0
12 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
87
49
0
03 May 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
45
42
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
29
30
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng
Teng Wang
Jinming Duan
Runmin Cong
Feng Zheng
30
28
0
22 Mar 2023
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Weixuan Sun
Jiayi Zhang
Jianyuan Wang
Zheyuan Liu
Yiran Zhong
Tianpeng Feng
Yandong Guo
Yanhao Zhang
Nick Barnes
SSL
27
44
0
20 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
36
13
0
04 Mar 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
18
41
0
01 Feb 2023
Audio-Visual Segmentation with Semantics
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
46
37
0
30 Jan 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
31
73
0
15 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
43
0
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
34
26
0
07 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
43
0
0
05 Dec 2022
Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss
Shailza Sharma
Abhinav Dhall
Vinay Kumar
V. Bawa
CVBM
27
0
0
20 Nov 2022
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Dan Guo
Meng Wang
29
53
0
18 Nov 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
18
60
0
07 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
83
64
0
30 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
27
10
0
21 Jul 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
39
30
0
20 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
Jiashuo Yu
Jin-Yuan Liu
Ying Cheng
Rui Feng
Yuejie Zhang
21
34
0
12 Jul 2022
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jun Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
33
110
0
11 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
52
19
0
07 Jul 2022
Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation
Jinxian Liu
Chen Ju
Weidi Xie
Ya Zhang
23
38
0
26 Jun 2022
Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning
Shuaicheng Li
Feng Zhang
Kunlin Yang
Lin-Na Liu
Shinan Liu
Jun Hou
Shuai Yi
45
8
0
21 Jun 2022
Previous
1
2
3
Next