ResearchTrend.AI
Self-Supervised Audio-Visual Co-Segmentation

18 April 2019
Andrew Rouditchenko
Hang Zhao
Chuang Gan
Josh H. McDermott
Antonio Torralba
    VLM SSL

Papers citing "Self-Supervised Audio-Visual Co-Segmentation"

50 / 68 papers shown
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
127
0
0
04 Jun 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
131
29
0
02 Jan 2025
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
72
2
0
31 Aug 2024
Unveiling and Mitigating Bias in Audio Visual Segmentation
Peiwen Sun
Honggang Zhang
Di Hu
91
3
0
23 Jul 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
141
6
0
06 Jun 2024
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
83
19
0
03 Jun 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiankang Deng
Xiatian Zhu
VOS
71
6
0
21 Mar 2024
Weakly-Supervised Audio-Visual Segmentation
Shentong Mo
Bhiksha Raj
VOS
88
13
0
25 Nov 2023
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
Shaofei Huang
Han Li
Yuqing Wang
Hongji Zhu
Jiao Dai
Jizhong Han
Wenge Rong
Si Liu
VOS
53
19
0
18 Sep 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
59
5
0
13 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL VLM
87
26
0
11 Sep 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
113
52
0
03 Jul 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
Zengjie Song
Zhaoxiang Zhang
50
1
0
19 Jun 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
50
12
0
27 May 2023
Annotation-free Audio-Visual Segmentation
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS VLM
109
30
0
18 May 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen AI4CE
82
20
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
58
35
0
23 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
89
14
0
04 Mar 2023
Audio-Visual Segmentation with Semantics
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
93
43
0
30 Jan 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM CLIP
81
29
0
14 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
96
52
0
28 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
132
55
0
20 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
119
30
0
20 Jul 2022
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
93
116
0
11 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
137
20
0
07 Jul 2022
Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation
Xiaoyu Wang
Xiangyu Kong
Xiulian Peng
Yan Lu
62
6
0
04 Jul 2022
Noise-Tolerant Learning for Audio-Visual Action Recognition
Haocheng Han
Qinghua Zheng
Minnan Luo
Kaiyao Miao
Feng Tian
Yuanchun Chen
NoLa
98
9
0
16 May 2022
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
45
2
0
30 Mar 2022
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
165
81
0
17 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
97
109
0
02 Mar 2022
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
82
27
0
21 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
69
8
0
26 Oct 2021
Multi-Modulation Network for Audio-Visual Event Localization
Hao Wang
Zhengjun Zha
Liang Li
Xuejin Chen
Jiebo Luo
38
2
0
26 Aug 2021
Saying the Unseen: Video Descriptions via Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
69
6
0
26 Jun 2021
Improving Multi-Modal Learning with Uni-Modal Teachers
Chenzhuang Du
Tingle Li
Yichen Liu
Zixin Wen
Tianyu Hua
Yue Wang
Hang Zhao
57
47
0
21 Jun 2021
Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
VLM
81
8
0
17 Jun 2021
Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
Shaobo Min
Qi Dai
Hongtao Xie
Chuang Gan
Yongdong Zhang
Jingdong Wang
SSL
54
7
0
13 Jun 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
107
47
0
13 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
90
191
0
06 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
102
39
0
05 Apr 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
113
21
0
05 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
243
202
0
08 Jan 2021
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
83
71
0
02 Nov 2020
The Cone of Silence: Speech Separation by Localization
Teerapat Jenrungrot
V. Jayaram
S. M. Seitz
Ira Kemelmacher-Shlizerman
78
56
0
12 Oct 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
126
256
0
10 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
113
48
0
29 Jul 2020
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
Chuang Gan
Xiaoyu Chen
Phillip Isola
Antonio Torralba
J. Tenenbaum
58
7
0
27 Jul 2020
Augmentation adversarial training for self-supervised speaker recognition
Jaesung Huh
Hee-Soo Heo
Jingu Kang
Shinji Watanabe
Joon Son Chung
SSL
126
76
0
23 Jul 2020
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
75
139
0
21 Jul 2020