ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.07750
  4. Cited By
Co-Separating Sounds of Visual Objects

Co-Separating Sounds of Visual Objects

16 April 2019
Ruohan Gao
Kristen Grauman
ArXivPDFHTML

Papers citing "Co-Separating Sounds of Visual Objects"

41 / 41 papers shown
Title
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
31
2
0
01 Oct 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
40
2
0
07 Jul 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large
  Multi-Modal Models
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM
CLIP
26
2
0
09 Apr 2024
Robust Active Speaker Detection in Noisy Environments
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
32
0
0
27 Mar 2024
Sound Source Localization is All about Cross-Modal Alignment
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
19
18
0
19 Sep 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
79
6
0
05 May 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
63
30
0
26 Mar 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
35
0
0
09 Dec 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
79
64
0
30 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated
  Open-Domain On-Screen Sound Separation
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
31
29
0
20 Jul 2022
Audio-Visual Segmentation
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
J. Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
28
110
0
11 Jul 2022
Heterogeneous Separation Consistency Training for Adaptation of
  Unsupervised Speech Separation
Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation
Jiangyu Han
Yanhua Long
20
6
0
23 Apr 2022
Text-Driven Separation of Arbitrary Sounds
Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour
Beat Gfeller
Qingqing Huang
A. Jansen
Scott Wisdom
Marco Tagliasacchi
22
30
0
12 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and
  enrollment clues for increased performance and continuous learning
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix
Jorge Bennasar Vázquez
Tsubasa Ochiai
K. Kinoshita
Yasunori Ohishi
S. Araki
VLM
22
31
0
08 Apr 2022
The Sound of Bounding-Boxes
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
16
2
0
30 Mar 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
Juan F. Montesinos
V. S. Kadandale
G. Haro
ViT
23
19
0
08 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference
  Erasing
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
6
25
0
13 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder
Kristen Grauman
19
21
0
02 Feb 2022
Soundify: Matching Sound Effects to Video
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
25
16
0
17 Dec 2021
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Zhijian Yang
Xiaoran Fan
Volkan Isler
H. Park
3DH
16
6
0
01 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from
  Video
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
13
27
0
21 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning
  for Visual Sound Separation
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
21
8
0
26 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
224
1,018
0
13 Oct 2021
A cappella: Audio-visual Singing Voice Separation
A cappella: Audio-visual Singing Voice Separation
Juan F. Montesinos
V. S. Kadandale
G. Haro
38
16
0
20 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
53
0
13 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
19
34
0
01 Apr 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes
Beyond Image to Depth: Improving Depth Prediction using Echoes
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
26
37
0
15 Mar 2021
Music source separation conditioned on 3D point clouds
Music source separation conditioned on 3D point clouds
Francesc Lluís
V. Chatziioannou
A. Hofmann
3DPC
24
5
0
03 Feb 2021
Learning Representations from Audio-Visual Spatial Alignment
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
13
121
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of
  On-Screen Sounds
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
26
68
0
02 Nov 2020
Transcription Is All You Need: Learning to Separate Musical Mixtures
  with Score as Supervision
Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision
Yun-Ning Hung
G. Wichern
Jonathan Le Roux
17
12
0
22 Oct 2020
Audio-Visual Event Localization via Recursive Fusion by Joint
  Co-Attention
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
25
59
0
14 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
17
250
0
10 Aug 2020
Multiple Sound Sources Localization from Coarse to Fine
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
23
153
0
13 Jul 2020
AVLnet: Learning Audio-Visual Language Representations from
  Instructional Videos
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and
  Sound
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
16
75
0
11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter
  Network
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
14
23
0
04 Jun 2020
Conditioned Source Separation for Music Instrument Performances
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
22
38
0
08 Apr 2020
The State of Lifelong Learning in Service Robots: Current Bottlenecks in
  Object Perception and Manipulation
The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation
S. Kasaei
J. Melsen
Floris van Beers
Christiaan Steenkist
K. Vončina
11
12
0
18 Mar 2020
Listen to Look: Action Recognition by Previewing Audio
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
27
251
0
10 Dec 2019
1