
© 2025 ResearchTrend.AI, All rights reserved.

Learning to Separate Object Sounds by Watching Unlabeled Video

5 April 2018
Ruohan Gao
Rogerio Feris
Kristen Grauman
SSL
arXiv: 1804.01665

Papers citing "Learning to Separate Object Sounds by Watching Unlabeled Video"

28 / 78 papers shown
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
24
75
0
11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
30
38
0
08 Apr 2020
The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation
S. Kasaei
J. Melsen
Floris van Beers
Christiaan Steenkist
K. Vončina
29
12
0
18 Mar 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
24
137
0
25 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
ClusterFit: Improving Generalization of Visual Representations
Xueting Yan
Ishan Misra
Abhinav Gupta
Deepti Ghadiyaram
D. Mahajan
SSL
VLM
27
132
0
06 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
35
88
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
35
41
0
19 Oct 2019
Learning to Have an Ear for Face Super-Resolution
Givi Meishvili
Simon Jenni
Paolo Favaro
SupR
CVBM
33
23
0
27 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
30
78
0
22 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
35
91
0
30 Aug 2019
Self-supervised audio representation learning for mobile devices
Marco Tagliasacchi
Beat Gfeller
Félix de Chaumont Quitry
Dominik Roblek
SSL
AI4TS
6
46
0
24 May 2019
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Priya Goyal
D. Mahajan
Abhinav Gupta
Ishan Misra
SSL
26
396
0
03 May 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
33
206
0
16 Apr 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
17
251
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
Alex Schwing
Tamir Hazan
27
69
0
11 Apr 2019
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
27
130
0
11 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
The Visual Centrifuge: Model-Free Layered Video Representations
Jean-Baptiste Alayrac
João Carreira
Andrew Zisserman
23
48
0
04 Dec 2018
Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference
Mahesh Subedar
R. Krishnan
P. López-Meyer
Omesh Tickoo
Jonathan Huang
BDL
EDL
UQCV
29
0
0
27 Nov 2018
Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision
Sanjeel Parekh
A. Ozerov
S. Essid
Ngoc Q. K. Duong
P. Pérez
G. Richard
28
16
0
09 Nov 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
51
745
0
10 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
53
426
0
23 Mar 2018