Self-supervised Moving Vehicle Tracking with Stereo Sound

25 October 2019
Chuang Gan
Hang Zhao
Peihao Chen
David D. Cox
Antonio Torralba

Papers citing "Self-supervised Moving Vehicle Tracking with Stereo Sound"

36 / 86 papers shown
Title
Where and When: Space-Time Attention for Audio-Visual Explanations
Yanbei Chen
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
45
3
0
04 May 2021
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
Yan-Bo Lin
Y. Wang
103
21
0
03 May 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
Lingyu Zhu
Esa Rahtu
81
27
0
17 Apr 2021
Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
110
47
0
13 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
49
59
0
13 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
90
191
0
06 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
85
88
0
05 Apr 2021
TransCenter: Transformers with Dense Representations for Multiple-Object Tracking
Yihong Xu
Yutong Ban
Guillaume Delorme
Chuang Gan
Daniela Rus
Xavier Alameda-Pineda
VOT
94
96
0
28 Mar 2021
Discriminative Semantic Transitive Consistency for Cross-Modal Learning
Kranti K. Parida
Gaurav Sharma
65
1
0
25 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
95
73
0
01 Mar 2021
Audio-Visual Floorplan Reconstruction
Senthil Purushwalkam
S. V. A. Garí
V. Ithapu
Carl Schissler
Philip Robinson
Abhinav Gupta
Kristen Grauman
VGen
3DV
148
41
0
31 Dec 2020
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Peihao Chen
Deng Huang
Dongliang He
Xiang Long
Runhao Zeng
Shilei Wen
Mingkui Tan
Chuang Gan
SSL
73
134
0
27 Oct 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
80
136
0
12 Oct 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
126
256
0
10 Aug 2020
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNN
BDL
107
175
0
07 Aug 2020
Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling
Yoshiki Masuyama
Yoshiaki Bando
Kohei Yatabe
Y. Sasaki
Masaki Onishi
Yasuhiro Oikawa
SSL
93
14
0
28 Jul 2020
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
Chuang Gan
Xiaoyu Chen
Phillip Isola
Antonio Torralba
J. Tenenbaum
58
7
0
27 Jul 2020
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
75
139
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
80
84
0
20 Jul 2020
Improving Object Detection with Selective Self-supervised Self-training
Yandong Li
Di Huang
Danfeng Qin
Liqiang Wang
Boqing Gong
115
65
0
17 Jul 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
114
97
0
14 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
69
157
0
13 Jul 2020
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
Chuang Gan
Jeremy Schwartz
S. Alter
Damian Mrowca
Martin Schrimpf
...
Antonio Torralba
J. DiCarlo
J. Tenenbaum
Josh H. McDermott
Daniel L. K. Yamins
VGen
175
317
0
09 Jul 2020
Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning
Yuan Yao
Chang-rui Liu
Dezhao Luo
Yu Zhou
QiXiang Ye
81
170
0
20 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
88
142
0
16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
103
76
0
11 Jun 2020
C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors
Majid Mirbagheri
Bardia Doosti
45
2
0
09 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
103
23
0
04 Jun 2020
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Di Hu
Lichao Mou
Qingzhong Wang
Junyu Gao
Yuansheng Hua
Dejing Dou
Xiaoxiang Zhu
65
31
0
14 May 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
233
84
0
04 May 2020
Music Gesture for Visual Sound Separation
Chuang Gan
Deng Huang
Hang Zhao
J. Tenenbaum
Antonio Torralba
97
205
0
20 Apr 2020
Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model
Dongdong Wang
Yandong Li
Liqiang Wang
Boqing Gong
76
49
0
31 Mar 2020
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
A. Vasudevan
Dengxin Dai
Luc Van Gool
ObjD
138
45
0
09 Mar 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
82
139
0
25 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
87
254
0
10 Dec 2019
How To Train Your Deep Multi-Object Tracker
Yihong Xu
Aljosa Osep
Yutong Ban
Radu Horaud
Laura Leal-Taixe
Xavier Alameda-Pineda
VOT
106
192
0
15 Jun 2019