ResearchTrend.AI

Self-supervised Moving Vehicle Tracking with Stereo Sound

arXiv: 1910.11760
25 October 2019
Chuang Gan
Hang Zhao
Peihao Chen
David D. Cox
Antonio Torralba

Papers citing "Self-supervised Moving Vehicle Tracking with Stereo Sound"

50 / 86 papers shown
Title
Audio and Multiscale Visual Cues Driven Cross-modal Transformer for Idling Vehicle Detection
Xiwen Li
Ross T. Whitaker
Tolga Tasdizen
58
0
0
15 Apr 2025
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
214
0
0
09 Dec 2024
AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness
Yizhuo Yang
Shenghai Yuan
Muqing Cao
Jianfei Yang
Lihua Xie
257
9
0
11 Nov 2024
Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies
Xiwen Li
Rehman Mohammed
Tristalee Mangin
Surojit Saha
Ross T. Whitaker
Kerry E Kelly
Tolga Tasdizen
82
5
0
28 Oct 2024
Enhancing Sound Source Localization via False Negative Elimination
Zengjie Song
Jiangshe Zhang
Yuxi Wang
Junsong Fan
Zhaoxiang Zhang
90
0
0
29 Aug 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
85
0
0
16 Jul 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
141
6
0
06 Jun 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong
Zishuo Zheng
Peihao Chen
Yian Wang
Junyan Li
Chuang Gan
90
37
0
16 Jan 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
64
5
0
21 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
68
5
0
14 Dec 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics
Mason Wang
Samuel Clarke
Jui-Hsien Wang
Ruohan Gao
Jiajun Wu
75
7
0
06 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
Yuxin Ye
Wenming Yang
Yapeng Tian
69
10
0
31 Oct 2023
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Davide Berghi
Peipei Wu
Meng Cui
Jianyuan Sun
Philip J. B. Jackson
Wenwu Wang
BDL
128
7
0
23 Oct 2023
Two vs. Four-Channel Sound Event Localization and Detection
J. Wilkins
Magdalena Fuentes
Luca Bondi
Shabnam Ghaffarzadegan
A. Abavisani
J. P. Bello
110
1
0
23 Sep 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
103
4
0
10 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
114
9
0
05 Jul 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
86
39
0
30 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
82
20
0
29 Mar 2023
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding
Shuhan Tan
Tushar Nagarajan
Kristen Grauman
94
22
0
05 Jan 2023
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
117
0
0
05 Dec 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
Yuanyuan Jiang
Jianqin Yin
Yonghao Dang
68
6
0
11 Oct 2022
Pay Self-Attention to Audio-Visual Navigation
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
70
4
0
04 Oct 2022
Image Understands Point Cloud: Weakly Supervised 3D Semantic Segmentation via Association Learning
Tianfang Sun
Zhizhong Zhang
Xin Tan
Yanyun Qu
Yuan Xie
Lizhuang Ma
3DPC
118
12
0
16 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
135
55
0
20 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
137
20
0
07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
55
5
0
07 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
107
28
0
20 Jun 2022
Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
Mohammed Alloulah
Maximilian Arnold
SSL
86
2
0
13 Jun 2022
Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment
Shanshan Wang
Archontis Politis
A. Mesaros
Tuomas Virtanen
SSL
42
7
0
02 Jun 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
92
19
0
26 Apr 2022
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Zengjie Song
Yuxi Wang
Junsong Fan
Tieniu Tan
Zhaoxiang Zhang
SSL
69
43
0
25 Mar 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
108
105
0
24 Mar 2022
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
165
81
0
17 Mar 2022
Visually Supervised Speaker Detection and Localization via Microphone Array
Davide Berghi
A. Hilton
Philip J. B. Jackson
61
11
0
07 Mar 2022
Sound Adversarial Audio-Visual Navigation
Yinfeng Yu
Wenbing Huang
Gang Hua
Changan Chen
Yikai Wang
Xiaohong Liu
AAML
71
29
0
22 Feb 2022
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
Jannik Zürn
Wolfram Burgard
SSL
87
8
0
30 Jan 2022
WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking
Chunhui Zhang
Guanjie Huang
Li Liu
Shan Huang
Yinan Yang
Xiang Wan
Shiming Ge
Dacheng Tao
157
24
0
19 Jan 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
89
41
0
06 Jan 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
72
42
0
22 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
119
40
0
08 Dec 2021
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes
Suhani Vora
Noha Radwan
Klaus Greff
H. Meyer
Kyle Genova
Mehdi S. M. Sajjadi
Etienne Pot
Andrea Tagliasacchi
Daniel Duckworth
152
127
0
25 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
94
61
0
24 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Ziyang Chen
Xixi Hu
Andrew Owens
112
26
0
10 Nov 2021
Space-Time Memory Network for Sounding Object Localization in Videos
Sizhe Li
Yapeng Tian
Chenliang Xu
51
10
0
10 Nov 2021
Audio-visual Representation Learning for Anomaly Events Detection in Crowds
Junyuan Gao
Maoguo Gong
Xuelong Li
99
24
0
28 Oct 2021
Learning 3D Semantic Segmentation with only 2D Image Supervision
Kyle Genova
Xiaoqi Yin
Abhijit Kundu
C. Pantofaru
Forrester Cole
Avneesh Sud
B. Brewington
B. Shucker
Thomas Funkhouser
3DPC
60
81
0
21 Oct 2021
MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification
Yajun Gao
Tengfei Liang
Yi Jin
Xiaoyan Gu
Wu Liu
Yidong Li
Congyan Lang
CVBM
80
60
0
21 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
100
86
0
11 Oct 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
144
48
0
26 Jun 2021
Improving Ultrasound Tongue Image Reconstruction from Lip Images Using Self-supervised Learning and Attention Mechanism
Haiyang Liu
Jihang Zhang
48
4
0
20 Jun 2021