Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
arXiv:1804.03641 (v2, latest)

10 April 2018
Andrew Owens
Alexei A. Efros
    SSL

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 491 papers shown
Audio-driven Talking Face Generation with Stabilized Synchronization Loss
European Conference on Computer Vision (ECCV), 2023
Dogucan Yaman
Fevziye Irem Eyiokur
Leonard Barmann
H. K. Ekenel
Alexander Waibel
CVBM
364
10
0
18 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL, EgoV
305
8
0
10 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
231
17
0
05 Jul 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Zengjie Song
Zhaoxiang Zhang
159
5
0
19 Jun 2023
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Neural Information Processing Systems (NeurIPS), 2023
Kazuki Shimada
Archontis Politis
Parthasaarathy Sudarsanam
D. Krause
Kengo Uchida
...
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Maria Sandsten
Yuki Mitsufuji
234
84
0
15 Jun 2023
Video-to-Music Recommendation using Temporal Alignment of Segments
IEEE Transactions on Multimedia (IEEE TMM), 2023
Laure Prétet
G. Richard
Clement Souchier
Geoffroy Peeters
AI4TS
133
19
0
12 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Neural Information Processing Systems (NeurIPS), 2023
Zihui Xue
Kristen Grauman
EgoV
234
47
0
08 Jun 2023
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Ruohan Gao
Yiming Dou
Hao Li
Tanmay Agarwal
Jeannette Bohg
Yunzhu Li
Li Fei-Fei
Jiajun Wu
138
50
0
01 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
301
14
0
01 Jun 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Neural Information Processing Systems (NeurIPS), 2023
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
212
22
0
27 May 2023
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
Xiwen Li
Tristalee Mangin
Surojit Saha
Evan K. Blanchard
Di Tang
Henry Poppe
Nathan Searle
Ouk Choi
Kerry E Kelly
Ross T. Whitaker
138
9
0
23 May 2023
Annotation-free Audio-Visual Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS, VLM
348
46
0
18 May 2023
How does Contrastive Learning Organize Images?
Yunzhe Zhang
Yao Lu
Qi Xuan
SSL
140
2
0
17 May 2023
ImageBind: One Embedding Space To Bind Them All
Computer Vision and Pattern Recognition (CVPR), 2023
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
512
1,278
0
09 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
European Conference on Computer Vision (ECCV), 2023
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
323
16
0
06 May 2023
Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition
Image and Vision Computing (IVC), 2023
Sergio Valcarcel Macua
Yongqiang Tang
Zhizhong Zhang
Wensheng Zhang
227
14
0
03 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
177
58
0
17 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
370
3
0
12 Apr 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
299
82
0
31 Mar 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM, VGen
184
52
0
30 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
219
23
0
28 Mar 2023
Joint fMRI Decoding and Encoding with Latent Embedding Alignment
Xuelin Qian
Yikai Wang
Yanwei Fu
Xinwei Sun
Xiangyang Xue
Jianfeng Feng
160
8
0
26 Mar 2023
ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding
IEEE International Conference on Robotics and Automation (ICRA), 2023
Hongyu Sun
Yongcai Wang
Xudong Cai
Xuewei Bai
Deying Li
ViT, 3DPC
259
8
0
25 Mar 2023
Egocentric Audio-Visual Object Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
171
44
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Computer Vision and Pattern Recognition (CVPR), 2023
Tiantian Geng
Teng Wang
Yanfu Zhang
Runmin Cong
Feng Zheng
181
59
0
22 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm, ViT
127
2
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
412
66
0
21 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Chen
Shengyi Qian
Andrew Owens
224
18
0
20 Mar 2023
A Light Weight Model for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
200
61
0
08 Mar 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Simon Jenni
Alexander Black
John Collomosse
SSL
182
24
0
15 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Neural Information Processing Systems (NeurIPS), 2023
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
329
58
0
04 Feb 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
180
133
0
31 Jan 2023
Audio-Visual Segmentation with Semantics
International Journal of Computer Vision (IJCV), 2023
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
164
72
0
30 Jan 2023
Skeleton-based Action Recognition through Contrasting Two-Stream Spatial-Temporal Networks
IEEE Transactions on Multimedia (IEEE TMM), 2023
Chen Pang
Xuequan Lu
Lei Lyu
233
32
0
27 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
207
23
0
23 Jan 2023
Novel-View Acoustic Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Changan Chen
Alexander Richard
Roman Shapovalov
V. Ithapu
Natalia Neverova
Kristen Grauman
Andrea Vedaldi
198
45
0
20 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
200
28
0
19 Jan 2023
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Computer Vision and Pattern Recognition (CVPR), 2023
Chenhao Zheng
Ayush Shrivastava
Andrew Owens
VLM
327
22
0
11 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Feng
Ziyang Chen
Andrew Owens
224
108
0
04 Jan 2023
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
286
73
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
288
106
0
15 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
International Conference on Learning Representations (ICLR), 2022
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM, CLIP
245
35
0
14 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
278
56
0
09 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
British Machine Vision Conference (BMVC), 2022
Yating Xu
Conghui Hu
G. Lee
VGen
345
1
0
09 Dec 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Conference on Robot Learning (CoRL), 2022
Hao Li
Yizhi Zhang
Junzhe Zhu
Shaoxiong Wang
Michelle A. Lee
Huazhe Xu
Edward H. Adelson
Li Fei-Fei
Ruohan Gao
Jiajun Wu
175
88
0
07 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
267
38
0
07 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
IEEE Open Journal of Signal Processing (JOSP), 2022
Rahul Sharma
Shrikanth Narayanan
197
11
0
01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2022
Xixi Hu
Ziyang Chen
Andrew Owens
181
65
0
28 Nov 2022
Touch and Go: Learning from Human-Collected Vision and Touch
Neural Information Processing Systems (NeurIPS), 2022
Fengyu Yang
Chenyang Ma
Jiacheng Zhang
Jing Zhu
Wenzhen Yuan
Andrew Owens
212
89
0
22 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
169
0
0
20 Nov 2022