1804.03641
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Title
Space-Time Memory Network for Sounding Object Localization in Videos
British Machine Vision Conference (BMVC), 2021
Sizhe Li
Yapeng Tian
Chenliang Xu
119
12
0
10 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
110
5
0
05 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
130
8
0
26 Oct 2021
Self-Supervised Visual Representation Learning Using Lightweight Architectures
Prathamesh Sonawane
Sparsh Drolia
Saqib Nizam Shamsi
Bhargav Jain
SSL
102
1
0
21 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
229
48
0
19 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson
Henry Gouk
Chen Change Loy
Timothy M. Hospedales
SSL
OOD
AI4TS
198
338
0
18 Oct 2021
HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media
Anargyros Chatzitofis
Leonidas Saroglou
Prodromos Boutis
Petros Drakoulis
N. Zioulis
...
C. Charbonnier
Pablo César
D. Zarpalas
Stefanos D. Kollias
P. Daras
3DH
176
57
0
14 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
139
0
0
13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
876
1,442
0
13 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
279
105
0
11 Oct 2021
3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video
Justin Wilson
Ming-Chia Lin
84
1
0
05 Oct 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Zixu Zhao
Yueming Jin
Pheng-Ann Heng
SSL
165
22
0
28 Sep 2021
Click-through Rate Prediction with Auto-Quantized Contrastive Learning
Yujie Pan
Jiangchao Yao
Bo Han
Kunyang Jia
Ya Zhang
Hongxia Yang
MQ
161
19
0
27 Sep 2021
Visual Scene Graphs for Audio Source Separation
IEEE International Conference on Computer Vision (ICCV), 2021
Moitreya Chatterjee
Jonathan Le Roux
Narendra Ahuja
A. Cherian
192
40
0
24 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation
Xiangjie Sui
Esa Rahtu
222
12
0
18 Sep 2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
Neural Information Processing Systems (NeurIPS), 2021
Nikita Dvornik
Isma Hadji
Konstantinos G. Derpanis
Animesh Garg
Allan D. Jepson
143
62
0
26 Aug 2021
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Wei Feng
Yuanjiang Wang
Lihua Ma
Ye Yuan
Fangqiu Yi
SSL
120
13
0
24 Aug 2021
Exploring Data Aggregation and Transformations to Generalize across Visual Domains
Antonio D'Innocente
OOD
150
0
0
20 Aug 2021
Learning to Cut by Watching Movies
IEEE International Conference on Computer Vision (ICCV), 2021
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Guohao Li
VGen
313
24
0
09 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
IEEE International Conference on Computer Vision (ICCV), 2021
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
198
38
0
06 Aug 2021
Self-Supervised Multi-Modal Alignment for Whole Body Medical Imaging
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021
Rhydian Windsor
A. Jamaludin
T. Kadir
Andrew Zisserman
190
18
0
14 Jul 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
ACM Multimedia (ACM MM), 2021
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
182
216
0
14 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Neural Information Processing Systems (NeurIPS), 2021
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
528
688
0
30 Jun 2021
Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
VISIGRAPP, 2021
Anurag Bagchi
Jazib Mahmood
Dolton Fernandes
Ravi Kiran Sarvadevabhatla
351
32
0
27 Jun 2021
Saying the Unseen: Video Descriptions via Dialog Agents
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
170
8
0
26 Jun 2021
Improving Multi-Modal Learning with Uni-Modal Teachers
Chenzhuang Du
Tingle Li
Yichen Liu
Zixin Wen
Tianyu Hua
Yue Wang
Hang Zhao
107
65
0
21 Jun 2021
Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
VLM
227
8
0
17 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
Maja Pantic
SSL
132
58
0
16 Jun 2021
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows
Mahdi M. Kalayeh
Nagendra Kamath
Lingyi Liu
Ashok Chandrashekar
SSL
97
3
0
16 Jun 2021
Seeing Through Clouds in Satellite Images
Mingmin Zhao
Peder Olsen
Ranveer Chandra
103
32
0
15 Jun 2021
Learning Audio-Visual Dereverberation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
191
35
0
14 Jun 2021
Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
Shaobo Min
Jingdong Sun
Hongtao Xie
Chuang Gan
Yongdong Zhang
Jingdong Wang
SSL
131
6
0
13 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
258
11
0
12 Jun 2021
Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
195
13
0
03 Jun 2021
APES: Audiovisual Person Search in Untrimmed Video
Juan Carlos León Alcázar
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbeláez
Guohao Li
Fabian Caba Heilbron
104
6
0
03 Jun 2021
Dual Normalization Multitasking for Audio-Visual Sounding Object Localization
Tokuhiro Nishikawa
Daiki Shimada
Jerry Jun Yokono
82
0
0
01 Jun 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Srijan Das
Rui Dai
Di Yang
Francois Bremond
ViT
282
82
0
17 May 2021
Move2Hear: Active Audio-Visual Source Separation
IEEE International Conference on Computer Vision (ICCV), 2021
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
161
47
0
15 May 2021
Home Action Genome: Cooperative Compositional Action Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Nishant Rai
Haofeng Chen
Jingwei Ji
Rishi Desai
Kazuki Kozuka
Shun Ishizaka
Ehsan Adeli
Juan Carlos Niebles
91
85
0
11 May 2021
Representation Learning via Global Temporal Alignment and Cycle-Consistency
Computer Vision and Pattern Recognition (CVPR), 2021
Isma Hadji
Konstantinos G. Derpanis
Allan D. Jepson
AI4TS
239
61
0
11 May 2021
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
IEEE International Conference on Computer Vision (ICCV), 2021
Kirill Gavrilyuk
Mihir Jain
I. Karmanov
Cees G. M. Snoek
127
24
0
04 May 2021
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Yan-Bo Lin
Y. Wang
192
22
0
03 May 2021
Points2Sound: From mono to binaural audio using 3D point cloud scenes
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process), 2021
Francesc Lluís
V. Chatziioannou
A. Hofmann
3DPC
285
7
0
26 Apr 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Computer Vision and Pattern Recognition (CVPR), 2021
Hang Zhou
Yasheng Sun
Wayne Wu
Chen Change Loy
Xiaogang Wang
Ziwei Liu
CVBM
287
426
0
22 Apr 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
256
95
0
22 Apr 2021
A cappella: Audio-visual Singing Voice Separation
British Machine Vision Conference (BMVC), 2021
Juan F. Montesinos
V. S. Kadandale
G. Haro
300
20
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
IEEE International Conference on Computer Vision (ICCV), 2021
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
162
31
0
20 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Yue Yu
Yu-Gang Jiang
240
5
0
20 Apr 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Xiangjie Sui
Esa Rahtu
140
30
0
17 Apr 2021
Self-supervised object detection from audio-visual correspondence
Computer Vision and Pattern Recognition (CVPR), 2021
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
303
53
0
13 Apr 2021