Objects that Sound

18 December 2017

Papers citing "Objects that Sound"

50 / 116 papers shown

Title
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos Saghir Alfasly Jian Lu C. Xu Yuru Zou 34 18 0 06 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 29 106 0 02 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 6 25 0 13 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection A. Haliassos Rodrigo Mira Stavros Petridis M. Pantic CVBM 27 125 0 18 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks A. Vasudevan Dengxin Dai Luc Van Gool SSL 31 6 0 04 Jan 2022
Cross Modal Retrieval with Querybank Normalisation Simion-Vlad Bogolin Ioana Croitoru Hailin Jin Yang Liu Samuel Albanie 25 84 0 23 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence Di Hu Yake Wei Rui Qian Weiyao Lin Ruihua Song Ji-Rong Wen 24 41 0 22 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 28 23 0 09 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval Nina Shvetsova Brian Chen Andrew Rouditchenko Samuel Thomas Brian Kingsbury Rogerio Feris David F. Harwath James R. Glass Hilde Kuehne ViT 23 129 0 08 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video Rishabh Garg Ruohan Gao Kristen Grauman 15 27 0 21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 31 20 0 15 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 21 8 0 26 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 27 0 0 13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,018 0 13 Oct 2021
$Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos$ Pano-AVQA: Grounded Audio-Visual Question Answering on 360 $^\circ$ Videos Heeseung Yun Youngjae Yu Wonsuk Yang Kangil Lee Gunhee Kim 25 78 0 11 Oct 2021
Audio-to-Image Cross-Modal Generation Maciej Żelaszczyk Jacek Mañdziuk DiffM 46 15 0 27 Sep 2021
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction Hailong Ning Bin Zhao Zhanxuan Hu Lang He Ercheng Pei 25 10 0 17 Sep 2021
Parsing Birdsong with Deep Audio Embeddings Irina Tolkova Brian Chu Marcel Hedman Stefan Kahl Holger Klinck 31 10 0 20 Aug 2021
Attention Bottlenecks for Multimodal Fusion Arsha Nagrani Shan Yang Anurag Arnab A. Jansen Cordelia Schmid Chen Sun 25 541 0 30 Jun 2021
Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design Yue Cao Payel Das Vijil Chenthamarakshan Pin-Yu Chen Igor Melnyk Yang Shen 21 45 0 24 Jun 2021
Unsupervised Discriminative Learning of Sounds for Audio Event Classification Sascha Hornauer Ke Li Stella X. Yu Shabnam Ghaffarzadegan Liu Ren SSL 21 5 0 19 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning Christoph Feichtenhofer Haoqi Fan Bo Xiong Ross B. Girshick Kaiming He SSL AI4TS 23 257 0 29 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios Xudong Xu Hang Zhou Ziwei Liu Bo Dai Xiaogang Wang Dahua Lin DiffM 13 53 0 13 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning Yan-Bo Lin Hung-Yu Tseng Hsin-Ying Lee Yen-Yu Lin Ming-Hsuan Yang SSL 19 34 0 01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning Adrià Recasens Pauline Luc Jean-Baptiste Alayrac Luyu Wang Ross Hemsley ... Florent Altché M. Valko Jean-Bastien Grill Aaron van den Oord Andrew Zisserman SSL AI4TS 23 127 0 30 Mar 2021
Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning Zheda Mai Ruiwen Li Hyunwoo J. Kim Scott Sanner CLL 23 180 0 22 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 16 33 0 18 Mar 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 26 37 0 15 Mar 2021
Perceiver: General Perception with Iterative Attention Andrew Jaegle Felix Gimeno Andrew Brock Andrew Zisserman Oriol Vinyals João Carreira VLM ViT MDE 48 973 0 04 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde Juana Valeria Hurtado Abhinav Valada 26 72 0 01 Mar 2021
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction Samyak Jain P. Yarlagadda Shreyank Jyoti Shyamgopal Karthik Subramanian Ramanathan Vineet Gandhi ViT 17 65 0 11 Dec 2020
Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio João P. Ferreira Thiago M. Coutinho Thiago L. Gomes J. F. Neto Rafael Azevedo Renato Martins Erickson R. Nascimento GAN 28 68 0 25 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations Linchao Zhu Yi Yang ViT 29 417 0 14 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment Pedro Morgado Yi Li Nuno Vasconcelos SSL 13 121 0 03 Nov 2020
Listening to Sounds of Silence for Speech Denoising Ruilin Xu Rundi Wu Y. Ishiwaka Carl Vondrick Changxi Zheng 15 32 0 22 Oct 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention Bin Duan Hao Tang Wei Wang Ziliang Zong Guowei Yang Yan Yan 25 59 0 14 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning Ying Cheng Ruize Wang Zhihao Pan Rui Feng Yuejie Zhang SSL 14 106 0 13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video Triantafyllos Afouras Andrew Owens Joon Son Chung Andrew Zisserman SSL 17 250 0 10 Aug 2020
Multiple Sound Sources Localization from Coarse to Fine Rui Qian Di Hu Heinrich Dinkel Mengyue Wu N. Xu Weiyao Lin 23 153 0 13 Jul 2020
Self-Supervised MultiModal Versatile Networks Jean-Baptiste Alayrac Adrià Recasens R. Schneider Relja Arandjelović Jason Ramapuram J. Fauw Lucas Smaira Sander Dieleman Andrew Zisserman SSL 40 371 0 29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David F. Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 22 141 0 16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound Karren D. Yang Bryan C. Russell Justin Salamon SSL 16 75 0 11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network Lingyu Zhu Esa Rahtu 14 23 0 04 Jun 2020
What Makes for Good Views for Contrastive Learning? Yonglong Tian Chen Sun Ben Poole Dilip Krishnan Cordelia Schmid Phillip Isola SSL 11 1,302 0 20 May 2020
Conditioned Source Separation for Music Instrument Performances Olga Slizovskaia G. Haro E. Gómez 22 38 0 08 Apr 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 194 205 0 23 Jan 2020
Deep Audio-Visual Learning: A Survey Hao Zhu Mandi Luo Rui Wang A. Zheng R. He 29 156 0 14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network A. Tsiami Petros Koutras Petros Maragos 16 73 0 09 Jan 2020
Listen to Look: Action Recognition by Previewing Audio Ruohan Gao Tae-Hyun Oh Kristen Grauman Lorenzo Torresani VLM 27 251 0 10 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications Arda Senocak Tae-Hyun Oh Junsik Kim Ming-Hsuan Yang In So Kweon SSL 19 52 0 20 Nov 2019