Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos

19 October 2019

Papers citing "Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos"

9 / 9 papers shown

Title | Authors | Tags | Counts | Date
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds | E. Shaar, Ariel Shaulov, Gal Chechik, Lior Wolf | VLM | 41 0 0 | 17 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization | Jinxing Zhou, D. Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, M. Wang | VLM | 46 4 0 | 18 Nov 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models | David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata | VLM, CLIP | 26 2 0 | 09 Apr 2024
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers | James Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, Romain Mueller | MDE | 35 8 0 | 22 Dec 2023
Temporal and cross-modal attention for audio-visual zero-shot learning | Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata | | 32 25 0 | 20 Jul 2022
Learning Speaker-specific Lip-to-Speech Generation | Munender Varshney, Ravindra Yadav, Vinay P. Namboodiri, R. Hegde | | 16 7 0 | 04 Jun 2022
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention | Kranti K. Parida, Siddharth Srivastava, Gaurav Sharma | MDE | 31 20 0 | 15 Nov 2021
CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval | Ushasi Chaudhuri, Biplab Banerjee, A. Bhattacharya, Mihai Datcu | | 23 29 0 | 20 Apr 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes | Kranti K. Parida, Siddharth Srivastava, Gaurav Sharma | MDE | 33 37 0 | 15 Mar 2021