Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping

19 September 2023

Papers citing "Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping"

Title | Authors | Tags | Counts | Date
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | Ziyue Huang, Hongxi Yan, Qiqi Zhan, Shuai Yang, Mingming Zhang, Chenkai Zhang, Yiming Lei, Zeming Liu, Qingjie Liu, Y. Wang | - | 42 0 0 | 28 Mar 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Angelos Zavras, Dimitrios Michail, Xiao Xiang Zhu, Begum Demir, Ioannis Papoutsis | VLM | 81 0 0 | 13 Feb 2025
TaxaBind: A Unified Embedding Space for Ecological Applications | S. Sastry, Subash Khanal, A. Dhakal, Adeel Ahmad, Nathan Jacobs | - | 53 6 0 | 01 Nov 2024
GEOBIND: Binding Text, Image, and Audio through Satellite Images | A. Dhakal, Subash Khanal, S. Sastry, Adeel Ahmad, Nathan Jacobs | - | 28 2 0 | 17 Apr 2024
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis | S. Sastry, Subash Khanal, A. Dhakal, Nathan Jacobs | - | 44 6 0 | 09 Apr 2024
Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment | Angelos Zavras, Dimitrios Michail, Begum Demir, Ioannis Papoutsis | VLM | 17 11 0 | 15 Feb 2024
Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning | Esther Rolf, Konstantin Klemmer, Caleb Robinson, Hannah Kerner | - | 21 35 0 | 02 Feb 2024
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping | S. Sastry, Subash Khanal, A. Dhakal, Di Huang, Nathan Jacobs | - | 38 9 0 | 29 Oct 2023
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | Colorado Reed, Ritwik Gupta, Shufan Li, S. Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, M. Uyttendaele, Trevor Darrell | - | 113 165 0 | 30 Dec 2022
Audio Retrieval with WavText5K and CLAP Training | Soham Deshmukh, Benjamin Elizalde, Huaming Wang | 3DV, CLIP | 113 50 0 | 28 Sep 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov | ViT | 114 262 0 | 02 Feb 2022
Masked Autoencoders Are Scalable Vision Learners | Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick | ViT, TPM | 258 7,337 0 | 11 Nov 2021