2309.10667
Cited By
Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping
19 September 2023
Subash Khanal
S. Sastry
A. Dhakal
Nathan Jacobs
Papers citing "Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping"
12 / 12 papers shown
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
42
0
0
28 Mar 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begum Demir
Ioannis Papoutsis
VLM
81
0
0
13 Feb 2025
TaxaBind: A Unified Embedding Space for Ecological Applications
S. Sastry
Subash Khanal
A. Dhakal
Adeel Ahmad
Nathan Jacobs
55
6
0
01 Nov 2024
GEOBIND: Binding Text, Image, and Audio through Satellite Images
A. Dhakal
Subash Khanal
S. Sastry
Adeel Ahmad
Nathan Jacobs
30
2
0
17 Apr 2024
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
S. Sastry
Subash Khanal
A. Dhakal
Nathan Jacobs
47
6
0
09 Apr 2024
Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment
Angelos Zavras
Dimitrios Michail
Begum Demir
Ioannis Papoutsis
VLM
17
11
0
15 Feb 2024
Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning
Esther Rolf
Konstantin Klemmer
Caleb Robinson
Hannah Kerner
24
35
0
02 Feb 2024
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
S. Sastry
Subash Khanal
A. Dhakal
Di Huang
Nathan Jacobs
38
9
0
29 Oct 2023
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Colorado Reed
Ritwik Gupta
Shufan Li
S. Brockman
Christopher Funk
Brian Clipp
Kurt Keutzer
Salvatore Candido
M. Uyttendaele
Trevor Darrell
113
165
0
30 Dec 2022
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
113
50
0
28 Sep 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
114
262
0
02 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021