
Aligning Proteins and Language: A Foundation Model for Protein Retrieval

Abstract

This paper aims to retrieve proteins with similar structures and semantics from a large-scale protein dataset, facilitating the functional interpretation of protein structures obtained by structure determination methods such as cryo-electron microscopy (cryo-EM). Motivated by recent progress in vision-language models (VLMs), we propose a CLIP-style framework that aligns 3D protein structures with functional annotations using contrastive learning. For model training, we introduce a large-scale dataset of approximately 200,000 protein-caption pairs with rich functional descriptors. We evaluate our model on both in-domain retrieval and the more challenging cross-database retrieval, using the Protein Data Bank (PDB) and Electron Microscopy Data Bank (EMDB) datasets, respectively. In both settings, our approach demonstrates promising zero-shot retrieval performance, highlighting the potential of multimodal foundation models for structure-function understanding in protein biology.
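For readers unfamiliar with CLIP-style alignment, the sketch below shows the general technique the abstract names. It is a symmetric contrastive (InfoNCE) loss over a batch of paired protein and caption embeddings, followed by zero-shot retrieval by cosine-similarity ranking. It is not the authors' implementation. The structure and text encoders are assumed to exist elsewhere and to output fixed-length, non-zero vectors. The function names and the temperature value of 0.07 are hypothetical illustration choices, not settings reported in the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale a (non-zero) embedding to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def log_softmax_at(row: List[float], idx: int) -> float:
    # Numerically stable log-softmax of row, evaluated at position idx.
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum_exp


def clip_loss(proteins: List[Vector], captions: List[Vector],
              temperature: float = 0.07) -> float:
    # Hypothetical helper: proteins[i] and captions[i] are a matched pair;
    # every other caption in the batch serves as a negative for protein i.
    n = len(proteins)
    logits = [[cosine(p, t) / temperature for t in captions] for p in proteins]
    # Protein -> caption direction: in row i, the correct column is i.
    loss_p2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Caption -> protein direction: in column i, the correct row is i.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2p = -sum(log_softmax_at(columns[i], i) for i in range(n)) / n
    # Symmetric average of both directions, as in CLIP.
    return (loss_p2t + loss_t2p) / 2


def retrieve(query: Vector, database: List[Vector], k: int = 1) -> List[int]:
    # Zero-shot retrieval: rank database entries by cosine similarity to the query
    # (e.g. a text embedding querying protein embeddings) and return the top-k indices.
    scored = sorted(((cosine(query, d), i) for i, d in enumerate(database)), reverse=True)
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    proteins = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_loss(proteins, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: near zero
    print(clip_loss(proteins, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: large
    print(retrieve([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

Minimizing this loss pulls each protein embedding toward its own caption and away from the other captions in the batch. Once the two spaces are aligned, retrieval in either direction reduces to a nearest-neighbour search, with no task-specific fine-tuning.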

@article{wu2025_2506.08023,
  title={ Aligning Proteins and Language: A Foundation Model for Protein Retrieval },
  author={ Qifeng Wu and Zhengzhe Liu and Han Zhu and Yizhou Zhao and Daisuke Kihara and Min Xu },
  journal={arXiv preprint arXiv:2506.08023},
  year={ 2025 }
}