arXiv:2310.07276

BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

11 October 2023
Qizhi Pei
Wei Zhang
Jinhua Zhu
Kehan Wu
Kaiyuan Gao
Lijun Wu
Yingce Xia
Rui Yan
Abstract

Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose BioT5, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. BioT5 utilizes SELFIES for 100% robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. Furthermore, BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information. After fine-tuning, BioT5 shows superior performance across a wide range of tasks, demonstrating its strong capability of capturing underlying relations and properties of bio-entities. Our code is available on GitHub at https://github.com/QizhiPei/BioT5.
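The "100% robust" property of SELFIES comes from how it is decoded: the decoder tracks how many bonds each atom can still form and adjusts or drops symbols instead of failing, so every symbol string maps to a valid molecule. The sketch below is a hypothetical, heavily simplified toy written only to illustrate that idea. It is not the `selfies` Python package and not BioT5's code. It supports only unbranched chains of C, N, O and F (hydrogens implicit, as in SMILES), and the names `decode`, `valence_ok`, `MAX_VALENCE` and `BOND_SYMBOL` are made up for this example.

```python
# Hypothetical toy decoder (NOT the selfies package, NOT BioT5 code).
# It only handles unbranched chains of C, N, O, F, but it shows the key
# idea: the decoder tracks how many bonds the previous atom can still
# form and shrinks any requested bond to fit, so no input is rejected.

MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def decode(tokens):
    """Turn tokens like ["C", "=O", "#N"] into a SMILES-like chain."""
    out = []
    remaining = 0  # bonds the previous atom can still form
    for tok in tokens:
        order = {"=": 2, "#": 3}.get(tok[0], 1)  # requested bond order
        element = tok.lstrip("=#")
        valence = MAX_VALENCE[element]
        if not out:
            # The first atom has no incoming bond; any bond prefix is ignored.
            out.append(element)
            remaining = valence
            continue
        if remaining == 0:
            break  # previous atom is saturated: stop decoding here
        bond = min(order, remaining, valence)  # shrink bond to fit both atoms
        out.append(BOND_SYMBOL[bond] + element)
        remaining = valence - bond
    return "".join(out)


def valence_ok(chain):
    """Check that every atom in a decoded chain is within its valence."""
    atoms, bonds = [], []
    pending = 1  # order of the bond that precedes the next atom
    for ch in chain:
        if ch in "=#":
            pending = 2 if ch == "=" else 3
        else:
            if atoms:
                bonds.append(pending)
            atoms.append(ch)
            pending = 1
    for i, element in enumerate(atoms):
        left = bonds[i - 1] if i > 0 else 0
        right = bonds[i] if i < len(bonds) else 0
        if left + right > MAX_VALENCE[element]:
            return False
    return True
```

For example, `decode(["C", "#O"])` returns `"C=O"`: the requested triple bond is reduced to a double bond because oxygen can form only two bonds in this toy valence table, whereas writing the same request directly as the SMILES-like string `C#O` fails `valence_ok`. Real SELFIES extends this state-tracking idea to branches, rings and charges.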
