Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations

Abstract

Understanding the binding specificity between T-cell receptors (TCRs) and peptide-major histocompatibility complexes (pMHCs) is central to immunotherapy and vaccine development. However, current predictive models struggle with generalization, especially in data-scarce settings and when faced with novel epitopes. We present LANTERN (Large lAnguage model-powered TCR-Enhanced Recognition Network), a deep learning framework that combines large-scale protein language models with chemical representations of peptides. By encoding TCR β-chain sequences using ESM-1b and transforming peptide sequences into SMILES strings processed by MolFormer, LANTERN captures rich biological and chemical features critical for TCR-peptide recognition. Through extensive benchmarking against existing models such as ChemBERTa, TITAN, and NetTCR, LANTERN demonstrates superior performance, particularly in zero-shot and few-shot learning scenarios. Our model also benefits from a robust negative sampling strategy and shows significant clustering improvements via embedding analysis. These results highlight the potential of LANTERN to advance TCR-pMHC binding prediction and support the development of personalized immunotherapies.

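To make the two-encoder design concrete, below is a minimal sketch of the pipeline the abstract describes: a TCR β-chain sequence is embedded with ESM-1b, the peptide is converted to a SMILES string and embedded with MolFormer, and the two embeddings are combined for binding prediction. The specific checkpoints, the RDKit sequence-to-SMILES step, the pooling choices, and the classifier head are illustrative assumptions, not the authors' exact implementation.

# Sketch of a LANTERN-style TCR-peptide scoring pipeline (assumptions noted above).
import torch
from rdkit import Chem
from transformers import AutoModel, AutoTokenizer

# Pretrained encoders; these public checkpoints match the abstract's description
# (ESM-1b for proteins, MolFormer for SMILES) but are our choice, not confirmed.
esm_tok = AutoTokenizer.from_pretrained("facebook/esm1b_t33_650M_UR50S")
esm = AutoModel.from_pretrained("facebook/esm1b_t33_650M_UR50S")
mol_tok = AutoTokenizer.from_pretrained("ibm/MoLFormer-XL-both-10pct", trust_remote_code=True)
molformer = AutoModel.from_pretrained("ibm/MoLFormer-XL-both-10pct", trust_remote_code=True)

def embed_tcr(beta_chain: str) -> torch.Tensor:
    # Mean-pooled ESM-1b embedding of the TCR beta-chain sequence (dim 1280).
    inputs = esm_tok(beta_chain, return_tensors="pt")
    with torch.no_grad():
        hidden = esm(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

def embed_peptide(peptide: str) -> torch.Tensor:
    # Convert the peptide to SMILES with RDKit, then embed with MolFormer (dim 768).
    smiles = Chem.MolToSmiles(Chem.MolFromSequence(peptide))
    inputs = mol_tok(smiles, return_tensors="pt")
    with torch.no_grad():
        pooled = molformer(**inputs).pooler_output
    return pooled.squeeze(0)

# Hypothetical binding head: concatenate the two embeddings and score the pair.
head = torch.nn.Sequential(
    torch.nn.Linear(1280 + 768, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
)

# Example pair: a typical CDR3-beta sequence and the influenza epitope GILGFVFTL.
z = torch.cat([embed_tcr("CASSLGQAYEQYF"), embed_peptide("GILGFVFTL")])
binding_logit = head(z)

In a trained system the head's output would be passed through a sigmoid and fit against positive and sampled negative TCR-peptide pairs; the sketch above only shows how the frozen encoders feed a joint representation.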
@article{qi2025_2505.01433,
  title={Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations},
  author={Cong Qi and Hanzhang Fang and Siqi Jiang and Tianxing Hu and Wei Zhi},
  journal={arXiv preprint arXiv:2505.01433},
  year={2025}
}