Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations

Understanding the binding specificity between T-cell receptors (TCRs) and peptide-major histocompatibility complexes (pMHCs) is central to immunotherapy and vaccine development. However, current predictive models struggle with generalization, especially in data-scarce settings and when faced with novel epitopes. We present LANTERN (Large lAnguage model-powered TCR-Enhanced Recognition Network), a deep learning framework that combines large-scale protein language models with chemical representations of peptides. By encoding TCR β-chain sequences using ESM-1b and transforming peptide sequences into SMILES strings processed by MolFormer, LANTERN captures rich biological and chemical features critical for TCR-peptide recognition. Through extensive benchmarking against existing models such as ChemBERTa, TITAN, and NetTCR, LANTERN demonstrates superior performance, particularly in zero-shot and few-shot learning scenarios. Our model also benefits from a robust negative sampling strategy and shows significant clustering improvements via embedding analysis. These results highlight the potential of LANTERN to advance TCR-pMHC binding prediction and support the development of personalized immunotherapies.
@article{qi2025_2505.01433,
  title   = {Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations},
  author  = {Cong Qi and Hanzhang Fang and Siqi Jiang and Tianxing Hu and Wei Zhi},
  journal = {arXiv preprint arXiv:2505.01433},
  year    = {2025}
}