The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residue-level semantics and suffer from low inference efficiency, a large modeling space, and high training costs. To address these challenges, we introduce ProtFlow, a fast flow matching-based protein sequence design framework that operates on embeddings derived from the semantically meaningful latent space of protein language models. By compressing and smoothing the latent space, ProtFlow enhances performance while training on limited computational resources. Leveraging reflow techniques, ProtFlow enables high-quality single-step sequence generation. Additionally, we develop a joint design pipeline for multichain proteins. We evaluate ProtFlow across diverse protein design tasks, including general peptides and long-chain proteins, antimicrobial peptides, and antibodies. Experimental results demonstrate that ProtFlow outperforms task-specific methods in these applications, underscoring its potential and broad applicability in computational protein sequence design and analysis.
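The flow matching and single-step generation ideas mentioned above can be illustrated with a toy NumPy sketch. This is not ProtFlow's implementation: the function names (`interpolate`, `flow_matching_loss`, `euler_sample`), the 8-dimensional stand-in "embeddings", and the constant-shift coupling between noise and data are all assumptions chosen so that an exact velocity field is known in closed form. In practice the velocity field would be a trained neural network operating on compressed protein language model embeddings.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight path from x0 (noise) to x1 (data)."""
    t = np.asarray(t).reshape(-1, 1)
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity and the
    straight-line target velocity x1 - x0."""
    xt = interpolate(x0, x1, t)
    target = x1 - x0
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))

def euler_sample(velocity_fn, x0, n_steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with n_steps Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Toy setup (assumption): 4 samples of 8-dimensional "embeddings",
# where each data point is its noise sample shifted by a constant.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))
shift = np.full(8, 2.0)
x1 = x0 + shift

def oracle_velocity(x, t):
    """Exact velocity field for the toy coupling: constant in x and t."""
    return np.broadcast_to(shift, x.shape)

loss = flow_matching_loss(oracle_velocity, x0, x1, rng.uniform(size=4))
one_step = euler_sample(oracle_velocity, x0, n_steps=1)
ten_step = euler_sample(oracle_velocity, x0, n_steps=10)
```

Because the trajectories in this toy case are perfectly straight, one Euler step reaches the same endpoint as ten. Reflow aims to straighten learned trajectories toward this regime, which is what makes single-step sampling viable.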
@article{kong2025_2504.10983,
  title={ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings},
  author={Zitai Kong and Yiheng Zhu and Yinlong Xu and Hanjing Zhou and Mingzhe Yin and Jialu Wu and Hongxia Xu and Chang-Yu Hsieh and Tingjun Hou and Jian Wu},
  journal={arXiv preprint arXiv:2504.10983},
  year={2025}
}