RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
AutoMixer: Checkpoint Artifacts as Automatic Data Mixers. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts. Annual Meeting of the Association for Computational Linguistics (ACL), 2025