
Title |
|---|
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques. IEEE Custom Integrated Circuits Conference (CICC), 2025 |
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity. Computer Vision and Pattern Recognition (CVPR), 2024 |
Hymba: A Hybrid-head Architecture for Small Language Models. International Conference on Learning Representations (ICLR), 2024 |
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions. International Conference on Learning Representations (ICLR), 2024 |
How to Train Long-Context Language Models (Effectively). Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
Gated Slot Attention for Efficient Linear-Time Sequence Modeling. Neural Information Processing Systems (NeurIPS), 2024 |
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference. Neural Information Processing Systems (NeurIPS), 2024 |