GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

16 July 2024

Papers citing "GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression"

1 / 1 papers shown

Title
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference You Wu Haoyi Wu Kewei Tu 24 3 0 18 Oct 2024