2502.15443
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
24 February 2025
Weilan Wang, Yu Mao, Dongdong Tang, Hongchao Du, Nan Guan, Chun Jason Xue
MQ
Papers citing "When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models"
1 / 1 papers shown
Title
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
Hongchao Du, Shangyu Wu, Arina Kharlamova, Nan Guan, Chun Jason Xue
04 Mar 2025