Benchmarking the Energy Savings with Speculative Decoding Strategies
Rohit Dutta
Paramita Koley
Soham Poddar
Janardan Misra
Sanjay Podder
Naveen Balani
Saptarshi Ghosh
Niloy Ganguly
Main: 5 Pages
3 Figures
Bibliography: 2 Pages
10 Tables
Appendix: 5 Pages
Abstract
Speculative decoding has emerged as an effective method to reduce the latency and cost of LLM inference. However, the energy requirements of these models have received inadequate attention. To address this gap, this paper presents a comprehensive survey of the energy requirements of speculative decoding strategies, with a detailed analysis of how various factors (model size and family, speculative decoding strategy, and dataset characteristics) influence the energy optimizations.
