
Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques

12 pages main text, 3 pages bibliography, 1 page appendix; 28 figures, 2 tables
Abstract

To meet the increasing demands of deep learning (DL) models, AI chips employ both off-chip memory (e.g., HBM) and a high-bandwidth, low-latency interconnect for direct inter-core data exchange. However, it is not easy to explore the efficiency of these inter-core connected AI (ICCA) chips, due to a fundamental tussle among compute (per-core execution), communication (inter-core data exchange), and I/O (off-chip data access).
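To make the compute/communication/I/O tussle concrete, below is a minimal, illustrative cost-model sketch. It is not taken from the Elk paper; all names and numbers (LayerCost, the microsecond figures) are assumptions used only to show why a compiler that overlaps the three phases is bounded by the slowest one, whereas a serialized schedule pays for all three.

```python
# Toy cost model (illustrative sketch; all names and numbers are assumptions,
# not from the Elk paper) for one layer running on an inter-core connected
# AI (ICCA) chip.

from dataclasses import dataclass


@dataclass
class LayerCost:
    compute_us: float  # per-core execution time
    comm_us: float     # inter-core data-exchange time
    io_us: float       # off-chip (e.g., HBM) access time


def serialized_latency(layer: LayerCost) -> float:
    """Latency when compute, communication, and I/O run back to back."""
    return layer.compute_us + layer.comm_us + layer.io_us


def overlapped_latency(layer: LayerCost) -> float:
    """Idealized latency when the three phases are perfectly overlapped:
    the bottleneck resource dominates."""
    return max(layer.compute_us, layer.comm_us, layer.io_us)


if __name__ == "__main__":
    layer = LayerCost(compute_us=120.0, comm_us=80.0, io_us=150.0)
    print(f"serialized: {serialized_latency(layer):.1f} us")  # 350.0 us
    print(f"overlapped: {overlapped_latency(layer):.1f} us")  # 150.0 us, I/O-bound
```

In this toy setting the overlapped schedule is I/O-bound, which is the kind of bottleneck a compiler for ICCA chips would need to reason about when trading off per-core execution, inter-core exchange, and off-chip access.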
