
When Routers, Switches and Interconnects Compute: A processing-in-interconnect Paradigm for Scalable Neuromorphic AI

Main: 30 pages
8 figures
1 table
Bibliography: 11 pages
Appendix: 21 pages
Abstract

Routing, switching, and the interconnect fabric are essential components of large-scale neuromorphic computing architectures. Although this fabric plays only a supporting role in computation, for large AI workloads it ultimately determines overall system performance, including energy consumption and speed. In this paper, we offer a potential solution to this bottleneck by addressing two fundamental questions: (a) What computing paradigms are inherent in existing routing, switching, and interconnect systems, and how can they be used to implement a Processing-in-Interconnect (π²) computing paradigm? and (b) How can a π² network be trained on standard AI benchmarks? To address the first question, we demonstrate that all operations required for typical AI workloads can be mapped onto delays, causality, time-outs, packet drops, and broadcast operations, all of which are already implemented in current packet-switching and packet-routing hardware. We then show that existing embedded buffering and traffic-shaping algorithms can be minimally modified to implement π² neuron models and synaptic operations. To address the second question, we show how a knowledge-distillation framework can be used to train and cross-map well-established neural network topologies onto π² architectures without any degradation in generalization performance. Our analysis shows that the effective energy utilization of a π² network is significantly higher than that of other neuromorphic computing platforms; as a result, we believe the π² paradigm offers a more scalable architectural path toward brain-scale AI inference.
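As a rough illustration of the knowledge-distillation framework mentioned in the abstract, the sketch below shows a standard Hinton-style temperature-scaled distillation loss in NumPy. This is an assumption about the general technique, not the paper's exact objective or its cross-mapping procedure onto π² architectures:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) at temperature T, scaled by T^2,
    the standard knowledge-distillation objective. The student
    (here, hypothetically a pi^2 network) is trained to match the
    teacher's softened output distribution."""
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's soft predictions
    return (T ** 2) * float(np.sum(p * (np.log(p) - np.log(q))))
```

When the student reproduces the teacher's logits exactly, the loss is zero; any mismatch yields a positive penalty, which would drive the cross-mapped network toward the teacher's generalization behavior.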
