When Routers, Switches and Interconnects Compute: A Processing-in-Interconnect Paradigm for Scalable Neuromorphic AI
Routing, switching, and the interconnect fabric are essential components of large-scale neuromorphic computing architectures. Although this fabric plays only a supporting role in computation, for large AI workloads it ultimately determines overall system performance, including energy consumption and speed. In this paper, we offer a potential solution to this bottleneck by addressing two fundamental questions: (a) what computing paradigms are inherent in existing routing, switching, and interconnect systems, and how can they be used to implement a processing-in-interconnect computing paradigm? and (b) how can a processing-in-interconnect network be trained on standard AI benchmarks? To address the first question, we demonstrate that all operations required by typical AI workloads can be mapped onto delays, causality, time-outs, packet drops, and broadcast operations, all of which are already implemented in current packet-switching and packet-routing hardware. We then show that existing embedded buffering and traffic-shaping algorithms can be minimally modified to implement neuron models and synaptic operations. To address the second question, we show how a knowledge-distillation framework can be used to train and cross-map well-established neural network topologies onto processing-in-interconnect architectures without any degradation in generalization performance. Our analysis shows that the effective energy utilization of a processing-in-interconnect network is significantly higher than that of other neuromorphic computing platforms; as a result, we believe the paradigm offers a more scalable architectural path toward brain-scale AI inference.
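To make the first idea concrete, the sketch below illustrates (in Python) how a spiking-neuron-like unit could, in principle, be emulated purely from the interconnect primitives the abstract names: per-link delays standing in for synaptic weights, a time-out acting as threshold/leak, dropped packets providing implicit rectification, and firing realized as a broadcast. This is a minimal illustrative model under our own assumptions, not the paper's implementation; the class `PacketNeuron` and its parameters are hypothetical.

```python
import heapq

# Hypothetical sketch: a neuron emulated with packet-network primitives.
# Inputs arrive as timestamped packets; synaptic weights become per-link
# delays; a time-out window plays the role of threshold/leak; packets
# outside the window are dropped (an implicit nonlinearity).

class PacketNeuron:
    """Accumulate delayed packet arrivals and 'fire' (broadcast a packet)
    once enough events arrive before the time-out window closes."""

    def __init__(self, threshold: int, timeout: float):
        self.threshold = threshold  # events needed before firing
        self.timeout = timeout      # drop window (models leak/reset)

    def forward(self, events, delays):
        # events: list of (send_time, source_id) input packets
        # delays: dict source_id -> link delay (acts as the weight)
        queue = [(t + delays[src], src) for t, src in events]
        heapq.heapify(queue)

        count, window_start = 0, None
        while queue:
            t, _src = heapq.heappop(queue)
            if window_start is None:
                window_start = t
            if t - window_start > self.timeout:
                break                # time-out: remaining packets dropped
            count += 1
            if count >= self.threshold:
                return t             # fire: broadcast a spike at time t
        return None                  # no output spike this window


# Usage: two of three inputs arrive inside the window, so the unit fires.
neuron = PacketNeuron(threshold=2, timeout=1.0)
spike_time = neuron.forward(
    events=[(0.0, "a"), (0.1, "b"), (5.0, "c")],
    delays={"a": 0.2, "b": 0.3, "c": 0.1},
)
print(spike_time)  # 0.4
```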
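For the second question, the abstract's training approach is knowledge distillation from a conventional network into the processing-in-interconnect substrate. Below is a generic distillation step in PyTorch, shown only to fix ideas: the student stands in for a trainable processing-in-interconnect network, and the temperature, loss weighting, and function name are assumptions rather than the authors' exact cross-mapping procedure.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, labels, optimizer,
                      temperature=4.0, alpha=0.5):
    """One generic knowledge-distillation step: the student matches the
    teacher's temperature-softened logits while also fitting the labels."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)          # frozen, pre-trained teacher
    s_logits = student(x)              # trainable student network

    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2               # standard gradient rescaling
    hard = F.cross_entropy(s_logits, labels)

    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature softens the teacher's output distribution so the student can learn inter-class similarity structure, which is the standard rationale for distillation-based cross-mapping between architectures.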