arXiv:2411.17887

A Parallel Scan Algorithm in the Tensor Core Unit Model

26 November 2024
Anastasios Zouzias
William F. McColl
Abstract

We present a parallel scan (prefix sum) algorithm in the Tensor Core Unit (TCU) model of computation. The TCU model assumes that multiplication between two square matrices of constant size $s$ is a basic operation. In the $(s^2, \ell)$-TCU model, we show that for inputs of size $n$, the algorithm has depth at most $2\lfloor \log_s (n) \rfloor$ and runs in $O(n(1 + \ell/s^2)/p + (s^2 + \ell) \log_s (n))$ time assuming $p$ tensor core units. Equivalently, the algorithm performs $O(n/s^2)$ multiplications of square matrices of size $s$.
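
To make the basic primitive concrete, the following is a minimal sketch of how a prefix sum over $s^2$ values can be expressed purely through $s \times s$ matrix products with triangular matrices of ones. It is an illustration under stated assumptions, not the algorithm of the paper: the helper names (`matmul`, `block_scan`, `scan`) are hypothetical, `matmul` is a plain-Python stand-in for a single tensor core operation, and blocks are chained with a sequential carry, so the sketch does not achieve the $2\lfloor \log_s (n) \rfloor$ depth bound. It only shows that a constant number of matrix products per block of $s^2$ elements is enough, which is in line with the $O(n/s^2)$ multiplication count.

```python
def matmul(A, B):
    """Plain s x s matrix product; stands in for one TCU operation (assumption)."""
    s = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def block_scan(block, s):
    """Inclusive prefix sums of s*s values using only s x s matrix products."""
    assert len(block) == s * s
    # Lay the block out row-major as an s x s matrix.
    A = [block[i * s:(i + 1) * s] for i in range(s)]
    U = [[1 if i <= j else 0 for j in range(s)] for i in range(s)]  # upper-triangular ones
    L = [[1 if j < i else 0 for j in range(s)] for i in range(s)]   # strictly lower ones
    J = [[1] * s for _ in range(s)]                                 # all-ones matrix

    row_scans = matmul(A, U)         # entry (i, j): sum of A[i][0..j]
    row_totals = matmul(A, J)        # entry (i, j): sum of row i
    offsets = matmul(L, row_totals)  # entry (i, j): sum of all rows above row i
    return [row_scans[i][j] + offsets[i][j] for i in range(s) for j in range(s)]


def scan(xs, s=4):
    """Inclusive prefix sum of xs, one s*s block at a time with a serial carry."""
    out, carry = [], 0
    step = s * s
    for start in range(0, len(xs), step):
        chunk = xs[start:start + step]
        padded = chunk + [0] * (step - len(chunk))  # zero-pad the last block
        scanned = block_scan(padded, s)[:len(chunk)]
        out.extend(v + carry for v in scanned)
        carry = out[-1]  # running total passed to the next block
    return out
```

For example, `scan([1, 2, 3, 4, 5], s=2)` processes two blocks of four elements (the second one zero-padded) and uses three matrix products per block.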
