arXiv: 2407.08608

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

11 July 2024

Ganesh Bikshandi

Papers citing "FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision"

50 / 136 papers shown

Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

...

525

0

0

24 Dec 2025

RELIC: Interactive Video World Model with Long-Horizon Memory

...

Kalyan Sunkavalli

421

24

0

03 Dec 2025

PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

206

2

0

03 Dec 2025

Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

333

1

0

28 Nov 2025

IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference

374

1

0

26 Nov 2025

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

...

137

9

0

25 Nov 2025

Block Cascading: Training Free Acceleration of Block-Causal Video Models

Hmrishav Bandyopadhyay

Nikhil Pinnaparaju

202

2

0

25 Nov 2025

HunyuanVideo 1.5 Technical Report

...

465

44

0

24 Nov 2025

NeAR: Coupled Neural Asset-Renderer Stack

...

228

0

0

23 Nov 2025

AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Nandita Vijaykumar

163

3

0

19 Nov 2025

Global Cross-Time Attention Fusion for Enhanced Solar Flare Prediction from Multivariate Time Series

S. F. Boubrahimi

194

0

0

17 Nov 2025

MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity

Vladimír Macko

Vladimír Boža

167

3

0

17 Nov 2025

LEMUR: Large scale End-to-end MUltimodal Recommendation

Computers & graphics (CG), 2024

...

286

6

0

14 Nov 2025

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

250

7

0

11 Nov 2025

TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task

Gizem Gümüşçekiçci

135

0

0

10 Nov 2025

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

213

7

0

09 Nov 2025

Hilbert-Guided Sparse Local Attention

175

0

0

08 Nov 2025

Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving

298

2

0

08 Nov 2025

Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation

David Ryckelynck

Yannick Tillier

Etienne Decencière

425

2

0

07 Nov 2025

DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing

Hossein Entezari Zarch

126

1

0

06 Nov 2025

Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants

Zelal Su "Lain" Mustafaoglu

Angélica Moreira

Roshan Dathathri

243

0

0

03 Nov 2025

MotionStream: Real-Time Video Generation with Interactive Motion Controls

489

33

0

03 Nov 2025

Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects

Mansi Choudhary

Karthik Sangaiah

103

0

0

03 Nov 2025

Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability

128

0

0

03 Nov 2025

Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse

...

Xiaojiang Zhang

218

2

0

01 Nov 2025

SpecAttn: Speculating Sparse Attention

164

0

0

31 Oct 2025

Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving

158

0

0

30 Oct 2025

Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling

302

7

0

28 Oct 2025

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

211

16

0

23 Oct 2025

Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References

Alexander Collins

Bastian Hagedorn

Evghenii Gaburov

...

175

3

0

16 Oct 2025

video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLM

127

1

0

13 Oct 2025

MIRAGE: Runtime Scheduling for Multi-Vector Image Retrieval with Hierarchical Decomposition

Xiang Chen

179

2

0

10 Oct 2025

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill

164

1

0

09 Oct 2025

Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors

Vasileios Titopoulos

K. Alexandridis

G. Dimitrakopoulos

156

0

0

08 Oct 2025

The Anatomy of a Triton Attention Kernel

Burkhard Ringlein

Jan van Lunteren

118

2

0

07 Oct 2025

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Abedelkadir Asi

282

6

0

06 Oct 2025

The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures

Alexander Fichtl

175

0

0

06 Oct 2025

Emergent Coordination in Multi-Agent Language Models

Christoph Riedl

181

1

0

05 Oct 2025

Accelerating Attention with Basis Decomposition

198

0

0

02 Oct 2025

Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework

Nii Osae Osae Dade

Moinul Hossain Rahat

171

0

0

02 Oct 2025

A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration

Pratik Chaudhari

159

0

0

29 Sep 2025

UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation

...

227

8

0

29 Sep 2025

Pretraining Large Language Models with NVFP4

Felix Abecassis

...

Zhongbo Zhu

394

27

0

29 Sep 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

...

Joseph E. Gonzalez

233

22

0

28 Sep 2025

Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

275

0

0

24 Sep 2025

Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute

Fiodar Kazhamiaka

Melanie Nakagawa

Ricardo Bianchini

200

5

0

24 Sep 2025

Mamba Modulation: On the Length Generalization of Mamba

Philippe Langlais

374

0

0

23 Sep 2025

CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure

222

2

0

23 Sep 2025

Patent Language Model Pretraining with ModernBERT

Amirhossein Yousefiramandi

400

2

0

18 Sep 2025

When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning

185

0

0

16 Sep 2025
