SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Neural Information Processing Systems (NeurIPS), 2020
22 March 2020
Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Leilei Gan, Jiwei Li

Papers citing "SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection"

16 papers shown
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers
Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwaśniewski, Yi Zhu, Haruto Fujii, ..., Maciej Besta, Kentaro Katayama, Takumi Honda, Yusuke Nagasaka, Torsten Hoefler
03 Jul 2025

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
Communities: MQ
15 Feb 2024

ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Reliability Engineering & System Safety (Reliab. Eng. Syst. Saf.), 2023
Yanfang Li, Huan Wang, Muxia Sun
Communities: LM&MA, AI4TS, AI4CE
10 May 2023

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
Neural Information Processing Systems (NeurIPS), 2022
Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
Communities: MGen
19 Oct 2022

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
International Conference on Machine Learning (ICML), 2022
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Dianbo Sui
Communities: 3DV
14 Oct 2022

Hierarchical Graph Transformer with Adaptive Node Sampling
Neural Information Processing Systems (NeurIPS), 2022
Zaixin Zhang, Qi Liu, Qingyong Hu, Cheekong Lee
08 Oct 2022

Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences
International Conference on Information and Knowledge Management (CIKM), 2022
Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qingguo Chen, Bing Wang
Communities: VLM
08 Aug 2022

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
Communities: 3DV
27 Apr 2022

Faster Nearest Neighbor Machine Translation
Shuhe Wang, Jiwei Li, Yuxian Meng, Rongbin Ouyang, Guoyin Wang, Xiaoya Li, Tianwei Zhang, Shi Zong
15 Dec 2021

GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Leilei Gan, Jiwei Li
Communities: LRM
17 Oct 2021

Layer-wise Model Pruning based on Mutual Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chun Fan, Jiwei Li, Xiang Ao, Leilei Gan, Yuxian Meng, Xiaofei Sun
28 Aug 2021

A Survey of Transformers
AI Open (AO), 2021
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
Communities: ViT
08 Jun 2021

Fast Nearest Neighbor Machine Translation
Findings, 2021
Yuxian Meng, Xiaoya Li, Xiayu Zheng, Leilei Gan, Xiaofei Sun, Tianwei Zhang, Jiwei Li
Communities: LRM
30 May 2021

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan
14 Oct 2020

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
08 Jun 2020

An Attentive Survey of Attention Models
S. Chaudhari, Varun Mithal, Gungor Polatkan, R. Ramanath
05 Apr 2019