Generating Long Sequences with Sparse Transformers

23 April 2019

Papers citing "Generating Long Sequences with Sparse Transformers"

50 / 1,283 papers shown

Simplified and Generalized Masked Diffusion for Discrete Data

Neural Information Processing Systems (NeurIPS), 2024

17 Jan 2025

Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps

International Conference on Learning Representations (ICLR), 2025

13 Jan 2025

Tensor Product Attention Is All You Need

11 Jan 2025

Hidden Entity Detection from GitHub Leveraging Large Language Models

08 Jan 2025

Powerful Design of Small Vision Transformer on CIFAR10

Gent Wu

ViT

07 Jan 2025

Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

06 Jan 2025

A Study on Context Length and Efficient Transformers for Biomedical Image Analysis

Sarah M. Hooper

Hui Xue

ViT MedIm

03 Jan 2025

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

...

03 Jan 2025

Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

21 Dec 2024

Advances in Transformers for Robotic Applications: A Review

Nikunj Sanghai

Nik Bear Brown

AI4CE

13 Dec 2024

Non-Normal Diffusion Models

Henry Li

VLM DiffM

10 Dec 2024

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

04 Dec 2024

Knowledge-Enhanced Conversational Recommendation via Transformer-based Sequential Modelling

03 Dec 2024

TSUBF-Net: Trans-Spatial UNet-like Network with Bi-direction Fusion for Segmentation of Adenoid Hypertrophy in CT

01 Dec 2024

Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks

Mohsen Dehghankar

Abolfazl Asudeh

30 Nov 2024

Does Self-Attention Need Separate Weights in Transformers?

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Md. Kowsher

Nusrat Jahan Prottasha

Chun-Nam Yu

O. Garibay

Niloofar Yousefi

30 Nov 2024

StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training

Kaustubh Ponkshe

Venkatapathy Subramanian

Natwar Modani

Ganesh Ramakrishnan

25 Nov 2024

Selective Attention: Enhancing Transformer through Principled Context Control

Neural Information Processing Systems (NeurIPS), 2024

Xuechen Zhang

Xiangyu Chang

Mingchen Li

Amit K. Roy-Chowdhury

Jiasi Chen

Samet Oymak

19 Nov 2024

Squeezed Attention: Accelerating Long Context Length LLM Inference

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Coleman Hooper

Sehoon Kim

Hiva Mohammadzadeh

Monishwaran Maheswaran

14 Nov 2024

TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models

International Workshop on Information Forensics and Security (WIFS), 2024

11 Nov 2024

SPARTAN: A Sparse Transformer World Model Attending to What Matters

11 Nov 2024

EviRerank: Adaptive Evidence Construction for Long-Document LLM Reranking

09 Nov 2024

Reducing Distraction in Long-Context Language Models by Focused Learning

08 Nov 2024

kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers

Themistoklis Haris

06 Nov 2024

LiVOS: Light Video Object Segmentation with Gated Linear Matching

Computer Vision and Pattern Recognition (CVPR), 2024

05 Nov 2024

The Evolution of RWKV: Advancements in Efficient Language Modeling

Akul Datta

VLM

05 Nov 2024

LASER: Attention with Exponential Transformation

Sai Surya Duvvuri

Inderjit Dhillon

05 Nov 2024

Training Compute-Optimal Protein Language Models

bioRxiv, 2024

04 Nov 2024

Music Foundation Model as Generic Booster for Music Downstream Tasks

...

02 Nov 2024

Context-Aware Token Selection and Packing for Enhanced Vision Transformer

Tianyi Zhang

B. Li

Jae-sun Seo

Yu Cao

31 Oct 2024

ALISE: Accelerating Large Language Model Serving with Speculative Scheduling

International Conference on Computer Aided Design (ICCAD), 2024

Youpeng Zhao

Jun Wang

31 Oct 2024

BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference

30 Oct 2024

Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning

Haitz Sáez de Ocáriz Borde

29 Oct 2024

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments

28 Oct 2024

Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Aosong Feng

Rex Ying

Leandros Tassiulas

28 Oct 2024

The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI

Fulu Li

24 Oct 2024

Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation

24 Oct 2024

TabDPT: Scaling Tabular Foundation Models on Real Data

23 Oct 2024

CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

21 Oct 2024

HyQE: Ranking Contexts with Hypothetical Query Embeddings

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

20 Oct 2024

MoDification: Mixture of Depths Made Easy

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

...

Min Zhang

18 Oct 2024

Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis

Neural Information Processing Systems (NeurIPS), 2024

Lin Yang

18 Oct 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

...

17 Oct 2024

Prompt Compression for Large Language Models: A Survey

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Zongqian Li

Yinhong Liu

Yixuan Su

Nigel Collier

16 Oct 2024

In-context KV-Cache Eviction for LLMs via Attention-Gate

Tianqi Hou

15 Oct 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

...

Jianfeng Gao

14 Oct 2024

Towards Better Multi-head Attention via Channel-wise Sample Permutation

Shen Yuan

Hongteng Xu

14 Oct 2024

ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration

14 Oct 2024

ChuLo: Chunk-Level Key Information Representation for Long Document Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

14 Oct 2024