arXiv:1904.10509
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,283 papers shown
Simplified and Generalized Masked Diffusion for Discrete Data
Neural Information Processing Systems (NeurIPS), 2024
Jiaxin Shi
Kehang Han
Zehao Wang
Arnaud Doucet
Michalis K. Titsias
DiffM
611
289
0
17 Jan 2025
Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps
International Conference on Learning Representations (ICLR), 2025
Henry Li
Ronen Basri
Y. Kluger
DiffM
459
3
0
13 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
787
30
0
11 Jan 2025
Hidden Entity Detection from GitHub Leveraging Large Language Models
Lu Gan
Martin Blum
Danilo Dessi
Brigitte Mathiak
Ralf Schenkel
Stefan Dietze
229
2
0
08 Jan 2025
Powerful Design of Small Vision Transformer on CIFAR10
Gent Wu
ViT
252
2
0
07 Jan 2025
Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Hanbin Bae
Byungjun Kang
Jiwon Kim
Jaeyong Hwang
Hosang Sung
Hoon-Young Cho
3DV
216
0
0
06 Jan 2025
Foundations of GenIR
Jiaxin Mao
Jingtao Zhan
Wenshu Fan
267
0
0
06 Jan 2025
A Study on Context Length and Efficient Transformers for Biomedical Image Analysis
Sarah M. Hooper
Hui Xue
ViT
MedIm
51
0
0
03 Jan 2025
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu
Meng Chen
Baotong Lu
Huiqiang Jiang
Zhenhua Han
...
Jianchao Tan
Chong Chen
Fan Yang
Yue Yang
Lili Qiu
539
81
0
03 Jan 2025
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhisong Zhang
Yan Wang
Xinting Huang
Tianqing Fang
Han Zhang
Chenlong Deng
Shuaiyi Li
Dong Yu
365
15
0
21 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
378
5
0
13 Dec 2024
Non-Normal Diffusion Models
Henry Li
VLM
DiffM
266
1
0
10 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
426
21
0
04 Dec 2024
Knowledge-Enhanced Conversational Recommendation via Transformer-based Sequential Modelling
Jie Zou
Aixin Sun
Cheng Long
Evangelos Kanoulas
LMTD
460
9
0
03 Dec 2024
TSUBF-Net: Trans-Spatial UNet-like Network with Bi-direction Fusion for Segmentation of Adenoid Hypertrophy in CT
Rulin Zhou
Yingjie Feng
Guankun Wang
Xiaopin Zhong
Zongze Wu
Qiang Wu
Xi Zhang
MedIm
178
1
0
01 Dec 2024
Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks
Mohsen Dehghankar
Abolfazl Asudeh
239
1
0
30 Nov 2024
Does Self-Attention Need Separate Weights in Transformers?
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Md. Kowsher
Nusrat Jahan Prottasha
Chun-Nam Yu
O. Garibay
Niloofar Yousefi
1.1K
3
0
30 Nov 2024
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training
Kaustubh Ponkshe
Venkatapathy Subramanian
Natwar Modani
Ganesh Ramakrishnan
218
0
0
25 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Neural Information Processing Systems (NeurIPS), 2024
Xuechen Zhang
Xiangyu Chang
Mingchen Li
Amit K. Roy-Chowdhury
Jiasi Chen
Samet Oymak
260
10
0
19 Nov 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Monishwaran Maheswaran
June Paik
Michael W. Mahoney
Kemal Kurniawan
Amir Gholami
607
32
0
14 Nov 2024
TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models
International Workshop on Information Forensics and Security (WIFS), 2024
Matheus Simão
Fabiano Prado
Omar Abdul Wahab
Anderson Avila
118
0
0
11 Nov 2024
SPARTAN: A Sparse Transformer World Model Attending to What Matters
Anson Lei
Bernhard Schölkopf
Ingmar Posner
CML
523
6
0
11 Nov 2024
EviRerank: Adaptive Evidence Construction for Long-Document LLM Reranking
Minghan Li
Eric Gaussier
Juntao Li
Guodong Zhou
ALM
211
0
0
09 Nov 2024
Reducing Distraction in Long-Context Language Models by Focused Learning
Zijun Wu
Bingyuan Liu
Ran Yan
Lei Chen
Thomas Delteil
RALM
190
9
0
08 Nov 2024
kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
289
0
0
06 Nov 2024
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Computer Vision and Pattern Recognition (CVPR), 2024
Qin Liu
Jianfeng Wang
Zhiyong Yang
Linjie Li
Kevin Qinghong Lin
Marc Niethammer
Lijuan Wang
VOS
278
4
0
05 Nov 2024
The Evolution of RWKV: Advancements in Efficient Language Modeling
Akul Datta
VLM
188
1
0
05 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
180
2
0
05 Nov 2024
Training Compute-Optimal Protein Language Models
bioRxiv, 2024
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
312
29
0
04 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
529
6
0
02 Nov 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
176
1
0
31 Oct 2024
ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
International Conference on Computer Aided Design (ICCAD), 2024
Youpeng Zhao
Jun Wang
173
2
0
31 Oct 2024
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Junqi Zhao
Zhijin Fang
Shu Li
Shaohui Yang
Shichao He
222
5
0
30 Oct 2024
Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Haitz Sáez de Ocáriz Borde
Artem Lukoianov
Anastasis Kratsios
Michael M. Bronstein
Xiaowen Dong
GNN
255
4
0
29 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang
Yipeng Du
Ahmad Farhan
Claudio Angione
Yue Zhao
Harry Yang
Fielding Johnston
James Buban
Patrick Colangelo
296
0
0
28 Oct 2024
Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Aosong Feng
Rex Ying
Leandros Tassiulas
247
3
0
28 Oct 2024
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Fulu Li
96
0
0
24 Oct 2024
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
188
6
0
24 Oct 2024
TabDPT: Scaling Tabular Foundation Models on Real Data
Junwei Ma
Valentin Thomas
Rasa Hosseinzadeh
Hamidreza Kamkari
Alex Labach
Jesse C. Cresswell
Keyvan Golestan
Guangwei Yu
Anthony L. Caterini
M. Volkovs
LMTD
493
8
0
23 Oct 2024
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhenpeng Su
Xing Wu
Zijia Lin
Yizhe Xiong
Minxuan Lv
Guangyuan Ma
Hui Chen
Songlin Hu
Guiguang Ding
MoE
593
5
0
21 Oct 2024
HyQE: Ranking Contexts with Hypothetical Query Embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Weichao Zhou
Jiaxin Zhang
Hilaf Hasson
Anu Singh
Wenchao Li
RALM
177
6
0
20 Oct 2024
MoDification: Mixture of Depths Made Easy
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
C. Zhang
M. Zhong
Qimeng Wang
Xuantao Lu
Zheyu Ye
...
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
Dawei Song
VLM
MoE
204
2
0
18 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Neural Information Processing Systems (NeurIPS), 2024
Honglin Li
Yunlong Zhang
Pingyi Chen
Honglin Li
Chenglu Zhu
Lin Yang
MedIm
295
12
0
18 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
553
69
0
17 Oct 2024
Prompt Compression for Large Language Models: A Survey
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zongqian Li
Yinhong Liu
Yixuan Su
Nigel Collier
MQ
309
42
0
16 Oct 2024
In-context KV-Cache Eviction for LLMs via Attention-Gate
Zihao Zeng
Bokai Lin
Tianqi Hou
Hao Zhang
Zhijie Deng
310
8
0
15 Oct 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
Mu Cai
Reuben Tan
Jianrui Zhang
Bocheng Zou
Kai Zhang
...
Yao Dou
J. Park
Jianfeng Gao
Yong Jae Lee
Jianwei Yang
271
155
0
14 Oct 2024
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan
Hongteng Xu
260
2
0
14 Oct 2024
ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration
Aunabil Chakma
Aditya Chakma
Soham Khisa
Chumui Tripura
Masum Hasan
Rifat Shahriyar
110
3
0
14 Oct 2024
ChuLo: Chunk-Level Key Information Representation for Long Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yan Li
Soyeon Caren Han
Yue Dai
Feiqi Cao
454
1
0
14 Oct 2024