1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing
"Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,022 papers shown
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Ziqin Zhou
Yifan Yang
Yue Yang
Tianyu He
Houwen Peng
Kai Qiu
Qi Dai
Lili Qiu
Chong Luo
Lingqiao Liu
DiffM
VGen
175
5
0
14 Mar 2025
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga
Wei Jie
Fan Wu
Vardan K. Voskanyan
Fateme Dinmohammadi
P. Brookes
Jingzhi Gong
Zheng Wang
331
8
0
13 Mar 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
International Conference on Learning Representations (ICLR), 2025
Marianne Arriola
Aaron Gokaslan
Justin T Chiu
Zhihan Yang
Zhixuan Qi
Jiaqi Han
Subham Sekhar Sahoo
Volodymyr Kuleshov
DiffM
624
140
0
12 Mar 2025
Open-World Skill Discovery from Unsegmented Demonstrations
Jingwen Deng
Zihao Wang
Shaofei Cai
Hoang Trung-Dung
Yitao Liang
223
3
0
11 Mar 2025
Context-aware Biases for Length Extrapolation
Ali Veisi
Hamidreza Amirzadeh
Amir Mansourian
563
2
0
11 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
847
4
0
10 Mar 2025
Learning Transformer-based World Models with Contrastive Predictive Coding
International Conference on Learning Representations (ICLR), 2025
Maxime Burchi
Radu Timofte
355
11
0
06 Mar 2025
L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen
Oriol Mayné i Comas
Zhuotao Jin
Di Luo
Marin Soljacic
305
5
0
06 Mar 2025
ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment
Shaofei Cai
Zhancun Mu
Hoang Trung-Dung
Yitao Liang
292
8
0
04 Mar 2025
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang
Jing Lian
Linhui Li
MoE
353
0
0
04 Mar 2025
Transformer Meets Twicing: Harnessing Unattended Residual Information
International Conference on Learning Representations (ICLR), 2025
Laziz U. Abdullaev
Tan M. Nguyen
556
4
0
02 Mar 2025
Revisiting Kernel Attention with Correlated Gaussian Process Representation
Conference on Uncertainty in Artificial Intelligence (UAI), 2025
Long Minh Bui
Tho Tran Huu
Duy-Tung Dinh
T. Nguyen
Trong Nghia Hoang
362
5
0
27 Feb 2025
Sliding Window Attention Training for Efficient Large Language Models
Zichuan Fu
Wentao Song
Longji Xu
X. Wu
Yefeng Zheng
Yingying Zhang
Derong Xu
Xuetao Wei
Tong Xu
Xiangyu Zhao
468
8
0
26 Feb 2025
How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching
Nuo Xu
Peijie Wang
Zi Liang
Junzhou Zhao
X. Guan
AILaw
268
0
0
25 Feb 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Interspeech (Interspeech), 2024
Khanh Le
Duc Thanh Chau
AI4TS
271
2
0
24 Feb 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
237
6
0
24 Feb 2025
Enhancing RWKV-based Language Models for Long-Sequence Text Generation
Xinghan Pan
332
1
0
21 Feb 2025
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention
Pattern Recognition (Pattern Recogn.), 2024
Bochao Zou
Zizheng Guo
Jiansheng Chen
Junbao Zhuo
Weiran Huang
Huimin Ma
ViT
AI4TS
345
1
0
21 Feb 2025
ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Khanh Le
Tuan Vu Ho
Dung Tran
Duc Thanh Chau
187
4
0
20 Feb 2025
FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
Bingzhe Zhao
Ke Cheng
Aomufei Yuan
Yuxuan Tian
Ruiguang Zhong
Chengchen Hu
Tong Yang
Lian Yu
330
1
0
19 Feb 2025
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuri Kuratov
M. Arkhipov
Aydar Bulatov
Andrey Kravchenko
318
15
0
18 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
689
5
0
18 Feb 2025
Continuous Diffusion Model for Language Modeling
Jaehyeong Jo
Sung Ju Hwang
202
3
0
17 Feb 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram
Stefano Dettori
V. Colla
Giorgio Buttazzo
321
1
0
17 Feb 2025
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Andrey Kravchenko
287
12
0
17 Feb 2025
Theoretical Benefit and Limitation of Diffusion Language Model
Guhao Feng
Yihan Geng
Jian Guan
Wei Wu
Liwei Wang
Di He
DiffM
372
1
0
13 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
613
0
0
10 Feb 2025
Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Deven Mahesh Mistry
Anooshka Bajaj
Yash Aggarwal
Sahaj Singh Maini
Zoran Tiganj
84
4
0
09 Feb 2025
LM2: Large Memory Models
Jikun Kang
Wenqi Wu
Filippos Christianos
Alex J. Chan
Fraser Greenlee
George Thomas
Marvin Purtorab
Andy Toulis
KELM
316
7
0
09 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
394
20
0
09 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Neural Information Processing Systems (NeurIPS), 2025
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
380
2
0
06 Feb 2025
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier
Sean Papay
Sebastian Padó
354
0
0
02 Feb 2025
Music Generation using Human-In-The-Loop Reinforcement Learning
BigData Congress [Services Society] (BSS), 2023
Aju Ani Justus
125
3
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
403
35
0
24 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
International Conference on Computational Linguistics (COLING), 2024
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
462
7
0
20 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
312
1
0
10 Jan 2025
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Zheng-Hua Tan
276
1
0
06 Jan 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
R. Mamidi
Zijiao Chen
Subba Reddy Oota
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
391
23
0
31 Dec 2024
Investigating Length Issues in Document-level Machine Translation
Ziqian Peng
Rachel Bawden
François Yvon
344
4
0
23 Dec 2024
L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression
AAAI Conference on Artificial Intelligence (AAAI), 2024
Junxuan Zhang
Zhengxue Cheng
Yan Zhao
Shihao Wang
Dajiang Zhou
Guo Lu
Li Song
312
4
0
21 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
International Conference on Learning Representations (ICLR), 2024
Pengxiang Li
Lu Yin
Shiwei Liu
288
11
0
18 Dec 2024
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez
Luca Zancato
Benjamin Bowman
Aditya Golatkar
Wei Xia
Stefano Soatto
466
7
0
17 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
373
5
0
13 Dec 2024
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
International Conference on Computational Linguistics (COLING), 2024
Haoran Lian
Junmin Chen
Wei Huang
Yizhe Xiong
Wenping Hu
...
Hui Chen
Jianwei Niu
Zijia Lin
Fuzheng Zhang
Di Zhang
248
2
0
10 Dec 2024
KITE-DDI: A Knowledge graph Integrated Transformer Model for accurately predicting Drug-Drug Interaction Events from Drug SMILES and Biomedical Knowledge Graph
IEEE Access (IEEE Access), 2024
Azwad Tamir
Jiann-Shiun Yuan
183
2
0
08 Dec 2024
FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Yijiao Wang
Shiju Wang
Shenhan Zhu
Fangcheng Fu
Xinyi Liu
Xuefeng Xiao
Huixia Li
Jiashi Li
Faming Wu
Tengjiao Wang
371
0
0
02 Dec 2024
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Armin Saghafian
Amirmohammad Izadi
Negin Hashemi Dijujin
M. Baghshah
454
0
0
29 Nov 2024
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
Fahao Chen
Peng Li
Zicong Hong
Zhou Su
Song Guo
MoMe
MoE
255
3
0
23 Nov 2024
Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru Ordering
Mostafa Varzaneh
Pooja Voladoddi
Tanmay Bakshi
Uma Gunturi
228
0
0
22 Nov 2024
Financial Risk Assessment via Long-term Payment Behavior Sequence Folding
Industrial Conference on Data Mining (IDM), 2024
Yiran Qiao
Yateng Tang
Xiang Ao
Qi Yuan
Ziming Liu
Chen Shen
Xuehao Zheng
218
0
0
22 Nov 2024