Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models

175

14 Mar 2025

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

331

13 Mar 2025

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

International Conference on Learning Representations (ICLR), 2025

624

140

12 Mar 2025

Open-World Skill Discovery from Unsegmented Demonstrations

223

11 Mar 2025

Context-aware Biases for Length Extrapolation

Ali Veisi

Hamidreza Amirzadeh

Amir Mansourian

563

11 Mar 2025

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference

847

10 Mar 2025

Learning Transformer-based World Models with Contrastive Predictive Coding

International Conference on Learning Representations (ICLR), 2025

Maxime Burchi

Radu Timofte

355

06 Mar 2025

L²M: Mutual Information Scaling Law for Long-Context Language Modeling

305

06 Mar 2025

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

292

04 Mar 2025

Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer

353

04 Mar 2025

Transformer Meets Twicing: Harnessing Unattended Residual Information

International Conference on Learning Representations (ICLR), 2025

Laziz U. Abdullaev

Tan M. Nguyen

556

02 Mar 2025

Revisiting Kernel Attention with Correlated Gaussian Process Representation

Conference on Uncertainty in Artificial Intelligence (UAI), 2025

362

27 Feb 2025

Sliding Window Attention Training for Efficient Large Language Models

468

26 Feb 2025

How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching

268

25 Feb 2025

Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking

Interspeech (Interspeech), 2024

Khanh Le

Duc Thanh Chau

AI4TS

271

24 Feb 2025

The Role of Sparsity for Length Generalization in Transformers

237

24 Feb 2025

Enhancing RWKV-based Language Models for Long-Sequence Text Generation

Xinghan Pan

332

21 Feb 2025

RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention

Pattern Recognition (Pattern Recogn.), 2024

345

21 Feb 2025

ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

187

20 Feb 2025

FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference

330

19 Feb 2025

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

318

18 Feb 2025

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

689

18 Feb 2025

Continuous Diffusion Model for Language Modeling

Jaehyeong Jo

Sung Ju Hwang

202

17 Feb 2025

ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition

Muhammad Waseem Akram

Stefano Dettori

V. Colla

Giorgio Buttazzo

321

17 Feb 2025

Associative Recurrent Memory Transformer

287

17 Feb 2025

Theoretical Benefit and Limitation of Diffusion Language Model

372

13 Feb 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

613

10 Feb 2025

Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

09 Feb 2025

LM2: Large Memory Models

316

09 Feb 2025

The Curse of Depth in Large Language Models

394

09 Feb 2025

Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers

Neural Information Processing Systems (NeurIPS), 2025

380

06 Feb 2025

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Gabriel Lindenmaier

Sean Papay

Sebastian Padó

354

02 Feb 2025

Music Generation using Human-In-The-Loop Reinforcement Learning

BigData Congress [Services Society] (BSS), 2023

Aju Ani Justus

125

28 Jan 2025

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

403

24 Jan 2025

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

International Conference on Computational Linguistics (COLING), 2024

462

20 Jan 2025

Benchmarking Rotary Position Embeddings for Automatic Speech Recognition

312

10 Jan 2025

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining

276

06 Jan 2025

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

391

31 Dec 2024

Investigating Length Issues in Document-level Machine Translation

Ziqian Peng

Rachel Bawden

François Yvon

344

23 Dec 2024

L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression

AAAI Conference on Artificial Intelligence (AAAI), 2024

312

21 Dec 2024

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

International Conference on Learning Representations (ICLR), 2024

Pengxiang Li

Lu Yin

Shiwei Liu

288

18 Dec 2024

Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models

466

17 Dec 2024

Advances in Transformers for Robotic Applications: A Review

Nikunj Sanghai

Nik Bear Brown

AI4CE

373

13 Dec 2024

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

International Conference on Computational Linguistics (COLING), 2024

...

248

10 Dec 2024

KITE-DDI: A Knowledge graph Integrated Transformer Model for accurately predicting Drug-Drug Interaction Events from Drug SMILES and Biomedical Knowledge Graph

IEEE Access (IEEE Access), 2024

Azwad Tamir

Jiann-Shiun Yuan

183

08 Dec 2024

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

371

02 Dec 2024

CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives

Armin Saghafian

Amirmohammad Izadi

Negin Hashemi Dijujin

M. Baghshah

454

29 Nov 2024

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation

255

23 Nov 2024

Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru Ordering

228

22 Nov 2024

Financial Risk Assessment via Long-term Payment Behavior Sequence Folding

Industrial Conference on Data Mining (IDM), 2024

218

22 Nov 2024
