Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

Temporal Chunking Enhances Recognition of Implicit Sequential Patterns

Dhireesha Kudithipudi

273

31 May 2025

ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations

272

29 May 2025

A New Deep-learning-Based Approach For mRNA Optimization: High Fidelity, Computation Efficiency, and Multiple Optimization Factors

136

29 May 2025

Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems

Christopher Ormerod

246

28 May 2025

Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers

Yukun Zhang

Xueqing Zhou

AI4TS

157

27 May 2025

PIPE: Physics-Informed Position Encoding for Alignment of Satellite Images and Time Series

193

27 May 2025

ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models

Bozhou Li

Wentao Zhang

VLM

180

27 May 2025

Transformers in Protein: A Survey

338

26 May 2025

MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

292

26 May 2025

Anchored Diffusion Language Model

Litu Rout

Constantine Caramanis

Sanjay Shakkottai

362

24 May 2025

LatentLLM: Attention-Aware Joint Tensor Compression

231

23 May 2025

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

1.0K

23 May 2025

Training Long-Context LLMs Efficiently via Chunk-wise Optimization

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

22 May 2025

SELF: Self-Extend the Context Length With Logistic Growth Function

271

22 May 2025

LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols

365

22 May 2025

Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes

Zixun Guo

Simon Dixon

248

21 May 2025

dKV-Cache: The Cache for Diffusion Language Models

421

21 May 2025

Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning

Mahesh Godavarti

CoGe

241

21 May 2025

NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts

312

20 May 2025

CoRank: LLM-Based Compact Reranking with Document Features for Scientific Retrieval

345

19 May 2025

Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-Constrained Pruning

International Symposium on Computer Architecture (ISCA), 2025

265

18 May 2025

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Markos A. Katsoulakis

242

16 May 2025

Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes

Ashok Arora

Neetesh Kumar

236

16 May 2025

ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention

319

15 May 2025

Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons

Andrew Kiruluta

Preethi Raju

Priscilla Burity

110

09 May 2025

Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

...

236

07 May 2025

A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts

343

02 May 2025

Compact Recurrent Transformer with Persistent Memory

346

02 May 2025

Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures

Heng-Sheng Chang

P. Mehta

291

01 May 2025

Polysemy of Synthetic Neurons Towards a New Type of Explanatory Categorical Vector Spaces

Michael Veillet-Guillem

MILM

290

30 Apr 2025

From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models

Andrew Kiruluta

224

29 Apr 2025

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

...

239

26 Apr 2025

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation

...

370

22 Apr 2025

SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training

212

20 Apr 2025

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

M. Chowdhury

Md Rifat Ur Rahman

Akil Ahmad Taki

225

19 Apr 2025

CacheFormer: High Attention-Based Segment Caching

Applied Informatics (AI), 2025

Sushant Singh

A. Mahmood

224

18 Apr 2025

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference

374

14 Apr 2025

CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers

Yoshihiro Yamada

ViT

316

09 Apr 2025

Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding

Zahir Alsulaimawi

140

07 Apr 2025

On Vanishing Variance in Transformer Length Generalization

258

03 Apr 2025

Semantic Adapter for Universal Text Embeddings: Diagnosing and Mitigating Negation Blindness to Enhance Universality

Hongliu Cao

420

01 Apr 2025

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Yuxuan Zhu

Ali Falahati

David H. Yang

Mohammad Mohammadi Amiri

315

01 Apr 2025

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives

564

01 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention

547

29 Mar 2025

SocialGen: Modeling Multi-Human Social Interaction with Language Models

270

28 Mar 2025

Resona: Improving Context Copying in Linear Recurrence Models with Retrieval

Prasanna Parthasarathi

439

28 Mar 2025

Semi-supervised Node Importance Estimation with Informative Distribution Modeling for Uncertainty Regularization

The Web Conference (WWW), 2025

486

26 Mar 2025

Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing

452

20 Mar 2025

Intra-neuronal attention within language models Relationships between activation and semantics

Corbet Alois Georgeon

Michael Veillet-Guillem

MILM

259

17 Mar 2025

A Survey on Transformer Context Extension: Approaches and Evaluation

520

17 Mar 2025