Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
EpiK-Eval: Evaluation for Language Models as Epistemic Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Gabriele Prato
Jerry Huang
Prasanna Parthasarathi
Shagun Sodhani
Sarath Chandar
ELM
248
6
0
23 Oct 2023
Meta learning with language models: Challenges and opportunities in the classification of imbalanced text
Apostol T. Vassilev
Honglan Jin
Munawar Hasan
264
1
0
23 Oct 2023
Retrieval-Augmented Chain-of-Thought in Semi-structured Domains
Vaibhav Mavi
Abulhair Saparov
Chen Zhao
LRM
279
10
0
22 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
389
11
0
22 Oct 2023
Enhanced Low-Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots Using Double Deep Reinforcement Learning Techniques
Linda Dotto de Moraes
V. A. Kich
A. H. Kolling
J. A. Bottega
Ricardo B. Grando
A. R. Cukla
D. T. Gamarra
160
1
0
20 Oct 2023
Multi-level Contrastive Learning for Script-based Character Understanding
Dawei Li
Hengyuan Zhang
Yanran Li
Shiping Yang
277
17
0
20 Oct 2023
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding
Zhejun Zhang
Alexander Liniger
Daniel Gehrig
Fisher Yu
Luc Van Gool
301
55
0
19 Oct 2023
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
Yi Zhou
Jose Camacho-Collados
Danushka Bollegala
436
7
0
19 Oct 2023
The Locality and Symmetry of Positional Encodings
Lihu Chen
Gaël Varoquaux
Fabian M. Suchanek
185
1
0
19 Oct 2023
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Qingru Zhang
Dhananjay Ram
Cole Hawkins
Sheng Zha
Tuo Zhao
278
22
0
19 Oct 2023
From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers
Shaoxiong Duan
Yining Shi
Wei Xu
281
16
0
18 Oct 2023
Long-form Simultaneous Speech Translation: Thesis Proposal
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Peter Polák
3DV
208
3
0
17 Oct 2023
Heterogenous Memory Augmented Neural Networks
Zihan Qiu
Zhen Liu
Shuicheng Yan
Shanghang Zhang
Jie Fu
205
0
0
17 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
420
23
0
16 Oct 2023
A Survey on Video Diffusion Models
ACM Computing Surveys (ACM Comput. Surv.), 2023
Zhen Xing
Qijun Feng
Haoran Chen
Jingdong Sun
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM VGen
457
220
0
16 Oct 2023
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Neural Information Processing Systems (NeurIPS), 2023
Huayang Li
Tian Lan
Z. Fu
Deng Cai
Lemao Liu
Nigel Collier
Taro Watanabe
Yixuan Su
206
28
0
16 Oct 2023
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation
Yingwei Ma
Yue Yu
Shanshan Li
Yu Jiang
Yong Guo
Yuanliang Zhang
Yutao Xie
Xiangke Liao
184
13
0
16 Oct 2023
Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
Thomas Jiralerspong
Flemming Kondrup
Doina Precup
Khimya Khetarpal
144
0
0
16 Oct 2023
CoCoFormer: A controllable feature-rich polyphonic music generation method
Jiuyang Zhou
Tengfei Niu
Hong Zhu
Xingping Wang
235
0
0
15 Oct 2023
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
Weipu Zhang
Gang Wang
Jian Sun
Yetian Yuan
Gao Huang
239
95
0
14 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM VLM
409
70
0
13 Oct 2023
MemGPT: Towards LLMs as Operating Systems
Charles Packer
Sarah Wooders
Kevin Lin
Vivian Fang
Shishir G. Patil
Ion Stoica
Alfons Kemper
RALM
1.7K
333
0
12 Oct 2023
Cross-Episodic Curriculum for Transformer Agents
Neural Information Processing Systems (NeurIPS), 2023
Lucy Xiaoyang Shi
Yunfan Jiang
Jake Grigsby
Linxi "Jim" Fan
Yuke Zhu
167
9
0
12 Oct 2023
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
International Conference on Learning Representations (ICLR), 2023
Shaofei Cai
Bowei Zhang
Zihao Wang
Xiaojian Ma
Hoang Trung-Dung
Yitao Liang
322
38
0
12 Oct 2023
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Neural Information Processing Systems (NeurIPS), 2023
Qingkai Fang
Yan Zhou
Yangzhou Feng
210
16
0
11 Oct 2023
Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning
Workshop on Argument Mining (ArgMining), 2023
Arushi Sharma
Abhibha Gupta
Maneesh Bilalpur
182
7
0
11 Oct 2023
Humans and language models diverge when predicting repeating text
Conference on Computational Natural Language Learning (CoNLL), 2023
Aditya R. Vaidya
Javier S. Turek
Alexander G. Huth
247
10
0
10 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
334
116
0
08 Oct 2023
Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song
Yiqiao Zhong
248
14
0
07 Oct 2023
Higher-Order DeepTrails: Unified Approach to *Trails
Lernen, Wissen, Daten, Analysen (LWA), 2023
Tobias Koopmann
Jan Pfister
André Markus
Astrid Carolus
Carolin Wienrich
Andreas Hotho
AI4TS
78
0
0
06 Oct 2023
Investigating Alternative Feature Extraction Pipelines For Clinical Note Phenotyping
Daniel Neil
121
0
0
05 Oct 2023
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks
Victor Vadakechirayath George
OffRL
157
0
0
04 Oct 2023
Retrieval meets Long Context Large Language Models
International Conference on Learning Representations (ICLR), 2023
Peng Xu
Ming-Yu Liu
Xianchao Wu
Lawrence C. McAfee
Chen Zhu
Zihan Liu
Sandeep Subramanian
Evelina Bakhturina
Mohammad Shoeybi
Bryan Catanzaro
RALM LRM
458
112
0
04 Oct 2023
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
International Conference on Machine Learning (ICML), 2023
Sangjun Park
Jinyeong Bak
CLL
288
6
0
04 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yiming Wang
Jinyu Li
207
11
0
03 Oct 2023
Dodo: Dynamic Contextual Compression for Decoder-only LMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Guanghui Qin
Corby Rosset
Ethan C. Chau
Nikhil Rao
Benjamin Van Durme
198
17
0
03 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
International Conference on Learning Representations (ICLR), 2023
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM MLLM
758
340
0
03 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus
Rickard Brannvall
Andrei Stoian
170
0
0
03 Oct 2023
A Framework for Inference Inspired by Human Memory Mechanisms
International Conference on Learning Representations (ICLR), 2023
Xiangyu Zeng
Jie Lin
Piao Hu
Ruizheng Huang
Zhicheng Zhang
192
4
0
01 Oct 2023
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Chia-Yuan Chang
Helen Zhou
157
15
0
01 Oct 2023
Self-Supervised Open-Ended Classification with Small Visual Language Models
Mohammad Mahdi Derakhshani
Ivona Najdenkoska
Cees G. M. Snoek
M. Worring
Yuki M. Asano
VLM
416
0
0
30 Sep 2023
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
Interspeech (Interspeech), 2023
Weiran Wang
Zelin Wu
D. Caseiro
Tsendsuren Munkhdalai
K. Sim
...
Rohit Prabhavalkar
Zhong Meng
Ding Zhao
Tara N. Sainath
P. M. Mengibar
245
11
0
29 Sep 2023
Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
Marco Pleines
Matthias Pallasch
Frank Zimmer
Mike Preuss
OffRL
320
10
0
29 Sep 2023
LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud
Mengke Zhang
Tianxing He
Tianle Wang
Lu Mi
Fatemehsadat Mireshghallah
Binyi Chen
Hao Wang
Yulia Tsvetkov
225
2
0
29 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
211
22
0
28 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
273
23
0
28 Sep 2023
Unsupervised Pretraining for Fact Verification by Language Model Distillation
International Conference on Learning Representations (ICLR), 2023
A. Bazaga
Pietro Lio
Bo Dai
HILM
351
5
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Albert Mohwald
250
26
0
28 Sep 2023
At Which Training Stage Does Code Data Help LLMs Reasoning?
International Conference on Learning Representations (ICLR), 2023
Xiaogang Jia
Yue Liu
Yue Yu
Yuanliang Zhang
Yu Jiang
Changjian Wang
Shanshan Li
LRM SyDa
363
90
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM RALM
323
82
0
28 Sep 2023