Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,017 papers shown
Title
Exploration of Masked and Causal Language Modelling for Text Generation
Nicolo Micheletti
Samuel Belkadi
Lifeng Han
Goran Nenadic
197
11
0
21 May 2024
Mamba in Speech: Towards an Alternative to Self-Attention
Xiangyu Zhang
Qiquan Zhang
Hexin Liu
Tianyi Xiao
Xinyuan Qian
Beena Ahmed
E. Ambikairajah
Haizhou Li
Julien Epps
Mamba
307
86
0
21 May 2024
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
International Conference on Machine Learning (ICML), 2024
Victor Agostinelli
Sanghyun Hong
Lizhong Chen
KELM
175
3
0
18 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng
Zhenyu Ning
Jieru Zhao
Weihao Cui
Mengwei Xu
Liwei Guo
Xusheng Chen
Yizhou Shan
LLMAG
224
5
0
18 May 2024
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Haoyi Wu
Kewei Tu
MQ
278
35
0
17 May 2024
A Hybrid Deep Learning Framework for Stock Price Prediction Considering the Investor Sentiment of Online Forum Enhanced by Popularity
Huiyu Li
Junhua Hu
87
0
0
17 May 2024
Positional encoding is not the same as context: A study on positional encoding for sequential recommendation
Alejo López-Ávila
Jinhua Du
Abbas Shimary
Ze Li
219
5
0
16 May 2024
Robust Singing Voice Transcription Serves Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ruiqi Li
Yu Zhang
Yongqi Wang
Zhiqing Hong
Rongjie Huang
Zhou Zhao
208
16
0
16 May 2024
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)
Nicolas Drapier
Aladine Chetouani
A. Chateigner
110
5
0
15 May 2024
Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning
International Conference on Machine Learning (ICML), 2024
Junfeng Chen
Kailiang Wu
383
10
0
15 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
269
11
0
15 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
International Conference on Machine Learning (ICML), 2024
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
179
5
0
14 May 2024
Automated Deep Learning for Load Forecasting
Julie Keisler
Sandra Claudel
Gilles Cabriel
Margaux Brégère
AI4TS
165
3
0
14 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Computer Vision and Pattern Recognition (CVPR), 2024
Weihao Yu
Xinchao Wang
Mamba
245
159
0
13 May 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
123
4
0
13 May 2024
Towards Subgraph Isomorphism Counting with Graph Kernels
Xin Liu
Weiqi Wang
Jiaxin Bai
Yangqiu Song
146
1
0
13 May 2024
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory
Tianji Cai
G. W. Merz
François Charton
Niklas Nolte
Matthias Wilhelm
K. Cranmer
Lance J. Dixon
293
22
0
09 May 2024
Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation
Mo Guan
Yan Wang
Guangkun Ma
Jiarui Liu
Mingzu Sun
SLR
178
12
0
09 May 2024
Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Junzhi Chen
Juhao Liang
Benyou Wang
LLMAG
169
4
0
09 May 2024
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhufeng Li
S. S. Cranganore
Nicholas D. Youngblut
Niki Kilbertus
286
4
0
09 May 2024
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents
Yanfei Dong
Lambert Deng
Jiazheng Zhang
Xiaodong Yu
Ting Lin
Francesco Gelli
Soujanya Poria
W. Lee
163
0
0
08 May 2024
SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale
Michael Sapienza
Steven Ripplinger
Simon Gibbs
Jaewon Lee
Pranav Mistry
LRM ELM
185
8
0
07 May 2024
A Transformer with Stack Attention
Jiaoda Li
Jennifer C. White
Mrinmaya Sachan
Robert Bamler
191
4
0
07 May 2024
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Zhuoyi Yang
Heyang Jiang
Wenyi Hong
Jiayan Teng
Wendi Zheng
Yuxiao Dong
Ming Ding
Jie Tang
SupR
104
10
0
07 May 2024
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
ACM Multimedia (MM), 2024
Tao Liu
Feilong Chen
Shuai Fan
Chenpeng Du
Qi Chen
Xie Chen
Kai Yu
DiffM PINN
177
54
0
06 May 2024
Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation
Kaize Shi
Xueyao Sun
Qing Li
Guandong Xu
212
20
0
06 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen LM&Ro
278
74
0
06 May 2024
Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making
Zhuang Lei
Jingdong Zhao
Yuntao Li
Zichun Xu
Liangliang Zhao
Hong Liu
163
3
0
30 Apr 2024
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
J. Michaelov
Catherine Arnett
Benjamin Bergen
172
5
0
30 Apr 2024
Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis
Akash Awasthi
Safwan Ahmad
Bryant Le
Hien Nguyen
93
2
0
29 Apr 2024
Research and application of artificial intelligence based webshell detection model: A literature review
Mingrui Ma
Lansheng Han
Chunjie Zhou
268
5
0
28 Apr 2024
Setting up the Data Printer with Improved English to Ukrainian Machine Translation
Yurii Paniv
Dmytro Chaplynskyi
Nikita Trynus
Volodymyr Kyrylov
AI4CE
242
3
0
23 Apr 2024
Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
Hung Le
D. Nguyen
Kien Do
Svetha Venkatesh
T. Tran
177
0
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
319
33
0
18 Apr 2024
Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study
Zooey Nguyen
Anthony Annunziata
Vinh Luong
Sang Dinh
Quynh Le
Anh Hai Ha
Chanh Le
Hong An Phan
Shruti Raghavan
Christopher Nguyen
LRM
137
7
0
17 Apr 2024
AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts
Meng Jiang
Y. Yu
Qing Zhao
Jianqiang Li
Changwei Song
...
Wei-dong Zhai
Dan Luo
Xiaoqin Wang
Guanghui Fu
Bing Xiang Yang
142
3
0
17 Apr 2024
Position Engineering: Boosting Large Language Models through Positional Information Manipulation
Zhiyuan He
Huiqiang Jiang
Zilong Wang
Yuqing Yang
Luna Qiu
Lili Qiu
LLMAG
81
12
0
17 Apr 2024
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song
Seunghyuk Oh
Sangwoo Mo
Jaehyung Kim
Sukmin Yun
Jung-Woo Ha
Jinwoo Shin
170
29
0
16 Apr 2024
TEL'M: Test and Evaluation of Language Models
G. Cybenko
Joshua Ackerman
Paul Lintilhac
ALM ELM
305
1
0
16 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
332
17
0
14 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
232
13
0
13 Apr 2024
NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT
Xinzhe Zheng
Sijie Ji
Yipeng Pan
Kaiwen Zhang
Chenshu Wu
239
2
0
13 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
155
48
0
12 Apr 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai
Manaal Faruqui
Siddharth Gopal
LRM LLMAG CLL
269
158
0
10 Apr 2024
Bidirectional Long-Range Parser for Sequential Data Understanding
George Leotescu
Daniel Voinea
A. Popa
185
1
0
08 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
245
25
0
05 Apr 2024
Training LLMs over Neurally Compressed Text
Brian Lester
Jaehoon Lee
A. Alemi
Jeffrey Pennington
Adam Roberts
Jascha Narain Sohl-Dickstein
Noah Constant
175
9
0
04 Apr 2024
A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation
International Conference on Language Resources and Evaluation (LREC), 2024
Jifan Yu
Xiaohan Zhang
Yifan Xu
Xuanyu Lei
Zijun Yao
Jing Zhang
Lei Hou
Juanzi Li
HILM
245
4
0
04 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
221
72
0
01 Apr 2024
Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training
Vivian Liu
Yiqiao Yin
270
38
0
01 Apr 2024