Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 604 papers shown
Exploration of Masked and Causal Language Modelling for Text Generation
Nicolo Micheletti
Samuel Belkadi
Lifeng Han
Goran Nenadic
44
6
0
21 May 2024
Positional encoding is not the same as context: A study on positional encoding for sequential recommendation
Alejo López-Ávila
Jinhua Du
Abbas Shimary
Ze Li
38
1
0
16 May 2024
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning
Junzhi Chen
Juhao Liang
Benyou Wang
LLMAG
28
3
0
09 May 2024
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity
Zhufeng Li
S. S. Cranganore
Nicholas D. Youngblut
Niki Kilbertus
47
2
0
09 May 2024
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents
Yanfei Dong
Lambert Deng
Jiazheng Zhang
Xiaodong Yu
Ting Lin
Francesco Gelli
Soujanya Poria
W. Lee
40
0
0
08 May 2024
SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale
Michael Sapienza
Steven Ripplinger
Simon Gibbs
Jaewon Lee
Pranav Mistry
LRM
ELM
36
4
0
07 May 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
44
7
0
13 Apr 2024
Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training
Vivian Liu
Yiqiao Yin
40
11
0
01 Apr 2024
RealKIE: Five Novel Datasets for Enterprise Key Information Extraction
Benjamin Townsend
Madison May
Christopher Wells
SyDa
37
0
0
29 Mar 2024
Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation
Sicong Zang
Zhijun Fang
34
0
0
26 Mar 2024
Large Language Models for Blockchain Security: A Systematic Literature Review
Zheyuan He
Zihao Li
Sen Yang
Ao Qiao
Xiaosong Zhang
Xiapu Luo
Ting Chen
PILM
42
14
0
21 Mar 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
Takashi Morita
16
3
0
31 Jan 2024
When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao
Jiaxuan Zhao
Licheng Jiao
Lingling Li
Fang Liu
Shuyuan Yang
72
13
0
19 Jan 2024
Hyperspectral Image Denoising via Spatial-Spectral Recurrent Transformer
Guanyiman Fu
Fengchao Xiong
Jianfeng Lu
Jun Zhou
Jiantao Zhou
Yuntao Qian
ViT
19
11
0
31 Dec 2023
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song
Xiaoyang Wang
Sangwoo Cho
Xiaoman Pan
Dong Yu
29
7
0
14 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
26
9
0
13 Dec 2023
MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness
Xiaoyun Xu
Shujian Yu
Jingzheng Wu
S. Picek
AAML
35
0
0
08 Dec 2023
Active Foundational Models for Fault Diagnosis of Electrical Motors
Sriram Anbalagan
GP SaiShashank
D. Agarwal
Balasubramaniam Natarajan
Babji Srinivasan
AI4CE
21
0
0
27 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
24
24
0
21 Nov 2023
Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs
Shivam Aggarwal
Hans Jakob Damsgaard
Alessandro Pappalardo
Giuseppe Franco
Thomas B. Preußer
Michaela Blott
Tulika Mitra
MQ
19
5
0
21 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
35
4
0
21 Nov 2023
Argumentation Element Annotation Modeling using XLNet
Christopher M. Ormerod
Amy Burkhardt
Mackenzie Young
Susan Lottridge
28
2
0
10 Nov 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
56
12
0
28 Oct 2023
MemGPT: Towards LLMs as Operating Systems
Charles Packer
Sarah Wooders
Kevin Lin
Vivian Fang
Shishir G. Patil
Ion Stoica
Joseph E. Gonzalez
RALM
34
127
0
12 Oct 2023
Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning
Arushi Sharma
Abhibha Gupta
Maneesh Bilalpur
19
4
0
11 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
29
15
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
28
15
0
28 Sep 2023
Interactive Distillation of Large Single-Topic Corpora of Scientific Papers
N. Solovyev
Ryan Barron
Manish Bhattarai
M. Eren
Kim Ø. Rasmussen
Boian S. Alexandrov
11
1
0
19 Sep 2023
BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer
Kunkun Pang
Dafei Qin
Yingruo Fan
Julian Habekost
Takaaki Shiratori
Junichi Yamagishi
Taku Komura
SLR
ViT
21
19
0
07 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
16
2
0
06 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Xiaozhong Liu
78
31
0
27 Aug 2023
Detecting Spells in Fantasy Literature with a Transformer Based Artificial Intelligence
Marcel Moravek
Alexander Zender
Andreas Müller
10
0
0
07 Aug 2023
DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin
Yuhui Yuan
Zheng-Wei Zhang
Chen Li
Nanning Zheng
Han Hu
37
5
0
03 Aug 2023
Attention over pre-trained Sentence Embeddings for Long Document Classification
Amine Abdaoui
Sourav Dutta
22
1
0
18 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
26
6
0
12 Jul 2023
Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
Zhun Yang
Adam Ishay
Joohyung Lee
35
9
0
10 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
40
1,403
0
06 Jul 2023
LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias
Mario Almagro
Emilio Almazán
Diego Ortego
David Jiménez
23
3
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
41
151
0
05 Jul 2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Matthew Raffel
Lizhong Chen
9
5
0
03 Jul 2023
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
Matthew Raffel
Drew Penney
Lizhong Chen
16
3
0
03 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
19
5
0
03 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
25
8
0
26 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
27
8
0
23 Jun 2023
Graph Inductive Biases in Transformers without Message Passing
Liheng Ma
Chen Lin
Derek Lim
Adriana Romero Soriano
P. Dokania
Mark J. Coates
Philip H. S. Torr
Ser-Nam Lim
AI4CE
31
85
0
27 May 2023
Passive learning of active causal strategies in agents and language models
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
29
15
0
25 May 2023
Focus Your Attention (with Adaptive IIR Filters)
Shahar Lutati
Itamar Zimerman
Lior Wolf
32
9
0
24 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
Christos Baziotis
Biao Zhang
Alexandra Birch
Barry Haddow
30
2
0
23 May 2023
DAPR: A Benchmark on Document-Aware Passage Retrieval
Kexin Wang
Nils Reimers
Iryna Gurevych
18
5
0
23 May 2023