ResearchTrend.AI
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context


9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 621 papers shown
Out of One, Many: Using Language Models to Simulate Human Samples
Lisa P. Argyle
Ethan C. Busby
Nancy Fulda
Joshua R Gubler
Christopher Rytting
David Wingate
SyDa
45
549
0
14 Sep 2022
Activity report analysis with automatic single or multispan answer extraction
R. Choudhary
A. Sridhar
Erik M. Visser
16
1
0
09 Sep 2022
Features Fusion Framework for Multimodal Irregular Time-series Events
Peiwang Tang
Xianchao Zhang
AI4TS
26
2
0
05 Sep 2022
Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms
Surbhi Goel
Sham Kakade
Adam Tauman Kalai
Cyril Zhang
32
1
0
01 Sep 2022
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
Nadine Behrmann
S. Golestaneh
Zico Kolter
Juergen Gall
M. Noroozi
22
72
0
01 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
20
2
0
01 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
Efficient Sparsely Activated Transformers
Salar Latifi
Saurav Muralidharan
M. Garland
MoE
19
2
0
31 Aug 2022
K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation
Weixi Zhao
Weiqiang Wang
ViT
3DPC
21
2
0
24 Aug 2022
Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings
Yile Wang
Yue Zhang
19
4
0
20 Aug 2022
Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang
Congliang Chen
Naichen Shi
Ruoyu Sun
Zhimin Luo
18
62
0
20 Aug 2022
Parallel Hierarchical Transformer with Attention Alignment for Abstractive Multi-Document Summarization
Ye Ma
Lu Zong
24
0
0
16 Aug 2022
Abstractive Meeting Summarization: A Survey
Virgile Rennard
Guokan Shang
Julie Hunter
Michalis Vazirgiannis
32
15
0
08 Aug 2022
Enhancing the Robustness via Adversarial Learning and Joint Spatial-Temporal Embeddings in Traffic Forecasting
Juyong Jiang
Binqing Wu
Ling-Hao Chen
Kai Zhang
Sunghun Kim
AI4TS
30
18
0
05 Aug 2022
PointConvFormer: Revenge of the Point-based Convolution
Wenxuan Wu
Li Fuxin
Qi Shan
3DPC
25
30
0
04 Aug 2022
Learning from flowsheets: A generative transformer model for autocompletion of flowsheets
Gabriel Vogel
Lukas Schulze Balhorn
Artur M. Schweidtmann
AI4CE
35
33
0
01 Aug 2022
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen
Richard G. Baraniuk
Robert M. Kirby
Stanley J. Osher
Bao Wang
26
9
0
01 Aug 2022
Is Attention All That NeRF Needs?
T. MukundVarma
Peihao Wang
Xuxi Chen
Tianlong Chen
Subhashini Venugopalan
Zhangyang Wang
ViT
30
107
0
27 Jul 2022
3D Siamese Transformer Network for Single Object Tracking on Point Clouds
Le Hui
Lingpeng Wang
Ling-Yu Tang
Kaihao Lan
Jin Xie
Jian Yang
ViT
3DPC
31
59
0
25 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
33
9
0
24 Jul 2022
Learning Object Placement via Dual-path Graph Completion
Siyuan Zhou
Liu Liu
Li Niu
Liqing Zhang
31
24
0
23 Jul 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
27
72
0
20 Jul 2022
Learning Sequence Representations by Non-local Recurrent Neural Memory
Wenjie Pei
Xin Feng
Canmiao Fu
Qi Cao
Guangming Lu
Yu-Wing Tai
AI4TS
24
1
0
20 Jul 2022
Vision Transformers: From Semantic Segmentation to Dense Prediction
Li Zhang
Jiachen Lu
Sixiao Zheng
Xinxuan Zhao
Xiatian Zhu
Yanwei Fu
Tao Xiang
Jianfeng Feng
Philip H. S. Torr
ViT
27
7
0
19 Jul 2022
Conditional DETR V2: Efficient Detection Transformer with Box Queries
Xiaokang Chen
Fangyun Wei
Gang Zeng
Jingdong Wang
ViT
27
33
0
18 Jul 2022
Recurrent Memory Transformer
Aydar Bulatov
Yuri Kuratov
Mikhail Burtsev
CLL
13
102
0
14 Jul 2022
Eliminating Gradient Conflict in Reference-based Line-Art Colorization
Zekun Li
Zhengyang Geng
Zhao Kang
Wenyu Chen
Yibo Yang
21
35
0
13 Jul 2022
ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships
Chen Zhang
Luchin Chang
Songruoyao Wu
Xu Tan
Tao Qin
Tie-Yan Liu
Kecheng Zhang
19
15
0
12 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
21
143
0
06 Jul 2022
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei
Pengcheng Guo
Ning Jiang
48
11
0
02 Jul 2022
Long Range Language Modeling via Gated State Spaces
Harsh Mehta
Ankit Gupta
Ashok Cutkosky
Behnam Neyshabur
Mamba
34
231
0
27 Jun 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
11
25
0
26 Jun 2022
CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
Qihang Yu
Huiyu Wang
Dahun Kim
Siyuan Qiao
Maxwell D. Collins
Yukun Zhu
Hartwig Adam
Alan Yuille
Liang-Chieh Chen
ViT
MedIm
32
90
0
17 Jun 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
25
49
0
16 Jun 2022
PInKS: Preconditioned Commonsense Inference with Minimal Supervision
Ehsan Qasemi
Piyush Khanna
Qiang Ning
Muhao Chen
ReLM
LRM
27
8
0
16 Jun 2022
Identifying Electrocardiogram Abnormalities Using a Handcrafted-Rule-Enhanced Neural Network
Yu Bian
Jintai Chen
Xiaojun Chen
Xiaoxian Yang
Da Chen
Jian Wu
33
9
0
16 Jun 2022
Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis
Rania Briq
Chuhang Zou
L. Pishchulin
Christopher Broaddus
Juergen Gall
21
1
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
54
527
0
13 Jun 2022
GateHUB: Gated History Unit with Background Suppression for Online Action Detection
Junwen Chen
Gaurav Mittal
Ye Yu
Yu Kong
Mei Chen
41
33
0
09 Jun 2022
Meet You Halfway: Explaining Deep Learning Mysteries
Oriel BenShmuel
AAML
FedML
FAtt
OOD
22
0
0
09 Jun 2022
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors
Shota Horiguchi
Shinji Watanabe
Leibny Paola García-Perera
Yuki Takashima
Y. Kawaguchi
39
23
0
06 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
16
7
0
04 Jun 2022
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
Kevin Esslinger
Robert W. Platt
Chris Amato
OffRL
29
35
0
02 Jun 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
115
17
0
30 May 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
63
2,024
0
27 May 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition
S. Khorram
Jaeyoung Kim
Anshuman Tripathi
Han Lu
Qian Zhang
Hasim Sak
SSL
19
11
0
27 May 2022
Do we really need temporal convolutions in action segmentation?
Dazhao Du
Bing-Huang Su
Yu Li
Zhongang Qi
Hui Xiong
Ying Shan
ViT
21
16
0
26 May 2022
DT-SV: A Transformer-based Time-domain Approach for Speaker Verification
Nan Zhang
Jianzong Wang
Zhenhou Hong
Chendong Zhao
Xiaoyang Qu
Jing Xiao
29
5
0
26 May 2022
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
234
128
0
25 May 2022
Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code
Changan Niu
Chuanyi Li
Bin Luo
Vincent Ng
SyDa
VLM
47
48
0
24 May 2022