
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

TMI! Finetuned Models Leak Private Information from their Pretraining Data

Proceedings on Privacy Enhancing Technologies (PoPETs), 2023

305

01 Jun 2023

Exposing Attention Glitches with Flip-Flop Language Modeling

Neural Information Processing Systems (NeurIPS), 2023

223

01 Jun 2023

STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

Neural Information Processing Systems (NeurIPS), 2023

Jimmy Ba

352

01 Jun 2023

Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home

Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2023

190

01 Jun 2023

Monotonic Location Attention for Length Generalization

International Conference on Machine Learning (ICML), 2023

Jishnu Ray Chowdhury

Cornelia Caragea

LLMAG

177

31 May 2023

The Impact of Positional Encoding on Length Generalization in Transformers

Neural Information Processing Systems (NeurIPS), 2023

Amirhossein Kazemnejad

Inkit Padhi

Karthikeyan N. Ramamurthy

Payel Das

Siva Reddy

390

312

31 May 2023

Blockwise Parallel Transformer for Large Context Models

Hao Liu

Pieter Abbeel

277

30 May 2023

NetHack is Hard to Hack

Neural Information Processing Systems (NeurIPS), 2023

Ulyana Piterbarg

Lerrel Pinto

Rob Fergus

269

30 May 2023

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

Interspeech, 2023

160

29 May 2023

A Quantitative Review on Language Model Efficiency Research

Meng Jiang

Hy Dang

Lingbo Tong

206

28 May 2023

Graph Inductive Biases in Transformers without Message Passing

International Conference on Machine Learning (ICML), 2023

Liheng Ma

Chen Lin

Derek Lim

Adriana Romero Soriano

Ser-Nam Lim

260

151

27 May 2023

Slide, Constrain, Parse, Repeat: Synchronous SlidingWindows for Document AMR Parsing

Yara Rizk

Tahira Naseem

Ramón Fernández Astudillo

Radu Florian

Salim Roukos

172

26 May 2023

Sentence-Incremental Neural Coreference Resolution

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

268

26 May 2023

Randomized Positional Encodings Boost Length Generalization of Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

236

128

26 May 2023

Landmark Attention: Random-Access Infinite Context Length for Transformers

Neural Information Processing Systems (NeurIPS), 2023

Amirkeivan Mohtashami

Martin Jaggi

LLMAG

341

197

25 May 2023

Passive learning of active causal strategies in agents and language models

Neural Information Processing Systems (NeurIPS), 2023

424

25 May 2023

Focus Your Attention (with Adaptive IIR Filters)

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Shahar Lutati

Itamar Zimerman

Lior Wolf

343

24 May 2023

InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition

Interspeech, 2023

Xinyuan Qian

130

24 May 2023

AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Shuyang Cao

Lu Wang

250

24 May 2023

Adapting Language Models to Compress Contexts

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Alexander Wettig

289

258

24 May 2023

Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yinghan Long

Sayeed Shafayet Chowdhury

Kaushik Roy

330

24 May 2023

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Kayhan Batmanghelich

180

23 May 2023

When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

397

23 May 2023

DAPR: A Benchmark on Document-Aware Passage Retrieval

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Kexin Wang

Nils Reimers

Iryna Gurevych

368

23 May 2023

NarrativeXL: A Large-scale Dataset For Long-Term Memory Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

A. Moskvichev

Ky-Vinh Mai

RALM

212

23 May 2023

VideoLLM: Modeling Video Sequence with Large Language Models

Yifei Huang

...

Yi Wang

Yu Qiao

264

114

22 May 2023

GNCformer Enhanced Self-attention for Automatic Speech Recognition

145

22 May 2023

FIT: Far-reaching Interleaved Transformers

Ting-Li Chen

Lala Li

326

22 May 2023

EE-TTS: Emphatic Expressive TTS with Linguistic Information

Interspeech, 2023

152

20 May 2023

Reducing Sequence Length by Predicting Edit Operations with Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Masahiro Kaneko

Naoaki Okazaki

242

19 May 2023

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

329

19 May 2023

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

Interspeech, 2023

Kwangyoun Kim

236

18 May 2023

Deep Multiple Instance Learning with Distance-Aware Self-Attention

Georg Wolflein

Lucie Charlotte Magister

Pietro Lio

David J. Harrison

Ognjen Arandjelovic

174

17 May 2023

CageViT: Convolutional Activation Guided Efficient Vision Transformer

Jingkuan Song

154

17 May 2023

Mimetic Initialization of Self-Attention Layers

International Conference on Machine Learning (ICML), 2023

Asher Trockman

J. Zico Kolter

252

16 May 2023

Machine-Made Media: Monitoring the Mobilization of Machine-Generated Articles on Misinformation and Mainstream News Websites

International Conference on Web and Social Media (ICWSM), 2023

Hans W. A. Hanley

Zakir Durumeric

DeLMO

332

16 May 2023

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

350

16 May 2023

Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text

H. Khorashadizadeh

Nandana Mihindukulasooriya

Sanju Tiwari

Jinghua Groppe

Sven Groppe

156

15 May 2023

Text Classification via Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Jiwei Li

245

227

15 May 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Neural Information Processing Systems (NeurIPS), 2023

Luke Zettlemoyer

301

140

12 May 2023

Salient Mask-Guided Vision Transformer for Fine-Grained Classification

VISIGRAPP, 2023

Hisham Cholakkal

235

11 May 2023

A General-Purpose Multilingual Document Encoder

Onur Galoglu

Robert Litschko

Goran Glavaš

214

11 May 2023

ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph

International Journal on Digital Libraries (IJDL), 2023

Salomon Kabongo KABENAMUALU

Jennifer D'Souza

Sören Auer

309

10 May 2023

VTPNet for 3D deep learning on point cloud

165

10 May 2023

Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation

Conference on Machine Learning and Systems (MLSys), 2023

Le Chen

Quazi Ishtiaque Mahmud

Hung Phan

Nesreen Ahmed

Ali Jannesari

170

09 May 2023

Effects of sub-word segmentation on performance of transformer language models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

346

09 May 2023

ComputeGPT: A computational chat model for numerical problems

Ryan H. Lewis

Junfeng Jiao

112

08 May 2023

Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins

Journal of Applied Physics (JAP), 2023

Markus J. Buehler

179

07 May 2023

Leveraging Synthetic Targets for Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Sarthak Mittal

Oleksii Hrinchuk

Oleksii Kuchaiev

147

07 May 2023

Adapting Transformer Language Models for Predictive Typing in Brain-Computer Interfaces

Shijia Liu

David A. Smith

05 May 2023