Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

EpiK-Eval: Evaluation for Language Models as Epistemic Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Gabriele Prato

Jerry Huang

Prasanna Parthasarathi

Shagun Sodhani

Sarath Chandar

ELM

248

23 Oct 2023

Meta learning with language models: Challenges and opportunities in the classification of imbalanced text

Apostol T. Vassilev

Honglan Jin

Munawar Hasan

264

23 Oct 2023

Retrieval-Augmented Chain-of-Thought in Semi-structured Domains

279

22 Oct 2023

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

389

22 Oct 2023

Enhanced Low-Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots Using Double Deep Reinforcement Learning Techniques

Linda Dotto de Moraes

160

20 Oct 2023

Multi-level Contrastive Learning for Script-based Character Understanding

277

20 Oct 2023

Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

Luc Van Gool

301

19 Oct 2023

A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models

Yi Zhou

Jose Camacho-Collados

Danushka Bollegala

436

19 Oct 2023

The Locality and Symmetry of Positional Encodings

Lihu Chen

Gaël Varoquaux

Fabian M. Suchanek

185

19 Oct 2023

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Sheng Zha

278

19 Oct 2023

From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers

Shaoxiong Duan

Yining Shi

Wei Xu

281

18 Oct 2023

Long-form Simultaneous Speech Translation: Thesis Proposal

International Joint Conference on Natural Language Processing (IJCNLP), 2023

Peter Polák

3DV

208

17 Oct 2023

Heterogenous Memory Augmented Neural Networks

Shanghang Zhang

205

17 Oct 2023

Approximating Two-Layer Feedforward Networks for Efficient Transformers

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

420

16 Oct 2023

A Survey on Video Diffusion Models

ACM Computing Surveys (ACM Comput. Surv.), 2023

Zuxuan Wu

457

220

16 Oct 2023

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Neural Information Processing Systems (NeurIPS), 2023

Taro Watanabe

206

16 Oct 2023

Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation

Shanshan Li

184

16 Oct 2023

Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels

144

16 Oct 2023

CoCoFormer: A controllable feature-rich polyphonic music generation method

235

15 Oct 2023

STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning

239

14 Oct 2023

From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

409

13 Oct 2023

MemGPT: Towards LLMs as Operating Systems

1.7K

333

12 Oct 2023

Cross-Episodic Curriculum for Transformer Agents

Neural Information Processing Systems (NeurIPS), 2023

Linxi "Jim" Fan

167

12 Oct 2023

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

International Conference on Learning Representations (ICLR), 2023

Xiaojian Ma

322

12 Oct 2023

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

Neural Information Processing Systems (NeurIPS), 2023

Qingkai Fang

Yan Zhou

Yangzhou Feng

210

11 Oct 2023

Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning

Workshop on Argument Mining (ArgMining), 2023

Arushi Sharma

Abhibha Gupta

Maneesh Bilalpur

182

11 Oct 2023

Humans and language models diverge when predicting repeating text

Conference on Computational Natural Language Learning (CoNLL), 2023

Aditya R. Vaidya

Javier S. Turek

Alexander G. Huth

247

10 Oct 2023

Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

Jason Weston

334

116

08 Oct 2023

Uncovering hidden geometry in Transformers via disentangling position and context

Jiajun Song

Yiqiao Zhong

248

07 Oct 2023

Higher-Order DeepTrails: Unified Approach to *Trails

Lernen, Wissen, Daten, Analysen (LWA), 2023

06 Oct 2023

Investigating Alternative Feature Extraction Pipelines For Clinical Note Phenotyping

Daniel Neil

121

05 Oct 2023

Neural architecture impact on identifying temporally extended Reinforcement Learning tasks

Victor Vadakechirayath George

OffRL

157

04 Oct 2023

Retrieval meets Long Context Large Language Models

International Conference on Learning Representations (ICLR), 2023

458

112

04 Oct 2023

Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture

International Conference on Machine Learning (ICML), 2023

Sangjun Park

Jinyeong Bak

CLL

288

04 Oct 2023

ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Yiming Wang

Jinyu Li

207

03 Oct 2023

Dodo: Dynamic Contextual Compression for Decoder-only LMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

198

03 Oct 2023

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

International Conference on Learning Representations (ICLR), 2023

Bin Lin

...

Wei Liu

758

340

03 Oct 2023

The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus

Rickard Brannvall

Andrei Stoian

170

03 Oct 2023

A Framework for Inference Inspired by Human Memory Mechanisms

International Conference on Learning Representations (ICLR), 2023

192

01 Oct 2023

GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length

Chia-Yuan Chang

157

01 Oct 2023

Self-Supervised Open-Ended Classification with Small Visual Language Models

Mohammad Mahdi Derakhshani

416

30 Sep 2023

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Interspeech (Interspeech), 2023

Weiran Wang

Zelin Wu

D. Caseiro

Tsendsuren Munkhdalai

...

Ding Zhao

245

29 Sep 2023

Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Marco Pleines

Matthias Pallasch

Frank Zimmer

Mike Preuss

OffRL

320

29 Sep 2023

LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud

Mengke Zhang

Tianxing He

Tianle Wang

Lu Mi

Fatemehsadat Mireshghallah

Binyi Chen

Hao Wang

Yulia Tsvetkov

225

29 Sep 2023

PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers

Yuxuan Liu

Zecheng Zhang

Hayden Schaeffer

211

28 Sep 2023

Training a Large Video Model on a Single Machine in a Day

Yue Zhao

Philipp Krahenbuhl

VLM

273

28 Sep 2023

Unsupervised Pretraining for Fact Verification by Language Model Distillation

International Conference on Learning Representations (ICLR), 2023

351

28 Sep 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization

International Conference on Learning Representations (ICLR), 2023

Albert Mohwald

250

28 Sep 2023

At Which Training Stage Does Code Data Help LLMs Reasoning?

International Conference on Learning Representations (ICLR), 2023

Yue Liu

Shanshan Li

363

28 Sep 2023

Attention Sorting Combats Recency Bias In Long Context Language Models

A. Peysakhovich

Adam Lerer

LRM RALM

323

28 Sep 2023