Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method | Neural Information Processing Systems (NeurIPS), 2021
Saturated Transformers are Constant-Depth Threshold Circuits | Transactions of the Association for Computational Linguistics (TACL), 2021
Modeling Hierarchical Structures with Continuous Recursive Neural Networks | International Conference on Machine Learning (ICML), 2021 | Jishnu Ray Chowdhury, Cornelia Caragea
Staircase Attention for Recurrent Processing of Sequences | Neural Information Processing Systems (NeurIPS), 2021
Consistent Accelerated Inference via Confident Adaptive Transformers | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Transformer in Transformer | Neural Information Processing Systems (NeurIPS), 2021
Dynamic Neural Networks: A Survey | IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks | Annual Meeting of the Association for Computational Linguistics (ACL), 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth | Neural Information Processing Systems (NeurIPS), 2020
Compressive Transformers for Long-Range Sequence Modelling | International Conference on Learning Representations (ICLR), 2019
Ordered Memory | Neural Information Processing Systems (NeurIPS), 2019
Depth-Adaptive Transformer | International Conference on Learning Representations (ICLR), 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | International Conference on Learning Representations (ICLR), 2019
Deep Equilibrium Models | Neural Information Processing Systems (NeurIPS), 2019
Attention Is All You Need | Neural Information Processing Systems (NeurIPS), 2017
Language Modeling with Gated Convolutional Networks | International Conference on Machine Learning (ICML), 2016