
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
arXiv:2402.00976, 1 February 2024
Papers citing "Investigating Recurrent Transformers with Dynamic Halt" (37 of 87 shown)
  • Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method. Yifan Chen, Qi Zeng, Heng Ji, Yun Yang. NeurIPS 2021 (29 Oct 2021).
  • The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber. 14 Oct 2021.
  • Dynamic Inference with Neural Interpreters. Nasim Rahaman, Muhammad Waleed Gondal, S. Joshi, Peter V. Gehler, Yoshua Bengio, Francesco Locatello, Bernhard Schölkopf. 12 Oct 2021.
  • Saturated Transformers are Constant-Depth Threshold Circuits. William Merrill, Ashish Sabharwal, Noah A. Smith. TACL 2021 (30 Jun 2021).
  • Modeling Hierarchical Structures with Continuous Recursive Neural Networks. Jishnu Ray Chowdhury, Cornelia Caragea. ICML 2021 (10 Jun 2021).
  • Staircase Attention for Recurrent Processing of Sequences. Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston. NeurIPS 2021 (08 Jun 2021).
  • Consistent Accelerated Inference via Confident Adaptive Transformers. Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay. EMNLP 2021 (18 Apr 2021).
  • Transformer in Transformer. Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang. NeurIPS 2021 (27 Feb 2021).
  • Dynamic Neural Networks: A Survey. Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang. IEEE TPAMI 2021 (09 Feb 2021).
  • Long Range Arena: A Benchmark for Efficient Transformers. Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 08 Nov 2020.
  • Memformer: A Memory-Augmented Transformer for Sequence Modeling. Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu. 14 Oct 2020.
  • Neurocoder: Learning General-Purpose Computation Using Stored Neural Programs. Hung Le, Svetha Venkatesh. 24 Sep 2020.
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. 29 Jun 2020.
  • BERT Loses Patience: Fast and Robust Inference with Early Exit. Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei. 07 Jun 2020.
  • MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning. Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Joey Tianyi Zhou. 11 May 2020.
  • Highway Transformer: Self-Gating Enhanced Self-Attentive Networks. Yekun Chai, Jin Shuo, Xinwen Hou. ACL 2020 (17 Apr 2020).
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth. Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu. NeurIPS 2020 (08 Apr 2020).
  • GLU Variants Improve Transformer. Noam M. Shazeer. 12 Feb 2020.
  • Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap. ICLR 2020 (13 Nov 2019).
  • Ordered Memory. Songlin Yang, Shawn Tan, Seyedarian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville. NeurIPS 2019 (29 Oct 2019).
  • Depth-Adaptive Transformer. Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli. ICLR 2020 (22 Oct 2019).
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. ICLR 2020 (26 Sep 2019).
  • Deep Equilibrium Models. Shaojie Bai, J. Zico Kolter, V. Koltun. NeurIPS 2019 (03 Sep 2019).
  • Cooperative Learning of Disjoint Syntax and Semantics. Serhii Havrylov, Germán Kruszewski, Armand Joulin. 25 Feb 2019.
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 09 Jan 2019.
  • Universal Transformers. Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser. 10 Jul 2018.
  • ListOps: A Diagnostic Dataset for Latent Tree Learning. Nikita Nangia, Samuel R. Bowman. 17 Apr 2018.
  • The Importance of Being Recurrent for Modeling Hierarchical Structure. Ke M. Tran, Arianna Bisazza, Christof Monz. 09 Mar 2018.
  • Parallelizing Linear Recurrent Neural Nets Over Sequence Length. Eric Martin, Chris Cundy. 12 Sep 2017.
  • Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. NeurIPS 2017 (12 Jun 2017).
  • Language Modeling with Gated Convolutional Networks. Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier. ICML 2017 (23 Dec 2016).
  • Layer Normalization. Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton. 21 Jul 2016.
  • Adaptive Computation Time for Recurrent Neural Networks. Alex Graves. 29 Mar 2016.
  • Neural GPUs Learn Algorithms. Lukasz Kaiser, Ilya Sutskever. 25 Nov 2015.
  • Tree-structured composition in neural networks without tree-structured architectures. Samuel R. Bowman, Christopher D. Manning, Christopher Potts. 16 Jun 2015.
  • Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka. 20 Oct 2014.
  • Self-Delimiting Neural Networks. Jürgen Schmidhuber. 29 Sep 2012.