Thinking Like Transformers

arXiv:2106.06981 · 13 June 2021
Gail Weiss, Yoav Goldberg, Eran Yahav
Community: AI4CE

Papers citing "Thinking Like Transformers" (9 of 109 shown)

| Title | Authors | Community | Citations | Date |
|---|---|---|---|---|
| Neural Networks and the Chomsky Hierarchy | Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, L. Wenliang, ..., Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega | UQCV | 129 | 05 Jul 2022 |
| FloorGenT: Generative Vector Graphic Model of Floor Plans for Robotics | Ludvig Ericson, Patric Jensfelt | 3DV | 2 | 07 Mar 2022 |
| Overcoming a Theoretical Limitation of Self-Attention | David Chiang, Peter A. Cholak | | 76 | 24 Feb 2022 |
| The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization | Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber | AI4CE | 55 | 14 Oct 2021 |
| Saturated Transformers are Constant-Depth Threshold Circuits | William Merrill, Ashish Sabharwal, Noah A. Smith | | 96 | 30 Jun 2021 |
| Big Bird: Transformers for Longer Sequences | Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed | VLM | 2,012 | 28 Jul 2020 |
| Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack | A. Mali, Alexander Ororbia, Daniel Kifer, C. Lee Giles | | 13 | 04 Apr 2020 |
| Efficient Content-Based Sparse Attention with Routing Transformers | Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier | MoE | 579 | 12 Mar 2020 |
| Effective Approaches to Attention-based Neural Machine Translation | Thang Luong, Hieu H. Pham, Christopher D. Manning | | 7,923 | 17 Aug 2015 |