2310.07096
Sparse Universal Transformer
11 October 2023
Shawn Tan
Yikang Shen
Zhenfang Chen
Aaron Courville
Chuang Gan
MoE
Papers citing "Sparse Universal Transformer"
5 / 5 papers shown
Title
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
34
1
0
01 Feb 2024
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang
Yikang Shen
Zeyu Huang
Jie Zhou
Wenge Rong
Zhang Xiong
MoE
93
42
0
11 Oct 2022
Compositional Semantic Parsing with Large Language Models
Andrew Drozdov
Nathanael Schärli
Ekin Akyürek
Nathan Scales
Xinying Song
Xinyun Chen
Olivier Bousquet
Denny Zhou
ReLM
LRM
187
91
0
29 Sep 2022
Neural Networks and the Chomsky Hierarchy
Grégoire Delétang
Anian Ruoss
Jordi Grau-Moya
Tim Genewein
L. Wenliang
...
Chris Cundy
Marcus Hutter
Shane Legg
Joel Veness
Pedro A. Ortega
UQCV
94
129
0
05 Jul 2022
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020