arXiv: 2002.05202
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Papers citing "GLU Variants Improve Transformer"
50 / 647 papers shown
Title
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo-Wen Zhang
Xiaolin Wei
Chunhua Shen
MLLM
31
33
0
28 Dec 2023
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
59
15
0
27 Dec 2023
YAYI 2: Multilingual Open-Source Large Language Models
Yin Luo
Qingchao Kong
Nan Xu
Jia Cao
Bao Hao
...
Zhaoxin Yu
Zhengda Luo
Wenji Mao
Lei Wang
Dajun Zeng
ALM
OSLM
43
7
0
22 Dec 2023
Paloma: A Benchmark for Evaluating Language Model Fit
Ian H. Magnusson
Akshita Bhagia
Valentin Hofmann
Luca Soldaini
A. Jha
...
Iz Beltagy
Hanna Hajishirzi
Noah A. Smith
Kyle Richardson
Jesse Dodge
132
21
0
16 Dec 2023
PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection
Wentao Li
Danpei Zhao
Bo Yuan
Yue Gao
Z. Shi
ObjD
35
15
0
16 Dec 2023
SLS4D: Sparse Latent Space for 4D Novel View Synthesis
Qiyuan Feng
Hao-Xiang Chen
Qun-Ce Xu
Tai-Jiang Mu
29
1
0
15 Dec 2023
TigerBot: An Open Multilingual Multitask LLM
Ye Chen
Wei Cai
Liangming Wu
Xiaowei Li
Zhanxuan Xin
Cong Fu
82
11
0
14 Dec 2023
Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien
Eric Winsor
LRM
ReLM
74
10
0
13 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
39
62
0
11 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang
Bailin Wang
Yikang Shen
Rameswar Panda
Yoon Kim
40
140
0
11 Dec 2023
FaultFormer: Pretraining Transformers for Adaptable Bearing Fault Classification
Anthony Y. Zhou
Amir Barati Farimani
AI4CE
11
6
0
04 Dec 2023
MABViT -- Modified Attention Block Enhances Vision Transformers
Mahesh Ramesh
Aswinkumar Ramkumar
6
3
0
03 Dec 2023
HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers
Maciej Besta
Afonso Claudino Catarino
Lukas Gianinazzi
Nils Blach
Piotr Nyczyk
H. Niewiadomski
Torsten Hoefler
30
6
0
30 Nov 2023
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
ViT
13
74
0
28 Nov 2023
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao
Yanwu Xu
Zhisheng Xiao
Haolin Jia
Tingbo Hou
VLM
39
11
0
28 Nov 2023
Who is leading in AI? An analysis of industry AI research
Ben Cottier
T. Besiroglu
David Owen
28
7
0
24 Nov 2023
LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design
Niklas Dobberstein
Astrid Maass
J. Hamaekers
23
5
0
24 Nov 2023
AcademicGPT: Empowering Academic Research
Shufa Wei
Xiaolong Xu
Xianbiao Qi
Xi Yin
Jun Xia
...
Chihao Dai
Lihua Wang
Xiaohui Liu
Lei Zhang
Yutao Xie
LM&MA
39
3
0
21 Nov 2023
Secure Transformer Inference Protocol
Mu Yuan
Lan Zhang
Xiang-Yang Li
30
3
0
14 Nov 2023
Speech-based Slot Filling using Large Language Models
Guangzhi Sun
Shutong Feng
Dongcheng Jiang
Chao Zhang
Milica Gasic
P. Woodland
21
1
0
13 Nov 2023
Enhancing Actuarial Non-Life Pricing Models via Transformers
Alexej Brauer
16
3
0
10 Nov 2023
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
Huan Gui
Ruoxi Wang
Ke Yin
Long Jin
Maciej Kula
Taibai Xu
Lichan Hong
Ed H. Chi
38
2
0
10 Nov 2023
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
Johannes Hagemann
Samuel Weinbach
Konstantin Dobler
Maximilian Schall
Gerard de Melo
LRM
34
6
0
09 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin
Songlin Yang
Yiran Zhong
36
74
0
08 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
375
0
07 Nov 2023
Multilingual Mathematical Autoformalization
Albert Q. Jiang
Wenda Li
M. Jamnik
AI4CE
21
19
0
07 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
17
445
0
06 Nov 2023
Global Transformer Architecture for Indoor Room Temperature Forecasting
Alfredo V. Clemente
A. Nocente
Massimiliano Ruocco
AI4CE
11
1
0
31 Oct 2023
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
Michael Gunther
Jackmin Ong
Isabelle Mohr
Alaeddine Abdessalem
Tanguy Abel
...
Saba Sturua
Bo Wang
Maximilian Werk
Nan Wang
Han Xiao
RALM
27
58
0
30 Oct 2023
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
David Samuel
11
2
0
30 Oct 2023
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
27
92
0
30 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
8
8
0
24 Oct 2023
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn
Anton Ragni
30
1
0
24 Oct 2023
FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
Md. Rafi Ur Rashid
Vishnu Asutosh Dasu
Kang Gu
Najrin Sultana
Shagufta Mehnaz
AAML
FedML
44
10
0
24 Oct 2023
Unlocking the Transferability of Tokens in Deep Models for Tabular Data
Qi-Le Zhou
Han-Jia Ye
Le-Ye Wang
De-Chuan Zhan
29
7
0
23 Oct 2023
Functional Invariants to Watermark Large Transformers
Pierre Fernandez
Guillaume Couairon
Teddy Furon
Matthijs Douze
14
8
0
17 Oct 2023
ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation
Jaap Jumelet
Michael Hanna
Marianne de Heer Kloots
Anna Langedijk
Charlotte Pouw
Oskar van der Wal
21
3
0
17 Oct 2023
Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
Ivan Lee
Nan Jiang
Taylor Berg-Kirkpatrick
32
12
0
12 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
24
262
0
10 Oct 2023
Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network
Johannes Bausch
Andrew W. Senior
Francisco J. H. Heras
Thomas Edlich
Alex Davies
...
C. Gidney
Demis Hassabis
Sergio Boixo
Hartmut Neven
Pushmeet Kohli
14
32
0
09 Oct 2023
A Meta-Learning Perspective on Transformers for Causal Language Modeling
Xinbo Wu
L. Varshney
23
6
0
09 Oct 2023
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index
Megha Chakraborty
S.M. Towhidul Islam Tonmoy
S. M. Mehedi
Krish Sharma
Niyar R. Barman
...
Tanay Kumar
Vinija Jain
Aman Chadha
Amit P. Sheth
Amitava Das
DeLMO
12
21
0
08 Oct 2023
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Vipula Rawte
Swagata Chakraborty
Agnibh Pathak
Anubhav Sarkar
S.M. Towhidul Islam Tonmoy
Aman Chadha
Mikel Artetxe
Punit Daniel Simig
HILM
32
116
0
08 Oct 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
118
60
0
06 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Filip Szatkowski
Eric Elmoznino
Younesse Kaddar
Simone Scardapane
MoE
30
5
0
06 Oct 2023
Predicting Emergent Abilities with Infinite Resolution Evaluation
Shengding Hu
Xin Liu
Xu Han
Xinrong Zhang
Chaoqun He
...
Ning Ding
Zebin Ou
Guoyang Zeng
Zhiyuan Liu
Maosong Sun
ELM
LRM
23
13
0
05 Oct 2023
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
Praneeth Kacham
Vahab Mirrokni
Peilin Zhong
31
7
0
02 Oct 2023
Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need!
Mariana Lindo
Ana Sofia Santos
André Ferreira
Jianning Li
Gijs Luijten
...
Cornelius Deuschl
Johannes Haubold
Jens Kleesiek
Jan Egger
Victor Alves
LM&MA
14
2
0
29 Sep 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
29
1,568
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
24
15
0
28 Sep 2023