GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
ArXiv (abs) | PDF | HTML | HuggingFace (4 upvotes)

Papers citing "GLU Variants Improve Transformer"

50 / 901 papers shown
Attention as a Hypernetwork
International Conference on Learning Representations (ICLR), 2024
Simon Schug
Seijin Kobayashi
Yassir Akram
João Sacramento
Razvan Pascanu
GNN
236
9
0
09 Jun 2024
Accelerating evolutionary exploration through language model-based transfer learning
M. Reissmann
Yuan Fang
Andrew S. H. Ooi
R. D. Sandberg
298
2
0
07 Jun 2024
Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis
Juanhua Zhang
Ruodan Yan
Alessandro Perelli
Xi Chen
Chao Li
MedIm, DiffM
310
11
0
05 Jun 2024
Xmodel-LM Technical Report
Yichuan Wang
Yang Liu
Yu Yan
Qun Wang
Xucheng Huang
Ling Jiang
OSLM, ALM
234
1
0
05 Jun 2024
Scalable MatMul-free Language Modeling
Rui-Jie Zhu
Yu Zhang
Ethan Sifferman
Tyler Sheaves
Yiqiao Wang
Dustin Richmond
P. Zhou
Nhan Duy Truong
493
31
0
04 Jun 2024
Decoupled Alignment for Robust Plug-and-Play Adaptation
Haozheng Luo
Jiahao Yu
Wenxin Zhang
Jialong Li
Jerry Yao-Chieh Hu
Xingyu Xing
Han Liu
323
11
0
03 Jun 2024
LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments
Zikun Ye
Hema Yoganarasimhan
Yufeng Zheng
169
15
0
03 Jun 2024
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Tianwen Wei
Bo Zhu
Liang Zhao
Cheng Cheng
Biye Li
...
Yutuan Ma
Rui Hu
Shuicheng Yan
Han Fang
Yahui Zhou
MoE
277
50
0
03 Jun 2024
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu
Rongjie Huang
Yang Liu
Hengyuan Cao
Jialei Wang
Xize Cheng
Siqi Zheng
Zhou Zhao
325
17
0
01 Jun 2024
You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Zhen Qin
Yuxin Mao
Xuyang Shen
Dong Li
Jing Zhang
Yuchao Dai
Yiran Zhong
172
8
0
31 May 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
221
11
0
31 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
187
7
0
30 May 2024
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
129
3
0
30 May 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Ge Zhang
Scott Qu
Jiaheng Liu
Chenchen Zhang
Chenghua Lin
...
Zi-Kai Zhao
Jiajun Zhang
Wanli Ouyang
Wenhao Huang
Lei Ma
ELM
290
69
0
29 May 2024
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan
Da Li
Timothy M. Hospedales
225
28
0
29 May 2024
Transformers as Neural Operators for Solutions of Differential Equations with Finite Regularity
Benjamin Shih
Ahmad Peyvan
Zhongqiang Zhang
George Karniadakis
AI4CE
186
33
0
29 May 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
Jihao Liu
Jinliang Zheng
Boxiao Liu
Yu Liu
Jiaming Song
CLIP
178
0
0
29 May 2024
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Clayton Sanford
Bahare Fatemi
Ethan Hall
Anton Tsitsulin
Seyed Mehran Kazemi
Jonathan J. Halcrow
Bryan Perozzi
Vahab Mirrokni
260
64
0
28 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
279
8
0
28 May 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
388
92
0
28 May 2024
2BP: 2-Stage Backpropagation
Christopher Rae
Joseph K. L. Lee
James Richings
MoE, MQ
99
0
0
28 May 2024
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
164
63
0
27 May 2024
Are Self-Attentions Effective for Time Series Forecasting?
Dongbin Kim
Jinseong Park
Jaewook Lee
Hoki Kim
AI4TS
176
20
0
27 May 2024
The Expressive Capacity of State Space Models: A Formal Language Perspective
Yash Sarrof
Yana Veitsman
Michael Hahn
Mamba
415
24
0
27 May 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
306
6
0
26 May 2024
Expanded Gating Ranges Improve Activation Functions
Allen Hao Huang
AI4CE
191
2
0
25 May 2024
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Nolan Dey
Shane Bergsma
Joel Hestness
220
7
0
24 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Haifeng Zhang
Mingsheng Long
VGen
255
83
0
24 May 2024
Activator: GLU Activation Function as the Core Component of a Vision Transformer
Abdullah Nazhat Abdullah
Tarkan Aydin
ViT
260
0
0
24 May 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
Alessandro Laio
Marco Baroni
618
29
0
24 May 2024
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
Xianzhi Du
Tom Gunter
Xiang Kong
Mark Lee
Zirui Wang
Aonan Zhang
Nan Du
Ruoming Pang
MoE
124
5
0
23 May 2024
Aya 23: Open Weight Releases to Further Multilingual Progress
Viraat Aryabumi
John Dang
Dwarak Talupuru
Saurabh Dash
David Cairuz
...
Aidan Gomez
Phil Blunsom
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
OSLM
424
119
0
23 May 2024
Neural Pfaffians: Solving Many Many-Electron Schrödinger Equations
Neural Information Processing Systems (NeurIPS), 2024
Nicholas Gao
Stephan Günnemann
229
11
0
23 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
195
20
0
23 May 2024
Super Tiny Language Models
Dylan Hillier
Leon Guertler
Cheston Tan
Palaash Agrawal
Ruirui Chen
Bobby Cheng
263
8
0
23 May 2024
360Zhinao Technical Report
360Zhinao Team
205
0
0
22 May 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Ting Jiang
Shaohan Huang
Shengyue Luo
Zihan Zhang
Haizhen Huang
...
Weiwei Deng
Feng Sun
Qi Zhang
Deqing Wang
Fuzhen Zhuang
195
32
0
20 May 2024
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
Eugene Vorontsov
Adam Casson
Kristen Severson
Eric Zimmermann
Yi Kan Wang
...
Peter Hamilton
William A. Moye
Eugene Vorontsov
Siqi Liu
Thomas J. Fuchs
MedIm
261
63
0
16 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
492
606
0
16 May 2024
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
313
225
0
15 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
International Conference on Machine Learning (ICML), 2024
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
223
5
0
14 May 2024
CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLM
Urjitkumar Patel
Fang-Chun Yeh
Chinmay Gondhalekar
144
8
0
10 May 2024
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao
Yi Su
Pinzheng Wang
Jing Ye
Wen Xie
...
Wenliang Chen
Guohong Fu
Guodong Zhou
Qiaoming Zhu
Min Zhang
MQ
206
1
0
09 May 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Neural Information Processing Systems (NeurIPS), 2024
Yutao Sun
Li Dong
Yi Zhu
Shaohan Huang
Wenhui Wang
Shuming Ma
Quanlu Zhang
Jianyong Wang
Furu Wei
VLM
286
105
0
08 May 2024
EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning
Jingfeng Yao
Xinggang Wang
Yuehao Song
Huangxuan Zhao
Jun Ma
Yajie Chen
Wenyu Liu
Bo Wang
ViT
162
16
0
08 May 2024
ChuXin: 1.6B Technical Report
Xiaomin Zhuang
Yufan Jiang
Qiaozhi He
Zhihua Wu
ALM
167
0
0
08 May 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra
Matt Stallone
Gaoyuan Zhang
Songlin Yang
Aditya Prasad
...
Amith Singhee
Nirmit Desai
David D. Cox
Ruchir Puri
Yikang Shen
AI4TS
339
107
0
07 May 2024
Learning Linear Block Error Correction Codes
Yoni Choukroun
Lior Wolf
163
13
0
07 May 2024
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong
Mengzhou Xia
Danqi Chen
Mike Lewis
MoE
196
27
0
06 May 2024
Dependency-Aware Semi-Structured Sparsity: Declining Roles of Outliers in Pruning GLU-based LLMs
Zhiyu Guo
Hidetaka Kamigaito
Taro Watanabe
78
2
0
03 May 2024