GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE, AI4CE
493
9
0
13 Feb 2025
Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu
Shu Li
Tianyu Zheng
Rundong Wang
1.1K
0
0
10 Feb 2025
When, Where and Why to Average Weights?
International Conference on Machine Learning (ICML), 2025
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
559
2
0
10 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
406
21
0
09 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
998
11
0
05 Feb 2025
FuXi-$\alpha$: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
The Web Conference (WWW), 2025
Yufei Ye
Wei Guo
Jin Yao Chin
Hao Wang
Hong Zhu
...
Yuyang Ye
Yixiao Liu
Ruiming Tang
Defu Lian
Tong Xu
330
10
0
05 Feb 2025
Transformers trained on proteins can learn to attend to Euclidean distance
Isaac Ellmen
Constantin Schneider
Matthew I.J. Raybould
Charlotte M. Deane
241
0
0
03 Feb 2025
CoddLLM: Empowering Large Language Models for Data Analytics
Jiani Zhang
Hengrui Zhang
Rishav Chakravarti
Yiqun Hu
Patrick Ng
Asterios Katsifodimos
Huzefa Rangwala
George Karypis
Alon Halevy
SyDa, ELM
899
5
0
01 Feb 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application
International Conference on Learning Representations (ICLR), 2025
Chuanyang Zheng
ViT
396
3
0
26 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE, LM&MA, VLM
772
72
0
17 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zinan Lin
Shiwei Liu
394
15
0
12 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
787
30
0
11 Jan 2025
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
International Symposium on High-Performance Computer Architecture (HPCA), 2025
Jaehoon Heo
Adiwena Putra
Jieon Yoon
Sungwoong Yune
Hangyeol Lee
Ji-Hoon Kim
Joo-Young Kim
DiffM
273
6
0
10 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
377
0
0
10 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
349
2
0
08 Jan 2025
SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment
International Conference on Computational Linguistics (COLING), 2025
Yuchun Fan
Yongyu Mu
Yilin Wang
Lei Huang
Junhao Ruan
Yangqiu Song
Tong Xiao
Shujian Huang
Xiaocheng Feng
Jingbo Zhu
LRM
259
19
0
08 Jan 2025
VMamba: Visual State Space Model
Neural Information Processing Systems (NeurIPS), 2024
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
1.1K
1,554
0
31 Dec 2024
ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis
James P. Beno
VLM
290
2
0
29 Dec 2024
PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-based Long-Context LLM Inference System
Hyucksung Kwon
Kyungmo Koo
Janghyeon Kim
W. Lee
Minjae Lee
...
Ilkon Kim
Euicheol Lim
John Kim
Jungwook Choi
296
7
0
28 Dec 2024
Segment-Based Attention Masking for GPTs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Shahar Katz
Liran Ringel
Yaniv Romano
Lior Wolf
CLL
157
5
0
24 Dec 2024
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Neural Information Processing Systems (NeurIPS), 2024
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Cheng Wang
Nanqing Dong
W. Ouyang
318
8
0
18 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
457
389
0
18 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
International Conference on Learning Representations (ICLR), 2024
Pengxiang Li
Lu Yin
Shiwei Liu
295
11
0
18 Dec 2024
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
Jingze Shi
Yiran Peng
282
0
0
16 Dec 2024
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Kun Ouyang
Yuanxin Liu
Shicheng Li
Yi Liu
Hao Zhou
Fandong Meng
Jie Zhou
Xu Sun
391
1
0
16 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Computer Vision and Pattern Recognition (CVPR), 2024
Hongjie Wang
Chih-Yao Ma
Yen-Cheng Liu
Ji Hou
Tao Xu
...
Peizhao Zhang
Tingbo Hou
Peter Vajda
N. Jha
Xiaoliang Dai
LMTD, VGen, VLM, DiffM
426
27
0
13 Dec 2024
Code LLMs: A Taxonomy-based Survey
BigData Congress [Services Society] (BSS), 2024
Nishat Raihan
Christian D. Newman
Marcos Zampieri
377
4
0
11 Dec 2024
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
Michael Yeung
Toya Teramoto
Songtao Wu
Tatsuo Fujiwara
Kenji Suzuki
Tamaki Kojima
503
6
0
09 Dec 2024
GuARD: Effective Anomaly Detection through a Text-Rich and Graph-Informed Language Model
Yunhe Pang
Bo Chen
Fanjin Zhang
Yanghui Rao
Jie Tang
297
0
0
05 Dec 2024
AntLM: Bridging Causal and Masked Language Models
Xinru Yu
Bin Guo
Shiwei Luo
Jiadong Wang
Changzhi Sun
Man Lan
CLL
331
4
0
04 Dec 2024
TruncFormer: Private LLM Inference Using Only Truncations
Patrick Yubeaton
Jianqiao Mo
Karthik Garimella
N. Jha
Brandon Reagen
Chinmay Hegde
Siddharth Garg
263
1
0
02 Dec 2024
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Computer Vision and Pattern Recognition (CVPR), 2024
Ziqi Pang
Tianyuan Zhang
Fujun Luan
Yunze Man
Hao Tan
Kai Zhang
William T. Freeman
Yu-Xiong Wang
VGen
392
61
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
665
19
0
02 Dec 2024
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Ali Shiraee Kasmaee
Mohammad Khodadad
Mohammad Arshi Saloot
Nick Sherck
Stephen Dokas
H. Mahyar
Soheila Samiee
ELM
1.3K
10
0
30 Nov 2024
H3Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs
Selim Furkan Tekin
Fatih Ilhan
Tiansheng Huang
Sihao Hu
Yichang Xu
Zachary Yahn
Ling Liu
MoMe
363
8
0
26 Nov 2024
MH-MoE: Multi-Head Mixture-of-Experts
Shaohan Huang
Xun Wu
Shuming Ma
Furu Wei
MoE
378
5
0
25 Nov 2024
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Computer Vision and Pattern Recognition (CVPR), 2024
Faridoun Mehri
Mahdieh Soleymani Baghshah
Mohammad Taher Pilehvar
296
3
0
24 Nov 2024
MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model
Yifan Wu
Min Zeng
Yang Li
Yujiao Shi
Min Li
362
3
0
23 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
SLR
311
0
0
19 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Neural Information Processing Systems (NeurIPS), 2024
Xuechen Zhang
Xiangyu Chang
Mingchen Li
Amit K. Roy-Chowdhury
Jiasi Chen
Samet Oymak
260
10
0
19 Nov 2024
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
BigData Congress [Services Society] (BSS), 2024
Md. Nazmus Sadat Samin
Jawad Ibn Ahad
Tanjila Ahmed Medha
Fuad Rahman
M. R. Amin
Nabeel Mohammed
Shafin Rahman
226
2
0
16 Nov 2024
Empowering Meta-Analysis: Leveraging Large Language Models for Scientific Synthesis
BigData Congress [Services Society] (BSS), 2024
Jawad Ibn Ahad
Rafeed Mohammad Sultan
Abraham Kaikobad
Fuad Rahman
M. R. Amin
Nabeel Mohammed
Shafin Rahman
197
2
0
16 Nov 2024
Xmodel-1.5: An 1B-scale Multilingual LLM
Wang Qun
Liu Yang
Lin Qingquan
Jiang Ling
LRM
361
0
0
15 Nov 2024
Hysteresis Activation Function for Efficient Inference
Moshe Kimhi
Idan Kashani
A. Mendelson
Chaim Baskin
LLMSV
471
2
0
15 Nov 2024
Unraveling the Gradient Descent Dynamics of Transformers
Neural Information Processing Systems (NeurIPS), 2024
Bingqing Song
Boran Han
Shuai Zhang
Jie Ding
Mingyi Hong
AI4CE
322
8
0
12 Nov 2024
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
304
4
0
12 Nov 2024
More Expressive Attention with Negative Weights
Ang Lv
Ruobing Xie
Shuaipeng Li
Jiayi Liao
Xingwu Sun
Zhanhui Kang
Di Wang
Rui Yan
428
2
0
11 Nov 2024
Scaling Laws for Precision
International Conference on Learning Representations (ICLR), 2024
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
Cengiz Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin, MoMe
387
64
0
07 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
484
84
0
07 Nov 2024
Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models
Adrián Morales-Pastor
Raquel Vázquez-Reza
Miłosz Wieczór
Clàudia Valverde
Manel Gil-Sorribes
Bertran Miquel-Oliver
Álvaro Ciudad
Alexis Molina
AI4CE
253
2
0
05 Nov 2024