2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Papers citing "GLU Variants Improve Transformer"
50 / 647 papers shown
Title
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
37
0
0
17 Feb 2025
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
102
14
0
14 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE
AI4CE
54
1
0
13 Feb 2025
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu
Shu Li
Tianyu Zheng
Zhaoxiang Zhang
175
0
0
10 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
67
4
0
09 Feb 2025
FuXi-α: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
Yufei Ye
Wei Guo
Jin Yao Chin
Hao Wang
Hong Zhu
...
Yuyang Ye
Y. Liu
Ruiming Tang
Defu Lian
Enhong Chen
92
2
0
05 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
136
0
0
05 Feb 2025
Transformers trained on proteins can learn to attend to Euclidean distance
Isaac Ellmen
Constantin Schneider
Matthew I.J. Raybould
Charlotte M. Deane
79
0
0
03 Feb 2025
CoddLLM: Empowering Large Language Models for Data Analytics
Jiani Zhang
Hengrui Zhang
Rishav Chakravarti
Yiqun Hu
Patrick K. L. Ng
Asterios Katsifodimos
Huzefa Rangwala
George Karypis
Alon Halevy
SyDa
ELM
140
0
0
01 Feb 2025
Enhancing Glucose Level Prediction of ICU Patients through Hierarchical Modeling of Irregular Time-Series
Hadi Mehdizavareh
Arijit Khan
Simon Lebech Cichosz
AI4TS
41
1
0
28 Jan 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application
Chuanyang Zheng
ViT
67
0
0
26 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
97
18
0
17 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
42
1
0
12 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
77
9
0
11 Jan 2025
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
Jaehoon Heo
Adiwena Putra
Jieon Yoon
Sungwoong Yune
Hangyeol Lee
Ji-Hoon Kim
Joo-Young Kim
DiffM
55
1
0
10 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
38
0
0
08 Jan 2025
SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment
Yuchun Fan
Yongyu Mu
Yilin Wang
Lei Huang
Junhao Ruan
B. Li
Tong Xiao
Shujian Huang
Xiaocheng Feng
Jingbo Zhu
LRM
49
3
0
08 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
111
609
0
31 Dec 2024
ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis
James P. Beno
VLM
30
0
0
29 Dec 2024
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
Hyucksung Kwon
Kyungmo Koo
Janghyeon Kim
W. Lee
Minjae Lee
...
Yongkee Kwon
Ilkon Kim
Euicheol Lim
John Kim
Jungwook Choi
66
4
0
28 Dec 2024
Segment-Based Attention Masking for GPTs
Shahar Katz
Liran Ringel
Yaniv Romano
Lior Wolf
CLL
40
1
0
24 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li
Lu Yin
Shiwei Liu
70
4
0
18 Dec 2024
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Xinzhu Ma
Nanqing Dong
W. Ouyang
73
2
0
18 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
88
75
0
18 Dec 2024
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
Kun Ouyang
Yuanxin Liu
Shicheng Li
Yi Liu
Hao Zhou
Fandong Meng
Jie Zhou
Xu Sun
102
1
0
16 Dec 2024
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
Jingze Shi
Bingheng Wu
65
0
0
16 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang
Chih-Yao Ma
Yen-Cheng Liu
Ji Hou
Tao Xu
...
Peizhao Zhang
Tingbo Hou
Peter Vajda
N. Jha
Xiaoliang Dai
LMTD
DiffM
VGen
VLM
81
5
0
13 Dec 2024
Code LLMs: A Taxonomy-based Survey
Nishat Raihan
Christian D. Newman
Marcos Zampieri
91
1
0
11 Dec 2024
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
Michael Yeung
Toya Teramoto
Songtao Wu
Tatsuo Fujiwara
Kenji Suzuki
Tamaki Kojima
71
0
0
09 Dec 2024
MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structure-Enhanced Language Model
Yunhe Pang
Bo Chen
Fanjin Zhang
Yanghui Rao
Jie Tang
75
0
0
05 Dec 2024
AntLM: Bridging Causal and Masked Language Models
Xinru Yu
Bin Guo
Shiwei Luo
J. Wang
Tao Ji
Yuanbin Wu
CLL
77
1
0
04 Dec 2024
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Ziqi Pang
Tianyuan Zhang
Fujun Luan
Yunze Man
Hao Tan
Kai Zhang
William T. Freeman
Yu-Xiong Wang
VGen
71
8
0
02 Dec 2024
TruncFormer: Private LLM Inference Using Only Truncations
Patrick Yubeaton
Jianqiao Mo
Karthik Garimella
N. Jha
Brandon Reagen
Chinmay Hegde
Siddharth Garg
74
0
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
106
2
0
02 Dec 2024
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Ali Shiraee Kasmaee
Mohammad Khodadad
Mohammad Arshi Saloot
Nick Sherck
Stephen Dokas
H. Mahyar
Soheila Samiee
ELM
139
0
0
30 Nov 2024
H³Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs
Selim Furkan Tekin
Fatih Ilhan
Tiansheng Huang
Sihao Hu
Zachary Yahn
Ling Liu
MoMe
76
3
0
26 Nov 2024
MH-MoE: Multi-Head Mixture-of-Experts
Shaohan Huang
Xun Wu
Shuming Ma
Furu Wei
MoE
69
1
0
25 Nov 2024
MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model
Yifan Wu
Min Zeng
Yang Li
Y. Zhang
Min Li
67
1
0
23 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
SLR
82
0
0
19 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang
Xiangyu Chang
Mingchen Li
A. Roy-Chowdhury
J. Chen
Samet Oymak
78
3
0
19 Nov 2024
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
Md. Nazmus Sadat Samin
Jawad Ibn Ahad
Tanjila Ahmed Medha
Fuad Rahman
M. R. Amin
Nabeel Mohammed
Shafin Rahman
34
0
0
16 Nov 2024
Empowering Meta-Analysis: Leveraging Large Language Models for Scientific Synthesis
Jawad Ibn Ahad
Rafeed Mohammad Sultan
Abraham Kaikobad
Fuad Rahman
M. R. Amin
Nabeel Mohammed
Shafin Rahman
40
0
0
16 Nov 2024
Xmodel-1.5: An 1B-scale Multilingual LLM
Wang Qun
Liu Yang
Lin Qingquan
Jiang Ling
LRM
44
0
0
15 Nov 2024
Hysteresis Activation Function for Efficient Inference
Moshe Kimhi
Idan Kashani
A. Mendelson
Chaim Baskin
LLMSV
23
0
0
15 Nov 2024
Unraveling the Gradient Descent Dynamics of Transformers
Bingqing Song
Boran Han
Shuai Zhang
Jie Ding
Mingyi Hong
AI4CE
34
1
0
12 Nov 2024
More Expressive Attention with Negative Weights
Ang Lv
Ruobing Xie
Shuaipeng Li
Jiayi Liao
X. Sun
Zhanhui Kang
Di Wang
Rui Yan
30
0
0
11 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
13
0
07 Nov 2024
Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models
Adrián Morales-Pastor
Raquel Vázquez-Reza
Miłosz Wieczór
Clàudia Valverde
Manel Gil-Sorribes
Bertran Miquel-Oliver
Álvaro Ciudad
Alexis Molina
AI4CE
66
0
0
05 Nov 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
X. Sun
Yanfeng Chen
Y. Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
67
25
0
04 Nov 2024