2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
ArXiv (abs)
PDF
HTML
HuggingFace (4 upvotes)
Papers citing
"GLU Variants Improve Transformer"
50 / 904 papers shown
The Free Transformer
François Fleuret
68
0
0
20 Oct 2025
MuonBP: Faster Muon via Block-Periodic Orthogonalization
Ahmed Khaled
Kaan Ozkara
Tao Yu
Mingyi Hong
Youngsuk Park
96
3
0
19 Oct 2025
Finding Manifolds With Bilinear Autoencoders
Thomas Dooms
Ward Gauderis
91
0
0
19 Oct 2025
NeurIPT: Foundation Model for Neural Interfaces
Zitao Fang
Chenxuan Li
Hongting Zhou
Shuyang Yu
Guodong DU
Ashwaq Qasem
Yang Lu
Jing Li
J. Zhang
Sim Kuan Goh
98
3
0
18 Oct 2025
Sequence Modeling with Spectral Mean Flows
Jinwoo Kim
Max Beier
Nicolas Hoischen
Nayun Kim
Seunghoon Hong
BDL
170
0
0
17 Oct 2025
SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling
Kadri Hacioğlu
Manjunath K E
Andreas Stolcke
126
1
0
17 Oct 2025
Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology
Xinrui Huang
Fan Xiao
Dongming He
Anqi Gao
Dandan Li
Xiaofan Zhang
Shaoting Zhang
Xudong Wang
MedIm
LM&MA
221
0
0
16 Oct 2025
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Ming Gui
Johannes Schusterbauer
Timy Phan
Felix Krause
J. Susskind
Miguel Angel Bautista
Bjorn Ommer
201
1
0
16 Oct 2025
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby
Ivan Lazarevich
Nish Sinnadurai
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
120
1
0
15 Oct 2025
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
Daniil Gurgurov
Josef van Genabith
Simon Ostermann
MoE
201
0
0
15 Oct 2025
DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation
Yakun Song
Xiaobin Zhuang
Jiawei Chen
Zhikang Niu
Guanrou Yang
...
Zhuo Chen
Yuping Wang
Yuping Wang
Xie Chen
Xie Chen
DiffM
196
0
0
14 Oct 2025
Simple Projection Variants Improve ColBERT Performance
Benjamin Clavié
Sean Lee
Rikiya Takehi
Aamir Shakir
Makoto P. Kato
140
1
0
14 Oct 2025
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
Weiyang Jin
Yuwei Niu
Jiaqi Liao
Chengqi Duan
Aoxue Li
Shenghua Gao
Xihui Liu
LRM
208
4
0
14 Oct 2025
What If: Understanding Motion Through Sparse Interactions
S. A. Baumann
Nick Stracke
Timy Phan
Bjorn Ommer
135
0
0
14 Oct 2025
Vision-LLMs for Spatiotemporal Traffic Forecasting
Ning Yang
Hengyu Zhong
Haijun Zhang
Randall Berry
AI4TS
121
1
0
13 Oct 2025
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
Runyang Feng
H. Chang
Tze Ho Elden Tse
Boeun Kim
Yi Chang
Yixing Gao
Mamba
143
0
0
13 Oct 2025
DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space
Junchao Gong
Jingyi Xu
Ben Fei
Zhangrui Li
W. Zhang
Kun Chen
Wanghan Xu
Weidong Yang
Xiaokang Yang
Lei Bai
124
0
0
13 Oct 2025
Hierarchical Scheduling for Multi-Vector Image Retrieval
Maoliang Li
K. Li
Yaoyang Liu
Jiayu Chen
Zihao Zheng
Yinjun Wu
Xiang Chen
118
1
0
10 Oct 2025
Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar
Deepak Nathani
William Yang Wang
Tanmoy Chakraborty
130
0
0
10 Oct 2025
iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation
Chuanrui Zhang
Zhengxian Wu
Guanxing Lu
Yansong Tang
Ziwei Wang
VGen
103
0
0
10 Oct 2025
Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
Yankun Han
61
0
0
10 Oct 2025
From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill
Gunjun Lee
Jiwon Kim
Jaiyoung Park
Y. Lee
Jung Ho Ahn
MoE
121
0
0
09 Oct 2025
Scaling Laws for Code: A More Data-Hungry Regime
Xianzhen Luo
Wenzhen Zheng
Qingfu Zhu
Rongyi Zhang
Houyi Li
Siming Huang
YuanTao Fan
Wanxiang Che
ALM
110
2
0
09 Oct 2025
Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots
Damir Nurtdinov
Aliaksei Korshuk
Alexei Kornaev
Alexander Maloletov
73
0
0
09 Oct 2025
Mid-Training of Large Language Models: A Survey
Kaixiang Mo
Yuxin Shi
Weiwei Weng
Zhiqiang Zhou
Shuman Liu
Haibo Zhang
Anxiang Zeng
LRM
151
0
0
08 Oct 2025
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Ziyuan Huang
Dandan Zheng
Cheng Zou
Rui Liu
Xiaolong Wang
...
Jiajia Liu
Qingpei Guo
Ming-Hsuan Yang
Jingdong Chen
Jun Zhou
155
8
0
08 Oct 2025
Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language Models
Chengzhi Zhong
Fei Cheng
Qianying Liu
Yugo Murawaki
Chenhui Chu
Sadao Kurohashi
LRM
132
0
0
08 Oct 2025
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
Chunsan Hong
Seonho An
Min-Soo Kim
Jong Chul Ye
DiffM
OffRL
127
0
0
07 Oct 2025
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Théophane Vallaeys
Jakob Verbeek
Matthieu Cord
DiffM
227
3
0
06 Oct 2025
Scaling Sequence-to-Sequence Generative Neural Rendering
Shikun Liu
Kam Woh Ng
Wonbong Jang
Jiadong Guo
Junlin Han
...
Juan C. Pérez
Zijian Zhou
Chi Phung
Tao Xiang
Juan-Manuel Perez-Rua
VGen
129
0
0
05 Oct 2025
A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem
Mingfeng Fan
Jiaqi Cheng
Yaoxin Wu
Yifeng Zhang
Yibin Yang
Guohua Wu
Guillaume Sartoretti
BDL
105
0
0
03 Oct 2025
SoundReactor: Frame-level Online Video-to-Audio Generation
Koichi Saito
Julian Tanke
Christian Simon
Masato Ishii
Kazuki Shimada
Zachary Novack
Zhi-Wei Zhong
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
DiffM
VGen
241
0
0
02 Oct 2025
Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework
Nii Osae Osae Dade
Moinul Hossain Rahat
144
0
0
02 Oct 2025
Uncovering the Computational Ingredients of Human-Like Representations in LLMs
Zach Studdiford
Timothy T. Rogers
Kushin Mukherjee
Siddharth Suresh
162
0
0
01 Oct 2025
Eliciting Chain-of-Thought Reasoning for Time Series Analysis using Reinforcement Learning
Felix Parker
Nimeesha Chan
Chi Zhang
Kimia Ghobadi
AI4TS
OffRL
LRM
136
1
0
01 Oct 2025
Composer: A Search Framework for Hybrid Neural Architecture Design
Bilge Acun
Prasoon Sinha
Newsha Ardalani
Sangmin Bae
Alicia Golden
Chien-Yu Lin
Meghana Madhyastha
Fei Sun
N. Yadwadkar
Carole-Jean Wu
222
1
0
01 Oct 2025
Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
Jinwoo Kim
Xingyue Huang
Krzysztof Olejniczak
Kyungbin Min
M. Bronstein
Seunghoon Hong
İsmail İlkan Ceylan
SLR
273
1
0
01 Oct 2025
Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting
Jason Stock
T. Arcomano
R. Kotamarthi
DiffM
161
4
0
30 Sep 2025
Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
M. R
Dan John Velasco
117
1
0
29 Sep 2025
Scalable GANs with Transformers
Sangeek Hyun
MinKyu Lee
Jae-Pil Heo
110
1
0
29 Sep 2025
Training Agents Inside of Scalable World Models
Danijar Hafner
Wilson Yan
Timothy Lillicrap
VGen
208
21
0
29 Sep 2025
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Guolin Ke
Hui Xue
137
3
0
29 Sep 2025
Scaling with Collapse: Efficient and Predictable Training of LLM Families
Shane Bergsma
Bin Claire Zhang
Nolan Dey
Shaheer Muhammad
Gurpreet Gosal
Joel Hestness
133
2
0
29 Sep 2025
Pretraining with hierarchical memories: separating long-tail and common knowledge
Hadi Pouransari
David Grangier
C Thomas
Michael Kirchhof
Oncel Tuzel
RALM
KELM
240
1
0
29 Sep 2025
UniVid: The Open-Source Unified Video Model
Jiabin Luo
Junhui Lin
Zeyu Zhang
Biao Wu
Meng Fang
Ling-Hao Chen
Hao Tang
VGen
276
8
0
29 Sep 2025
Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Bingrui Li
Jiaxin Wen
Zhanpeng Zhou
Jun-Jie Zhu
Jianfei Chen
83
0
0
29 Sep 2025
AuON: A Linear-time Alternative to Orthogonal Momentum Updates
Dipan Maity
146
0
0
29 Sep 2025
Negative Pre-activations Differentiate Syntax
Linghao Kong
Angelina Ning
Micah Adler
Nir Shavit
120
0
0
29 Sep 2025
LLaDA-MoE: A Sparse MoE Diffusion Language Model
Fengqi Zhu
Zebin You
Yipeng Xing
Zenan Huang
Lin Liu
...
Junbo Zhao
Da Zheng
Chongxuan Li
Jianguo Li
J. Wen
MoE
236
12
0
29 Sep 2025
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Shane Bergsma
Nolan Dey
Joel Hestness
162
0
0
29 Sep 2025