2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Papers citing "GLU Variants Improve Transformer"
50 / 904 papers shown
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Kai Li
Kejun Gao
Xiaolin Hu
76
0
0
28 Sep 2025
QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks
Qian Chen
Linxin Yang
Akang Wang
Xiaodong Luo
Y. Zhang
124
0
0
28 Sep 2025
Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription
Wei Zeng
Junchuan Zhao
Ye Wang
124
0
0
28 Sep 2025
Impute-MACFM: Imputation based on Mask-Aware Flow Matching
Dengyi Liu
Honggang Wang
Hua Fang
142
0
0
27 Sep 2025
Stochastic activations
Maria Lomeli
Matthijs Douze
Gergely Szilvasy
Loic Cabannes
Jade Copet
Sainbayar Sukhbaatar
Jason Weston
Gabriel Synnaeve
Pierre-Emmanuel Mazaré
Hervé Jégou
LLMSV
264
0
0
26 Sep 2025
IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
Xinyu Liu
Bei Li
Jiahao Liu
Junhao Ruan
Kechen Jiao
Hongyin Tang
Jingang Wang
Xiao Tong
Jingbo Zhu
178
0
0
26 Sep 2025
Compute-Optimal Quantization-Aware Training
Aleksandr Dremov
David Grangier
Angelos Katharopoulos
Awni Y. Hannun
MQ
128
0
0
26 Sep 2025
Real-Time Object Detection Meets DINOv3
Shihua Huang
Yongjie Hou
Longfei Liu
Xuanlong Yu
Xi Shen
ObjD
3DH
PINN
VLM
364
5
0
25 Sep 2025
GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
Ahed Alboody
87
0
0
23 Sep 2025
SimpleFold: Folding Proteins is Simpler than You Think
Yuyang Wang
Jiarui Lu
Navdeep Jaitly
J. Susskind
Miguel Angel Bautista
270
6
0
23 Sep 2025
Understanding Post-Training Structural Changes in Large Language Models
Xinyu He
Xianghui Cao
158
0
0
22 Sep 2025
Training-free Truthfulness Detection via Value Vectors in LLMs
Runheng Liu
Heyan Huang
Xingchen Xiao
Zhijing Wu
89
0
0
22 Sep 2025
Rethinking the Role of Text Complexity in Language Model Pretraining
Dan John Velasco
M. R
211
2
0
20 Sep 2025
Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research
Richard Diehl Martinez
David Demitri Africa
Yuval Weiss
Suchir Salhan
Ryan Daniels
P. Buttery
140
1
0
19 Sep 2025
Neural Speech Separation with Parallel Amplitude and Phase Spectrum Estimation
Fei Liu
Yang Ai
Zhen-Hua Ling
113
0
0
17 Sep 2025
NIRVANA: Structured pruning reimagined for large language models compression
Mengting Ai
Tianxin Wei
Sirui Chen
Jingrui He
VLM
1.6K
1
0
17 Sep 2025
Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models
Yuval Weiss
David Demitri Africa
P. Buttery
Richard Diehl Martinez
262
0
0
16 Sep 2025
MFAF: An EVA02-Based Multi-scale Frequency Attention Fusion Method for Cross-View Geo-Localization
YiTong Liu
Tianzhu Liu
Yanfeng Gu
130
0
0
16 Sep 2025
AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
Väinö Hatanpää
Eugene Ku
Jason Stock
M. Emani
Sam Foreman
...
Sam Wheeler
Huihuo Zheng
T. Arcomano
V. Vishwanath
R. Kotamarthi
153
2
0
16 Sep 2025
Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation
Mohanad Albughdadi
MoE
68
0
0
13 Sep 2025
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
Zhiyu He
Maojiang Wang
Xinwen Gao
Yuchuan Luo
Lin Liu
Shaojing Fu
120
0
0
11 Sep 2025
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Bingxin Xu
Zhen Dong
Oussama Elachqar
Yuzhang Shang
MQ
192
1
0
11 Sep 2025
Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison
Marianna Nezhurina
Jörg Franke
Taishi Nakamura
Timur Carstensen
Niccolò Ajroldi
Ville Komulainen
David Salinas
J. Jitsev
165
2
0
10 Sep 2025
Practice on Long Behavior Sequence Modeling in Tencent Advertising
Xian Hu
Ming Yue
Zhixiang Feng
Junwei Pan
Junjie Zhai
...
Chao Deng
Yuekui Yang
Shudong Huang
Dapeng Liu
Haijie Gu
92
0
0
10 Sep 2025
When FinTech Meets Privacy: Securing Financial LLMs with Differential Private Fine-Tuning
Sichen Zhu
Hoyeung Leung
Xiaoyi Wang
Jia Wei
Honghui Xu
AIFin
135
1
0
10 Sep 2025
Causal Attention with Lookahead Keys
Zhuoqing Song
Peng Sun
Huizhuo Yuan
Quanquan Gu
CML
189
0
0
09 Sep 2025
ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers
Jeff Shen
Lindsay Smith
AI4CE
156
0
0
08 Sep 2025
RL Fine-Tuning Heals OOD Forgetting in SFT
Hangzhan Jin
Sitao Luan
Sicheng Lyu
Guillaume Rabusseau
Reihaneh Rabbany
Doina Precup
Mohammad Hamdaqa
CLL
LRM
179
5
0
08 Sep 2025
CURE: Controlled Unlearning for Robust Embeddings - Mitigating Conceptual Shortcuts in Pre-Trained Language Models
Aysenur Kocak
Shuo Yang
Bardh Prenkaj
Gjergji Kasneci
102
0
0
05 Sep 2025
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
Deniz Bayazit
Aaron Mueller
Antoine Bosselut
140
0
0
05 Sep 2025
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Moritz Reuss
Hongyi Zhou
Marcel Rühle
Ömer Erdinç Yagmurlu
Fabian Otto
Rudolf Lioutikov
LM&Ro
VLM
162
16
0
05 Sep 2025
Elucidating the Design Space of Decay in Linear Attention
Zhen Qin
Xuyang Shen
Yiran Zhong
100
1
0
05 Sep 2025
Multi-level SSL Feature Gating for Audio Deepfake Detection
Hoan My Tran
Damien Lolive
Aghilas Sini
Arnaud Delhay
Pierre-François Marteau
David Guennec
132
1
0
03 Sep 2025
Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
Sohee Kim
Soohyun Ryu
Joonhyung Park
Eunho Yang
151
0
0
03 Sep 2025
Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages
David Demitri Africa
Suchir Salhan
Yuval Weiss
P. Buttery
Richard Diehl Martinez
164
1
0
02 Sep 2025
Preserving Bilinear Weight Spectra with a Signed and Shrunk Quadratic Activation Function
Jason Abohwo
Thomas Mosen
49
0
0
02 Sep 2025
LLM Encoder vs. Decoder: Robust Detection of Chinese AI-Generated Text with LoRA
Houji Jin
Negin Ashrafi
A. Abdollahi
Wei Liu
Jian Wang
Ganyu Gui
Maryam Pishgar
H. Feng
80
0
0
31 Aug 2025
Universal Properties of Activation Sparsity in Modern Large Language Models
Filip Szatkowski
Patryk Bedkowski
Alessio Devoto
Jan Dubiński
Pasquale Minervini
Mikołaj Piórczyński
Simone Scardapane
Bartosz Wójcik
153
1
0
30 Aug 2025
Mechanistic interpretability for steering vision-language-action models
Bear Häon
Kaylene C. Stocking
Ian Chuang
Claire Tomlin
LLMSV
156
2
0
30 Aug 2025
QZhou-Embedding Technical Report
Peng Yu
En Xu
Bin Chen
Haibiao Chen
Yinfei Xu
92
2
0
29 Aug 2025
Provable Benefits of In-Tool Learning for Large Language Models
Sam Houliston
Ambroise Odonnat
Charles Arnal
Vivien A. Cabannes
RALM
152
1
0
28 Aug 2025
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Taishi Nakamura
Satoki Ishikawa
Masaki Kawamura
Takumi Okamoto
Daisuke Nohara
Jun Suzuki
Rio Yokota
MoE
LRM
175
0
0
26 Aug 2025
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Zihao Huang
Yu Bao
Qiyang Min
S. Chen
Ran Guo
...
Defa Zhu
Yutao Zeng
Banggu Wu
Xun Zhou
Siyuan Qiao
MoE
173
3
0
26 Aug 2025
Training Transformers for Mesh-Based Simulations
Paul Garnier
Vincent Lannelongue
J. Viquerat
E. Hachem
AI4CE
90
2
0
25 Aug 2025
Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models
Wataru Ikeda
Kazuki Yano
Ryosuke Takahashi
Jaesung Lee
Keigo Shibata
Jun Suzuki
84
1
0
25 Aug 2025
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai
Le Qin
Shwai He
Junwei Cui
Ang Li
Jiayi Huang
MoE
116
0
0
25 Aug 2025
Exploring Scaling Laws of CTR Model for Online Performance Improvement
ACM Conference on Recommender Systems (RecSys), 2025
Weijiang Lai
Beihong Jin
Jiongyan Zhang
Yiyuan Zheng
Jian Dong
Jia Cheng
Jun Lei
Xingxing Wang
LRM
178
2
0
21 Aug 2025
Generative AI models capture realistic sea-ice evolution from days to decades
Tobias S. Finn
Marc Bocquet
Pierre Rampal
Charlotte Durand
Flavia Porro
A. Farchi
A. Carrassi
AI4CE
133
2
0
20 Aug 2025
Maximum Score Routing For Mixture-of-Experts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bowen Dong
Yilong Fan
Yutao Sun
Zhenyu Li
Tengyu Pan
Xun Zhou
Jianyong Wang
MoE
114
2
0
18 Aug 2025
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Xuran Liu
Nan Xue
Rui Bao
Yaping Sun
Zhiyong Chen
Meixia Tao
Xiaodong Xu
Shuguang Cui
119
0
0
15 Aug 2025