ResearchTrend.AI
GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
ArXiv (abs) · PDF · HTML · HuggingFace (4 upvotes)

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown
Equivalence of Context and Parameter Updates in Modern Transformer Blocks
Adrian Goldwaser
Michael Munn
J. Gonzalvo
Benoit Dherin
88
0
0
24 Dec 2025
Jina-VLM: Small Multilingual Vision Language Model
Andreas Koukounas
Georgios Mastrapas
Florian Hönicke
Sedigheh Eslami
Guillaume Roncari
Scott Martens
Han Xiao
MLLM
335
0
0
03 Dec 2025
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study
Yixuan Li
Yuhao Lu
Y. Liu
Liang Li
R. Ruffini
Di Li
Rong-Gen Cai
Xiaoyan Zhu
Wenbin Lin
Yu Wang
96
0
0
03 Dec 2025
AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry
Xiang Xu
P. Jayaraman
Joseph George Lambourne
Yilin Liu
Durvesh Malpure
Pete Meltzer
AI4CE
145
0
0
02 Dec 2025
ViT$^3$: Unlocking Test-Time Training in Vision
Dongchen Han
Y. Li
Tianyu Li
Z. Cao
Ziming Wang
Jun Song
Yu Cheng
Bo Zheng
Gao Huang
ViT
56
0
0
01 Dec 2025
Improved Mean Flows: On the Challenges of Fastforward Generative Models
Zhengyang Geng
Yiyang Lu
Zongze Wu
Eli Shechtman
J. Zico Kolter
Kaiming He
AI4CE
116
1
0
01 Dec 2025
Scaling and context steer LLMs along the same computational path as the human brain
Joséphine Raugel
Stéphane d'Ascoli
Jérémy Rapin
Valentin Wyart
J. King
116
0
0
01 Dec 2025
AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy
Shyam Agarwal
Ali Moghimi
Kevin C. Haudek
AI4Ed
181
0
0
01 Dec 2025
Estimating the Event-Related Potential from Few EEG Trials
Anders Vestergaard Nørskov
Kasper Jørgensen
Alexander Neergaard Zahid
Morten Mørup
100
0
0
28 Nov 2025
SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models
Ruosen Zhao
Zhikang Zhang
Jialei Xu
Jiahao Chang
Dong Chen
Lingyun Li
Weijian Sun
Zizhuang Wei
VLM LRM
150
0
0
28 Nov 2025
DisMo: Disentangled Motion Representations for Open-World Motion Transfer
Thomas Ressler-Antal
Frank Fundel
Malek Ben Alaya
S. A. Baumann
Felix Krause
Ming Gui
Bjorn Ommer
DiffM VGen
97
0
0
28 Nov 2025
ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection
Runzhi Deng
Yundi Hu
Xinshuang Zhang
Zhao Wang
Xixi Liu
Wang-Zhou Dai
Caifeng Shan
Fang Zhao
44
0
0
27 Nov 2025
On the Origin of Algorithmic Progress in AI
Hans Gundlach
Alex Fogelson
Jayson Lynch
Ana Trisovic
Jonathan Rosenfeld
Anmol Sandhu
Neil Thompson
80
0
0
26 Nov 2025
Subjective Depth and Timescale Transformers: Learning Where and When to Compute
Frederico Wieser
Martin A Benfeghoul
Haitham Bou-Ammar
Jun Wang
Zafeirios Fountas
118
0
0
26 Nov 2025
Adam Simplified: Bias Correction Debunked
Sam Laing
Antonio Orvieto
128
0
0
25 Nov 2025
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
X. Wang
Chen Tang
Xiangyu Yue
Wei-Hong Li
3DV
189
0
0
25 Nov 2025
Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling
Xiao Cui
Yulei Qin
Xinyue Li
Wengang Zhou
Hongsheng Li
Houqiang Li
DD FedML
338
0
0
24 Nov 2025
VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking
Kichang Yang
Seonjun Kim
Minjae Kim
Nairan Zhang
Chi Zhang
Youngki Lee
VLM
164
0
0
24 Nov 2025
MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection
H. Lu
Yi Yu
Shijian Lu
Deepu Rajan
Boon Poh Ng
Alex Chichung Kot
Xudong Jiang
Mamba
192
0
0
22 Nov 2025
MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
Wenrui Zhang
Xinggang Wang
Bin Feng
Wenyu Liu
80
0
0
21 Nov 2025
CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement
Pan Yang
Cheng Deng
J. Yang
Han Zhao
Yun-Hai Liu
Yuling Chen
Xiaoli Ruan
Yanping Chen
CoGe
289
0
0
20 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
316
0
0
20 Nov 2025
Analysis of heart failure patient trajectories using sequence modeling
Falk Dippela
Yinan Yu
Annika Rosengren
Martin Lindgren
Christina E. Lundberg
Erik Aerts
Martin Adiels
Helen Sjöland
Mamba
283
0
0
20 Nov 2025
OPFormer: Object Pose Estimation leveraging foundation model with geometric encoding
Artem Moroz
Vit Zeman
Martin Mikšík
Elizaveta Isianova
Miroslav David
Pavel Burget
Varun Burde
ViT
88
0
0
16 Nov 2025
CellARC: Measuring Intelligence with Cellular Automata
Miroslav Lžičař
LRM
84
0
0
11 Nov 2025
oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention
Ryusuke Mizutani
Kazuaki Matano
Tsugumi Kadowaki
Haruki Tenya
Layris
nuigurumi
Koki Hashimoto
Yu Tanaka
158
0
0
11 Nov 2025
Learning to Focus: Focal Attention for Selective and Scalable Transformers
Dhananjay Ram
Wei Xia
Stefano Soatto
284
0
0
10 Nov 2025
SyMuPe: Affective and Controllable Symbolic Music Performance
Ilya Borovik
Dmitrii Gavrilev
Vladimir Viro
104
0
0
05 Nov 2025
Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
Costin-Andrei Oncescu
Qingyang Wu
Wai Tong Chung
Robert Wu
Bryan Gopal
Junxiong Wang
Tri Dao
Ben Athiwaratkun
MoE
190
0
0
04 Nov 2025
MoSa: Motion Generation with Scalable Autoregressive Modeling
Mengyuan Liu
Sheng Yan
Y. Wang
Yingjie Li
Gui-Bin Bian
Hong Liu
174
2
0
03 Nov 2025
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yifan Zhou
Tianshi Xu
Jue Hong
Ye Wu
Meng Li
MoE
529
0
0
03 Nov 2025
Consciousness-ECG Transformer for Conscious State Estimation System with Real-Time Monitoring
Expert Systems with Applications (ESWA), 2025
Young-Seok Kweon
Gi-Hwan Shin
Ji-Yong Kim
Bokyeong Ryu
Seong-Whan Lee
108
0
0
31 Oct 2025
Continuous Autoregressive Language Models
Chenze Shao
Darren Li
Fandong Meng
Jie Zhou
KELM
310
0
0
31 Oct 2025
Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
Biao Zhang
Yong Cheng
Siamak Shakeri
Xinyi Wang
Min Ma
Orhan Firat
141
1
0
30 Oct 2025
Do LLMs Signal When They're Right? Evidence from Neuron Agreement
Kang Chen
Yaoning Wang
Kai Xiong
Zhuoka Feng
Wenhe Sun
Haotian Chen
Yixin Cao
76
1
0
30 Oct 2025
Emu3.5: Native Multimodal Models are World Learners
Yufeng Cui
Honghao Chen
Haoge Deng
X. Y. Huang
Xinghang Li
...
Zhuo Chen
Yulong Ao
Tiejun Huang
Zhongyuan Wang
Xinlong Wang
MLLM VGen
451
16
0
30 Oct 2025
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
Mengzhao Chen
Meng Wu
Hui Jin
Zhihang Yuan
Jing Liu
...
Jin Ma
Zeyue Xue
Zhiheng Liu
Xingyan Bin
Ping Luo
MQ
238
1
0
29 Oct 2025
BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training
Wenjie Zhou
Bohan Wang
Wei Chen
Xueqi Cheng
104
0
0
29 Oct 2025
MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency
Nicolas Dufour
Lucas Degeorge
Arijit Ghosh
Vicky Kalogeiton
David Picard
EGVM
376
1
0
29 Oct 2025
DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation
Jingyi Tian
Le Wang
Sanping Zhou
Sen Wang
Jiayi Li
Gang Hua
104
0
0
28 Oct 2025
HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling
Joungbin An
Kristen Grauman
Mamba
257
0
0
27 Oct 2025
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
Xu Zhang
Ruijie Quan
Wenguan Wang
Yi Yang
DiffM
96
0
0
25 Oct 2025
Streaming Generation for Music Accompaniment
Yusong Wu
Mason Wang
Heidi Lei
Stephen Brade
Lancelot Blanchard
Shih-Lun Wu
Aaron Courville
Anna Huang
88
0
0
25 Oct 2025
Smule Renaissance Small: Efficient General-Purpose Vocal Restoration
Yongyi Zang
Chris Manchester
David Young
Ivan Ivanov
Jeffrey Lufkin
...
Svetoslav Kepchelev
Fei Yueh Chen
Dongting Cai
Teodor Naydenov
Randal Leistikow
105
0
0
24 Oct 2025
A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment
Huatian Gong
Jiuh-Biing Sheu
Zheng Wang
Xiaoguang Yang
Ran Yan
AI4CE
209
0
0
24 Oct 2025
REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
Yassine El Ouahidi
Jonathan Lys
Philipp Tholke
Nicolas Farrugia
Bastien Pasdeloup
Vincent Gripon
Karim Jerbi
G. Lioi
AI4TS VLM
122
0
0
24 Oct 2025
SEMPO: Lightweight Foundation Models for Time Series Forecasting
Hui He
Kun Yi
Yuanchi Ma
Qi Zhang
ZhenDong Niu
Guansong Pang
AI4TS
142
0
0
22 Oct 2025
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
Gunshi Gupta
Karmesh Yadav
Z. Kira
Y. Gal
Rahaf Aljundi
OffRL
132
0
0
22 Oct 2025
Forging GEMs: Advancing Greek NLP through Quality-Based Corpus Curation
Alexandra Apostolopoulou
Konstantinos Kanaris
Athanasios Koursaris
Dimitris Tsakalidis
George Domalis
I. Livieris
173
0
0
22 Oct 2025
The Free Transformer
François Fleuret
56
0
0
20 Oct 2025