GLU Variants Improve Transformer

12 February 2020

Noam M. Shazeer

arXiv: 2002.05202

Papers citing "GLU Variants Improve Transformer"

50 of 904 citing papers shown

The Free Transformer

François Fleuret

68

0

0

20 Oct 2025

MuonBP: Faster Muon via Block-Periodic Orthogonalization

96

3

0

19 Oct 2025

Finding Manifolds With Bilinear Autoencoders

91

0

0

19 Oct 2025

NeurIPT: Foundation Model for Neural Interfaces

98

3

0

18 Oct 2025

Sequence Modeling with Spectral Mean Flows

Nicolas Hoischen

170

0

0

17 Oct 2025

SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling

Kadri Hacioğlu

Andreas Stolcke

126

1

0

17 Oct 2025

Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology

221

0

0

16 Oct 2025

Adapting Self-Supervised Representations as a Latent Space for Efficient Generation

Johannes Schusterbauer

Miguel Angel Bautista

201

1

0

16 Oct 2025

REAP the Experts: Why Pruning Prevails for One-Shot MoE compression

Ivan Lazarevich

Nish Sinnadurai

Yani Andrew Ioannou

Vithursan Thangarasa

120

1

0

15 Oct 2025

Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models

Daniil Gurgurov

Josef van Genabith

Simon Ostermann

201

0

0

15 Oct 2025

DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation

...

196

0

0

14 Oct 2025

Simple Projection Variants Improve ColBERT Performance

Benjamin Clavié

140

1

0

14 Oct 2025

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

208

4

0

14 Oct 2025

What If : Understanding Motion Through Sparse Interactions

135

0

0

14 Oct 2025

Vision-LLMs for Spatiotemporal Traffic Forecasting

121

1

0

13 Oct 2025

High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation

Tze Ho Elden Tse

143

0

0

13 Oct 2025

DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space

124

0

0

13 Oct 2025

Hierarchical Scheduling for Multi-Vector Image Retrieval

118

1

0

10 Oct 2025

Understanding the Effects of Domain Finetuning on LLMs

William Yang Wang

Tanmoy Chakraborty

130

0

0

10 Oct 2025

iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation

103

0

0

10 Oct 2025

Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models

61

0

0

10 Oct 2025

From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill

121

0

0

09 Oct 2025

Scaling Laws for Code: A More Data-Hungry Regime

110

2

0

09 Oct 2025

Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots

Damir Nurtdinov

Aliaksei Korshuk

Alexander Maloletov

73

0

0

09 Oct 2025

Mid-Training of Large Language Models: A Survey

151

0

0

08 Oct 2025

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

...

Ming-Hsuan Yang

155

8

0

08 Oct 2025

Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language Models

Sadao Kurohashi

132

0

0

08 Oct 2025

Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies

127

0

0

07 Oct 2025

SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization

Théophane Vallaeys

227

3

0

06 Oct 2025

Scaling Sequence-to-Sequence Generative Neural Rendering

...

Juan-Manuel Perez-Rua

129

0

0

05 Oct 2025

A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem

Guillaume Sartoretti

105

0

0

03 Oct 2025

SoundReactor: Frame-level Online Video-to-Audio Generation

Christian Simon

Takashi Shibuya

241

0

0

02 Oct 2025

Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework

Nii Osae Osae Dade

Moinul Hossain Rahat

144

0

0

02 Oct 2025

Uncovering the Computational Ingredients of Human-Like Representations in LLMs

Zach Studdiford

Timothy T. Rogers

Kushin Mukherjee

Siddharth Suresh

162

0

0

01 Oct 2025

Eliciting Chain-of-Thought Reasoning for Time Series Analysis using Reinforcement Learning

AI4TS OffRL LRM

136

1

0

01 Oct 2025

Composer: A Search Framework for Hybrid Neural Architecture Design

Newsha Ardalani

Meghana Madhyastha

222

1

0

01 Oct 2025

Flock: A Knowledge Graph Foundation Model via Learning on Random Walks

Krzysztof Olejniczak

İsmail İlkan Ceylan

273

1

0

01 Oct 2025

Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting

161

4

0

30 Sep 2025

Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining

Dan John Velasco

117

1

0

29 Sep 2025

Scalable GANs with Transformers

110

1

0

29 Sep 2025

Training Agents Inside of Scalable World Models

Timothy Lillicrap

208

21

0

29 Sep 2025

Hyperspherical Latents Improve Continuous-Token Autoregressive Generation

137

3

0

29 Sep 2025

Scaling with Collapse: Efficient and Predictable Training of LLM Families

Bin Claire Zhang

Shaheer Muhammad

133

2

0

29 Sep 2025

Pretraining with hierarchical memories: separating long-tail and common knowledge

Hadi Pouransari

Michael Kirchhof

240

1

0

29 Sep 2025

UniVid: The Open-Source Unified Video Model

276

8

0

29 Sep 2025

Efficient Hyperparameter Tuning via Trajectory Invariance Principle

83

0

0

29 Sep 2025

AuON: A Linear-time Alternative to Orthogonal Momentum Updates

146

0

0

29 Sep 2025

Negative Pre-activations Differentiate Syntax

120

0

0

29 Sep 2025

LLaDA-MoE: A Sparse MoE Diffusion Language Model

...

236

12

0

29 Sep 2025

Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs

162

0

0

29 Sep 2025

1 2 3 4 5...17 18 19