arXiv:2002.05202

GLU Variants Improve Transformer

12 February 2020

Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown

Equivalence of Context and Parameter Updates in Modern Transformer Blocks

Adrian Goldwaser

88

0

0

24 Dec 2025

Jina-VLM: Small Multilingual Vision Language Model

Andreas Koukounas

Georgios Mastrapas

Florian Hönicke

Sedigheh Eslami

Guillaume Roncari

335

0

0

03 Dec 2025

Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study

96

0

0

03 Dec 2025

AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry

Joseph George Lambourne

Durvesh Malpure

145

0

0

02 Dec 2025

ViT$^3$: Unlocking Test-Time Training in Vision

56

0

0

01 Dec 2025

Improved Mean Flows: On the Challenges of Fastforward Generative Models

116

1

0

01 Dec 2025

Scaling and context steer LLMs along the same computational path as the human brain

Joséphine Raugel

Stéphane d'Ascoli

116

0

0

01 Dec 2025

AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy

Kevin C. Haudek

181

0

0

01 Dec 2025

Estimating the Event-Related Potential from Few EEG Trials

Anders Vestergaard Nørskov

Kasper Jørgensen

Alexander Neergaard Zahid

100

0

0

28 Nov 2025

SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models

150

0

0

28 Nov 2025

DisMo: Disentangled Motion Representations for Open-World Motion Transfer

Thomas Ressler-Antal

Malek Ben Alaya

97

0

0

28 Nov 2025

ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection

Xinshuang Zhang

44

0

0

27 Nov 2025

On the Origin of Algorithmic Progress in AI

Jonathan Rosenfeld

80

0

0

26 Nov 2025

Subjective Depth and Timescale Transformers: Learning Where and When to Compute

Frederico Wieser

Martin A Benfeghoul

Haitham Bou-Ammar

Zafeirios Fountas

118

0

0

26 Nov 2025

Adam Simplified: Bias Correction Debunked

Antonio Orvieto

128

0

0

25 Nov 2025

3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding

189

0

0

25 Nov 2025

Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling

338

0

0

24 Nov 2025

VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking

164

0

0

24 Nov 2025

MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection

Alex Chichung Kot

192

0

0

22 Nov 2025

MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning

80

0

0

21 Nov 2025

CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement

289

0

0

20 Nov 2025

Decoupling Complexity from Scale in Latent Diffusion Model

Tianxiong Zhong

316

0

0

20 Nov 2025

Analysis of heart failure patient trajectories using sequence modeling

Annika Rosengren

Martin Lindgren

Christina E. Lundberg

283

0

0

20 Nov 2025

OPFormer: Object Pose Estimation leveraging foundation model with geometric encoding

Martin Mikšík

Elizaveta Isianova

88

0

0

16 Nov 2025

CellARC: Measuring Intelligence with Cellular Automata

Miroslav Lžičař

84

0

0

11 Nov 2025

oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention

Ryusuke Mizutani

Tsugumi Kadowaki

158

0

0

11 Nov 2025

Learning to Focus: Focal Attention for Selective and Scalable Transformers

284

0

0

10 Nov 2025

SyMuPe: Affective and Controllable Symbolic Music Performance

Dmitrii Gavrilev

104

0

0

05 Nov 2025

Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining

Costin-Andrei Oncescu

Ben Athiwaratkun

190

0

0

04 Nov 2025

MoSa: Motion Generation with Scalable Autoregressive Modeling

174

2

0

03 Nov 2025

CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

529

0

0

03 Nov 2025

Consciousness-ECG Transformer for Conscious State Estimation System with Real-Time Monitoring

Expert Systems with Applications (ESWA), 2025

Young-Seok Kweon

108

0

0

31 Oct 2025

Continuous Autoregressive Language Models

310

0

0

31 Oct 2025

Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

141

1

0

30 Oct 2025

Do LLMs Signal When They're Right? Evidence from Neuron Agreement

76

1

0

30 Oct 2025

Emu3.5: Native Multimodal Models are World Learners

...

451

16

0

30 Oct 2025

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

...

238

1

0

29 Oct 2025

BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training

104

0

0

29 Oct 2025

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

Vicky Kalogeiton

376

1

0

29 Oct 2025

DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation

104

0

0

28 Oct 2025

HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling

Kristen Grauman

257

0

0

27 Oct 2025

Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction

96

0

0

25 Oct 2025

Streaming Generation for Music Accompaniment

Lancelot Blanchard

Aaron Courville

88

0

0

25 Oct 2025

Smule Renaissance Small: Efficient General-Purpose Vocal Restoration

Chris Manchester

...

Svetoslav Kepchelev

Teodor Naydenov

Randal Leistikow

105

0

0

24 Oct 2025

A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment

Jiuh-Biing Sheu

209

0

0

24 Oct 2025

REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects

Yassine El Ouahidi

Nicolas Farrugia

Bastien Pasdeloup

122

0

0

24 Oct 2025

SEMPO: Lightweight Foundation Models for Time Series Forecasting

142

0

0

22 Oct 2025

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

132

0

0

22 Oct 2025

Forging GEMs: Advancing Greek NLP through Quality-Based Corpus Curation

Alexandra Apostolopoulou

Konstantinos Kanaris

Athanasios Koursaris

Dimitris Tsakalidis

173

0

0

22 Oct 2025

The Free Transformer

François Fleuret

56

0

0

20 Oct 2025
