GLU Variants Improve Transformer

12 February 2020

Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown

ACE: A Cardinality Estimator for Set-Valued Queries

Proceedings of the VLDB Endowment (PVLDB), 2025

324

19 Mar 2025

Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels

451

18 Mar 2025

xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

274

17 Mar 2025

Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process

300

17 Mar 2025

HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets

418

16 Mar 2025

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

410

14 Mar 2025

Direction-Aware Diagonal Autoregressive Image Generation

409

14 Mar 2025

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

178

14 Mar 2025

Text Compression for Efficient Language Generation

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

David Gu

Peter Belcak

Roger Wattenhofer

242

14 Mar 2025

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

337

13 Mar 2025

FlowTok: Flowing Seamlessly Across Text and Image Tokens

527

13 Mar 2025

Autoregressive Image Generation with Vision Full-view Prompt

450

13 Mar 2025

Autoregressive Image Generation with Randomized Parallel Decoding

273

13 Mar 2025

Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining

Mikey Shechter

Yair Carmon

CLIP

379

11 Mar 2025

The Space Between: On Folding, Symmetries and Sampling

245

11 Mar 2025

MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

...

356

10 Mar 2025

YOLOE: Real-Time Seeing Anything

543

10 Mar 2025

Small Vision-Language Models: A Survey on Compact Architectures and Techniques

Nitesh Patnaik

Navdeep Nayak

Himani Bansal Agrawal

Moinak Chinmoy Khamaru

Gourav Bal

Saishree Smaranika Panda

268

09 Mar 2025

High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy

564

08 Mar 2025

BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

367

07 Mar 2025

EuroBERT: Scaling Multilingual Encoders for European Languages

Nicolas Boizard

Hippolyte Gisserot-Boukhlef

...

1.1K

07 Mar 2025

Mixture of Experts Made Intrinsically Interpretable

Xingyi Yang

Constantin Venhoff

Ashkan Khakzar

Christian Schroeder de Witt

327

05 Mar 2025

SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture

Hocheol Lim

Hyein Cho

Jeonghoon Kim

242

04 Mar 2025

Proteina: Scaling Flow-based Protein Structure Generative Models

International Conference on Learning Representations (ICLR), 2025

...

305

02 Mar 2025

GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development

ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2025

246

02 Mar 2025

Synthetic data enables context-aware bioacoustic sound event detection

...

417

01 Mar 2025

Protein Structure Tokenization: Benchmarking and New Recipe

234

28 Feb 2025

Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective

363

28 Feb 2025

Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling

...

563

28 Feb 2025

(Mis)Fitting: A Survey of Scaling Laws

Margaret Li

Sneha Kudugunta

Luke Zettlemoyer

413

26 Feb 2025

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

International Conference on Learning Representations (ICLR), 2025

336

26 Feb 2025

NeoBERT: A Next-Generation BERT

348

26 Feb 2025

Kanana: Compute-efficient Bilingual Language Models

...

Minchul Lee

364

26 Feb 2025

Patient Trajectory Prediction: Integrating Clinical Notes with Transformers

Sifal Klioui

Sana Sellami

Youssef Trardi

295

25 Feb 2025

Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

329

25 Feb 2025

Dual Classification Head Self-training Network for Cross-scene Hyperspectral Image Classification

273

25 Feb 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

...

358

24 Feb 2025

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

Maksim Zhdanov

Max Welling

Jan-Willem van de Meent

AI4CE

325

24 Feb 2025

Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps

Yen-Che Hsiao

Abhishek Dutta

LRM ReLM ELM

255

24 Feb 2025

Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

649

24 Feb 2025

Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI

Syed Abdul Gaffar Shakhadri

Kruthika KR

Kartik Basavaraj Angadi

VLM

186

24 Feb 2025

Predictive Modeling: BIM Command Recommendation Based on Large-scale Usage Logs

Advanced Engineering Informatics (AEI), 2025

217

23 Feb 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs

International Conference on Learning Representations (ICLR), 2025

336

21 Feb 2025

MoM: Linear Sequence Modeling with Mixture-of-Memories

560

19 Feb 2025

Multi-branch of Attention Yields Accurate Results for Tabular Data

253

18 Feb 2025

Baichuan-M1: Pushing the Medical Capability of Large Language Models

...

384

18 Feb 2025

Understanding Silent Data Corruption in LLM Training

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

218

17 Feb 2025

Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers

208

17 Feb 2025

Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Yilei Tu

Andrew Xue

Freda Shi

401

17 Feb 2025

Large Language Diffusion Models

1.1K

323

14 Feb 2025