GLU Variants Improve Transformer

12 February 2020

Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown

Exploring the Benefit of Activation Sparsity in Pre-training

International Conference on Machine Learning (ICML), 2024

Zhengyan Zhang

Chaojun Xiao

Qiujieli Qin

Yankai Lin

Zhiyuan Zeng

Xu Han

Zhiyuan Liu

Ruobing Xie

Maosong Sun

Jie Zhou

MoE

238

04 Oct 2024

Exploring the Limitations of Mamba in COPY and CoT Reasoning

Ruifeng Ren

Zhicong Li

Yong Liu

254

04 Oct 2024

ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI

344

03 Oct 2024

Selective Attention Improves Transformer

International Conference on Learning Representations (ICLR), 2024

Yaniv Leviathan

Matan Kalman

Yossi Matias

357

03 Oct 2024

Neutral Residues: Revisiting Adapters for Model Extension

Franck Signe Talla

Edouard Grave

367

03 Oct 2024

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

International Conference on Learning Representations (ICLR), 2024

Ulyana Piterbarg

Lerrel Pinto

Rob Fergus

SyDa

448

03 Oct 2024

FutureFill: Fast Generation from Convolutional Sequence Models

260

02 Oct 2024

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

International Conference on Learning Representations (ICLR), 2024

Minjoon Seo

1.0K

02 Oct 2024

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Philipp Mondorf

Sondre Wold

Yun Xue

501

02 Oct 2024

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

308

02 Oct 2024

Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets

Yuandong Tian

440

02 Oct 2024

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Computer Vision and Pattern Recognition (CVPR), 2024

Xiao Wang

Yuehang Li

Chuanfu Li

Jin Tang

345

01 Oct 2024

End-to-end Piano Performance-MIDI to Score Conversion with Transformers

International Society for Music Information Retrieval Conference (ISMIR), 2024

T. Beyer

Angela Dai

218

30 Sep 2024

Characterizing and Efficiently Accelerating Multimodal Generation Model Inference

...

475

30 Sep 2024

Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs

...

323

30 Sep 2024

Efficient Long-Form Speech Recognition for General Speech In-Context Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Hao Yen

Shaoshi Ling

Guoli Ye

164

29 Sep 2024

Emu3: Next-Token Prediction is All You Need

Xinlong Wang

Xiaosong Zhang

Zhengxiong Luo

Quan-Sen Sun

Yufeng Cui

...

Xi Yang

Jingjing Liu

Yonghua Lin

Tiejun Huang

Zhongyuan Wang

MLLM

290

483

27 Sep 2024

Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents

International Conference on Asian Digital Libraries (ICADL), 2024

Emanuela Boros

Maud Ehrmann

240

25 Sep 2024

The Credibility Transformer

European Actuarial Journal (EAJ), 2024

Ronald Richman

Salvatore Scognamiglio

M. Wüthrich

207

25 Sep 2024

Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement

Ke Zhang

Xuelong Li

187

25 Sep 2024

EuroLLM: Multilingual Language Models for Europe

Pedro Henrique Martins

Patrick Fernandes

...

Alexandra Birch

André F. T. Martins

228

24 Sep 2024

dnaGrinder: a lightweight and high-capacity genomic foundation model

Qihang Zhao

Chi Zhang

Weixiong Zhang

183

24 Sep 2024

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

International Conference on Learning Representations (ICLR), 2024

Qingsong Wen

633

169

24 Sep 2024

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Ang Li

175

23 Sep 2024

Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional Information

219

23 Sep 2024

Is Tokenization Needed for Masked Particle Modelling?

Matthew Leigh

Samuel Klein

François Charton

Tobias Golling

Lukas Heinrich

Michael Kagan

Ines Ochoa

Margarita Osadchy

238

19 Sep 2024

Mastering Chess with a Transformer Model

Daniel Monroe

The Leela Chess Zero Team

246

18 Sep 2024

Kolmogorov-Arnold Transformer

Xingyi Yang

Xinchao Wang

258

16 Sep 2024

Cross-modality image synthesis from TOF-MRA to CTA using diffusion-based models

Dietmar Frey

210

16 Sep 2024

Flash STU: Fast Spectral Transform Units

474

16 Sep 2024

Ruri: Japanese General Text Embeddings

Hayato Tsukagoshi

Ryohei Sasano

144

12 Sep 2024

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Neural Information Processing Systems (NeurIPS), 2024

Yu Zhang

...

Bailin Wang

Guohong Fu

297

11 Sep 2024

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Ying Shan

611

104

06 Sep 2024

Attention Heads of Large Language Models: A Survey

Patterns (Patterns), 2024

Yezhaohui Wang

Bo Tang

Zhiyu Li

287

05 Sep 2024

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Chun Jason Xue

02 Sep 2024

CogVLM2: Visual Language Models for Image and Video Understanding

...

Bin Xu

Juanzi Li

Yuxiao Dong

Jie Tang

VLM MLLM

303

198

29 Aug 2024

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts

Sara Hooker

213

28 Aug 2024

SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors

Zhiqing Zhang

Tianyong Liu

Guojia Fan

Bin Li

Qianjin Feng

Shoujun Zhou

Mamba

228

28 Aug 2024

Flexible Control in Symbolic Music Generation via Musical Metadata

Heejin Kim

Seoyoon Kim

Yountae Jung

Woohyung Lim

235

28 Aug 2024

Legilimens: Practical and Unified Content Moderation for Large Language Model Services

Conference on Computer and Communications Security (CCS), 2024

356

28 Aug 2024

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

...

Bingning Wang

Weipeng Chen

210

27 Aug 2024

CLLMFS: A Contrastive Learning enhanced Large Language Model Framework for Few-Shot Named Entity Recognition

European Conference on Artificial Intelligence (ECAI), 2024

214

23 Aug 2024

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Lili Yu

265

294

20 Aug 2024

Beyond Labels: Aligning Large Language Models with Human-like Reasoning

International Conference on Pattern Recognition (ICPR), 2024

Muhammad Rafsan Kabir

Rafeed Mohammad Sultan

Mohammad Ruhul Amin

190

20 Aug 2024

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Sara Hooker

274

20 Aug 2024

Performance Law of Large Language Models

Chuhan Wu

Ruiming Tang

LRM

301

19 Aug 2024

OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction

Zhonghang Li

199

16 Aug 2024

CROME: Cross-Modal Adapters for Efficient Multimodal LLM

Sayna Ebrahimi

Sercan O. Arik

Tejas Nama

Tomas Pfister

188

13 Aug 2024

Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Andrei Catalin Coman

Christos Theodoropoulos

Marie-Francine Moens

James Henderson

458

13 Aug 2024

FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Supryadi

...

Lei Yang

Ling Shi

Juesi Xiao

Shaolin Zhu

Deyi Xiong

215

12 Aug 2024