GLU Variants Improve Transformer

12 February 2020


Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown

Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Megha Chakraborty

S.M. Towhidul Islam Tonmoy

...

Vinija Jain

197

08 Oct 2023

The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

S.M. Towhidul Islam Tonmoy

314

182

08 Oct 2023

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

International Conference on Learning Representations (ICLR), 2023

Iman Mirzadeh

Keivan Alizadeh-Vahid

490

100

06 Oct 2023

Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion

Neural Information Processing Systems (NeurIPS), 2023

265

06 Oct 2023

Predicting Emergent Abilities with Infinite Resolution Evaluation

International Conference on Learning Representations (ICLR), 2023

Xu Han

...

Zhiyuan Liu

Maosong Sun

ELM LRM

290

05 Oct 2023

PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels

International Conference on Machine Learning (ICML), 2023

Praneeth Kacham

Vahab Mirrokni

Peilin Zhong

219

02 Oct 2023

Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need!

...

275

29 Sep 2023

Qwen Technical Report

Jinze Bai

Shuai Bai

Yunfei Chu

Zeyu Cui

Kai Dang

...

Zhenru Zhang

Chang Zhou

Jingren Zhou

Xiaohuan Zhou

Tianhang Zhu

OSLM

797

3,067

28 Sep 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization

International Conference on Learning Representations (ICLR), 2023

Albert Mohwald

249

28 Sep 2023

Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew

139

25 Sep 2023

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

Ayush Kaushal

Tejas Vaidhya

Irina Rish

359

25 Sep 2023

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

...

246

20 Sep 2023

SlimPajama-DC: Understanding Data Combinations for LLM Training

...

437

19 Sep 2023

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

Science China Information Sciences (Sci China Inf Sci), 2023

...

370

19 Sep 2023

Baichuan 2: Open Large-scale Language Models

...

803

923

19 Sep 2023

AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification

144

18 Sep 2023

XGen-7B Technical Report

Erik Nijkamp

...

Silvio Savarese

Yingbo Zhou

Shafiq Joty

Caiming Xiong

ALM

216

07 Sep 2023

Language Models for Novelty Detection in System Call Traces

205

05 Sep 2023

Data-Juicer: A One-Stop Data Processing System for Large Language Models

...

Jingren Zhou

297

05 Sep 2023

LLM and Infrastructure as a Code use case

Thibault Chanus

Michael Aubertin

120

04 Sep 2023

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

...

381

30 Aug 2023

Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts

The European Symposium on Artificial Neural Networks (ESANN), 2023

Thanh Thi Nguyen

Campbell Wilson

Janis Dalins

116

28 Aug 2023

Aligning Language Models with Offline Learning from Human Feedback

313

23 Aug 2023

Cabrita: closing the gap for foreign languages

Vinicius Fernandes Caridá

CLL

108

23 Aug 2023

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

Jiasheng Ye

Quanquan Gu

625

23 Aug 2023

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023

320

20 Aug 2023

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Neural Information Processing Systems (NeurIPS), 2023

157

13 Aug 2023

RecycleGPT: An Autoregressive Language Model with Recyclable Module

275

07 Aug 2023

A Novel Convolutional Neural Network Architecture with a Continuous Symmetry

CAAI International Conference on Artificial Intelligence (ICCAI), 2023

332

03 Aug 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

Louis Martin

...

Sharan Narang

Sergey Edunov

8.2K

15,302

18 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

Neural Information Processing Systems (NeurIPS), 2023

424

12 Jul 2023

A Comprehensive Overview of Large Language Models

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

Saeed Anwar

Muhammad Usman

858

1,200

12 Jul 2023

ReLoRA: High-Rank Training Through Low-Rank Updates

International Conference on Learning Representations (ICLR), 2023

513

178

11 Jul 2023

Self-supervised adversarial masking for 3D point cloud representation learning

Asian Conference on Intelligent Information and Database Systems (ACIIDS), 2023

160

11 Jul 2023

On decoder-only architecture for speech-to-text and large language model integration

Automatic Speech Recognition & Understanding (ASRU), 2023

...

532

186

08 Jul 2023

Trainable Transformer in Transformer

International Conference on Machine Learning (ICML), 2023

353

03 Jul 2023

Leveraging Cross-Utterance Context For ASR Decoding

Interspeech (Interspeech), 2023

Robert Flynn

Anton Ragni

191

29 Jun 2023

Reconstructing the Hemodynamic Response Function via a Bimodal Transformer

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023

28 Jun 2023

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome

289

309

26 Jun 2023

Towards Stability of Autoregressive Neural Operators

413

18 Jun 2023

Recurrent Action Transformer with Memory

392

15 Jun 2023

Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

Xianbiao Qi

Jianan Wang

Lei Zhang

200

15 Jun 2023

AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks

...

Daphne Theodorakopoulos

Tanja Tornede

Henning Wachsmuth

Marius Lindauer

324

13 Jun 2023

Exposing Attention Glitches with Flip-Flop Language Modeling

Neural Information Processing Systems (NeurIPS), 2023

209

01 Jun 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

Neural Information Processing Systems (NeurIPS), 2023

294

134

24 May 2023

Just CHOP: Embarrassingly Simple LLM Compression

Dirk Groeneveld

234

24 May 2023

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2023

156

22 May 2023

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Peng Wang

Shijie Wang

Junyang Lin

Shuai Bai

Xiaohuan Zhou

Jingren Zhou

Xinggang Wang

Chang Zhou

VLM MLLM ObjD

585

154

18 May 2023

Less is More! A slim architecture for optimal language translation

Luca Herranz-Celotti

E. Rrapaj

18 May 2023

SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels

Alexander Moreno

Jonathan Mei

Luke Walters

223

15 May 2023