ResearchTrend.AI
GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 647 papers shown
Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew
Shaltiel Shmidman
Avi Shmidman
Amir DN Cohen
Moshe Koppel
25
0
0
25 Sep 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal
Tejas Vaidhya
Irina Rish
52
15
0
25 Sep 2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Nolan Dey
Daria Soboleva
Faisal Al-Khateeb
Bowen Yang
Ribhu Pathria
...
Robert Myers
Jacob Robert Steeves
Natalia Vassilieva
Marvin Tom
Joel Hestness
MoE
19
14
0
20 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
W. Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric P. Xing
25
44
0
19 Sep 2023
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
Juntao Li
Zecheng Tang
Yuyang Ding
Pinzheng Wang
Pei Guo
...
Wenliang Chen
Guohong Fu
Qiaoming Zhu
Guodong Zhou
M. Zhang
40
5
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
66
701
0
19 Sep 2023
AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification
Abdelrahman Abdallah
Mahmoud Abdalla
Mohamed Elkasaby
Yasser Elbendary
Adam Jatowt
25
0
0
18 Sep 2023
XGen-7B Technical Report
Erik Nijkamp
Tian Xie
Hiroaki Hayashi
Bo Pang
Congying Xia
...
Chien-Sheng Wu
Silvio Savarese
Yingbo Zhou
Shafiq R. Joty
Caiming Xiong
ALM
26
12
0
07 Sep 2023
nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
Piotr Nawrot
AI4CE
17
5
0
05 Sep 2023
Language Models for Novelty Detection in System Call Traces
Quentin Fournier
Daniel Aloise
Leandro R. Costa
AI4TS
22
4
0
05 Sep 2023
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
...
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
SyDa
VLM
18
29
0
05 Sep 2023
LLM and Infrastructure as a Code use case
Thibault Chanus
Michael Aubertin
6
2
0
04 Sep 2023
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
Neha Sengupta
Sunil Kumar Sahu
Bokang Jia
Satheesh Katipomu
Haonan Li
...
A. Jackson
Hector Xuguang Ren
Preslav Nakov
Timothy Baldwin
Eric P. Xing
LRM
16
40
0
30 Aug 2023
Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts
Thanh Thi Nguyen
Campbell Wilson
Janis Dalins
17
15
0
28 Aug 2023
Aligning Language Models with Offline Learning from Human Feedback
Jian Hu
Li Tao
J. Yang
Chandler Zhou
ALM
OffRL
22
6
0
23 Aug 2023
Cabrita: closing the gap for foreign languages
Celio H. N. Larcher
Marcos Piau
Paulo Finardi
P. Gengo
P. Esposito
Vinicius Fernandes Caridá
CLL
11
19
0
23 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
52
14
0
23 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
23
1
0
20 Aug 2023
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim
Sihwa Lee
Jangwhan Lee
S. Hong
Duhyeuk Chang
Wonyong Sung
Jungwook Choi
MQ
16
14
0
13 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
23
3
0
07 Aug 2023
A Novel Convolutional Neural Network Architecture with a Continuous Symmetry
Y. Liu
Han-Juan Shao
Bing Bai
AI4CE
24
2
0
03 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
88
10,947
0
18 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
13
41
0
12 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
523
0
12 Jul 2023
ReLoRA: High-Rank Training Through Low-Rank Updates
Vladislav Lialin
Namrata Shivagunde
Sherin Muckatira
Anna Rumshisky
BDL
29
93
0
11 Jul 2023
Self-supervised adversarial masking for 3D point cloud representation learning
Michal Szachniewicz
Wojciech Kozlowski
Michal Stypulkowski
Maciej Zięba
3DPC
11
2
0
11 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
22
117
0
08 Jul 2023
Trainable Transformer in Transformer
A. Panigrahi
Sadhika Malladi
Mengzhou Xia
Sanjeev Arora
VLM
27
12
0
03 Jul 2023
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn
Anton Ragni
20
1
0
29 Jun 2023
Reconstructing the Hemodynamic Response Function via a Bimodal Transformer
Yoni Choukroun
Lior Golgher
P. Blinder
L. Wolf
MedIm
14
0
0
28 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
14
170
0
26 Jun 2023
Towards Stability of Autoregressive Neural Operators
Michael McCabe
P. Harrington
Shashank Subramanian
Jed Brown
AI4CE
36
17
0
18 Jun 2023
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
33
4
0
15 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
13
0
0
15 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
28
22
0
13 Jun 2023
Exposing Attention Glitches with Flip-Flop Language Modeling
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
LRM
27
46
0
01 Jun 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo
Yiyi Zhou
Tianhe Ren
Shen Chen
Xiaoshuai Sun
Rongrong Ji
VLM
MLLM
21
89
0
24 May 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
20
3
0
24 May 2023
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
Abhinav Jangda
Saeed Maleki
M. Dehnavi
Madan Musuvathi
Olli Saarikivi
22
5
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
114
0
18 May 2023
Less is More! A slim architecture for optimal language translation
Luca Herranz-Celotti
E. Rrapaj
28
0
0
18 May 2023
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
Alexander Moreno
Jonathan Mei
Luke Walters
10
0
0
15 May 2023
T-former: An Efficient Transformer for Image Inpainting
Ye Deng
Siqi Hui
Sanping Zhou
Deyu Meng
Jinjun Wang
ViT
11
29
0
12 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
19
45
0
10 May 2023
XTab: Cross-table Pretraining for Tabular Transformers
Bingzhao Zhu
Xingjian Shi
Nick Erickson
Mu Li
George Karypis
Mahsa Shoaran
LMTD
26
65
0
10 May 2023
Toeplitz Neural Network for Sequence Modeling
Zhen Qin
Xiaodong Han
Weixuan Sun
Bowen He
Dong Li
Dongxu Li
Yuchao Dai
Lingpeng Kong
Yiran Zhong
AI4TS
ViT
30
40
0
08 May 2023
A technical note on bilinear layers for interpretability
Lee D. Sharkey
FAtt
6
6
0
05 May 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
AI4CE
25
29
0
19 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
24
41
0
17 Apr 2023
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
Yiming Cui
Ziqing Yang
Xin Yao
ALM
26
292
0
17 Apr 2023