GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown
Learning Linear Block Error Correction Codes
Yoni Choukroun
Lior Wolf
175
14
0
07 May 2024
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong
Mengzhou Xia
Danqi Chen
Mike Lewis
MoE
212
27
0
06 May 2024
Dependency-Aware Semi-Structured Sparsity: Declining Roles of Outliers in Pruning GLU-based LLMs
Zhiyu Guo
Hidetaka Kamigaito
Taro Watanabe
123
0
0
03 May 2024
Uncovering Agendas: A Novel French & English Dataset for Agenda Detection on Social Media
Gregorios A. Katsios
Ning Sa
Ankita Bhaumik
T. Strzalkowski
139
0
0
01 May 2024
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
Yucheng Hu
Yuxing Lu
RALM
397
31
0
30 Apr 2024
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
Andy He
Darren Key
Mason Bulling
Andrew Chang
Skyler Shapiro
Everett Lee
166
5
0
29 Apr 2024
Efficient LLM Inference with Kcache
Qiaozhi He
Zhihua Wu
RALM
207
1
0
28 Apr 2024
Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment
Kazi Shahriar Sanjid
Md. Tanzim Hossain
Md. Shakib Shahariar Junayed
M. M. Uddin
Mamba
191
2
0
26 Apr 2024
Tele-FLM Technical Report
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
209
11
0
25 Apr 2024
zkLLM: Zero Knowledge Proofs for Large Language Models
Haochen Sun
Jason Li
Hongyang Zhang
ALM
351
58
0
24 Apr 2024
Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan
Arthur Conmy
Lewis Smith
Tom Lieberum
Vikrant Varma
János Kramár
Rohin Shah
Neel Nanda
RALM
371
130
0
24 Apr 2024
EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification
Wangdan Liao
Weidong Wang
169
9
0
23 Apr 2024
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Sachin Mehta
Mohammad Hossein Sekhavat
Qingqing Cao
Maxwell Horton
Yanzi Jin
...
Iman Mirzadeh
Mahyar Najibi
Dmitry Belenko
Peter Zatloukal
Mohammad Rastegari
OSLM, AIFin
323
79
0
22 Apr 2024
When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering
Stephen Choi
William Gazeley
KELM
174
2
0
19 Apr 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Aitor Ormazabal
Che Zheng
Cyprien de Masson d'Autume
Dani Yogatama
Deyu Fu
...
Yazheng Yang
Yi Tay
Yuqi Wang
Zhongkai Zhu
Zhihui Xie
LRM, VLM, ReLM
259
63
0
18 Apr 2024
KV-weights are all you need for skipless transformers
Nils Graef
195
2
0
18 Apr 2024
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models
Yushuo Chen
Tianyi Tang
Erge Xiang
Linjiang Li
Wayne Xin Zhao
Jing Wang
Yunpeng Chai
Ji-Rong Wen
90
2
0
17 Apr 2024
HumMUSS: Human Motion Understanding using State Space Models
Arnab Kumar Mondal
Stefano Alletto
Denis Tome
211
8
0
16 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
208
12
0
16 Apr 2024
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
Hengyuan Zhang
Yanru Wu
Dawei Li
Zacc Yang
Rui Zhao
Yong Jiang
Fei Tan
ALM
444
1
0
16 Apr 2024
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
Henry Peng Zou
Gavin Heqing Yu
Ziwei Fan
Dan Bu
Han Liu
Peng Dai
Dongmei Jia
Cornelia Caragea
201
18
0
13 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
203
49
0
12 Apr 2024
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang
Wenqi Shao
Mengzhao Chen
Chengyue Wu
Yong Liu
Taiqiang Wu
Kaipeng Zhang
Songyang Zhang
Kai-xiang Chen
Ping Luo
MLLM
337
5
0
10 Apr 2024
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution
Bernard J. Koch
David Peterson
175
15
0
09 Apr 2024
MuPT: A Generative Symbolic Music Pretrained Transformer
International Conference on Learning Representations (ICLR), 2024
Xingwei Qu
Yuelin Bai
Yi Ma
Ziya Zhou
Ka Man Lo
...
Xu Tan
Stephen W. Huang
Lei Ma
Jie Fu
Ge Zhang
245
25
0
09 Apr 2024
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Xinrun Du
Zhouliang Yu
Songyang Gao
Ding Pan
Yuyang Cheng
...
Tianyu Zheng
Xinchen Luo
Guorui Zhou
Lei Ma
Ge Zhang
307
27
0
05 Apr 2024
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Makesh Narsimhan Sreedhar
Traian Rebedea
Shaona Ghosh
Jiaqi Zeng
Christopher Parisien
ALM
330
11
0
04 Apr 2024
Sailor: Open Language Models for South-East Asia
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Longxu Dou
Qian Liu
Guangtao Zeng
Jia Guo
Jiahui Zhou
Wei Lu
Min Lin
LRM
280
16
0
04 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Computer Vision and Pattern Recognition (CVPR), 2024
Jieneng Chen
Qihang Yu
Xiaohui Shen
Yaoyao Liu
Liang-Chieh Chen
3DV, VLM
405
50
0
02 Apr 2024
Accelerating Transformer Pre-training with 2:4 Sparsity
International Conference on Machine Learning (ICML), 2024
Yuezhou Hu
Kang Zhao
Weiyu Huang
Jianfei Chen
Jun Zhu
282
17
0
02 Apr 2024
Rewrite the Stars
Xu Ma
Xiyang Dai
Yue Bai
Yizhou Wang
Yun Fu
252
313
0
29 Mar 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
433
45
0
29 Mar 2024
Jamba: A Hybrid Transformer-Mamba Language Model
Opher Lieber
Barak Lenz
Hofit Bata
Gal Cohen
Jhonathan Osin
...
Nir Ratner
N. Rozen
Erez Shwartz
Mor Zusman
Y. Shoham
415
329
0
28 Mar 2024
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli
Armin W. Thomas
Eric N. D. Nguyen
Pragaash Ponnusamy
Bjorn Deiseroth
...
Brian Hie
Stefano Ermon
Christopher Ré
Ce Zhang
Stefano Massaroli
MoE
310
49
0
26 Mar 2024
Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
Jiqun Chu
Zuoquan Lin
AI4TS
204
2
0
26 Mar 2024
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Yujin Tang
Peijie Dong
Zhenheng Tang
Xiaowen Chu
Junwei Liang
Mamba
320
49
0
25 Mar 2024
KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Xindi Luo
Zequn Sun
Jing-xin Zhao
Zhe Zhao
Wei Hu
KELM
212
15
0
22 Mar 2024
ChatGPT Alternative Solutions: Large Language Models Survey
H. Alipour
Nick Pendar
Kohinoor Roy
LM&MA
152
9
0
21 Mar 2024
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Hyungjun Oh
Kihong Kim
Jaemin Kim
Sungkyun Kim
Junyeol Lee
Du-Seong Chang
Jiwon Seo
213
66
0
15 Mar 2024
VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation
Mingya Zhang
Yue Yu
Limei Gu
Tingsheng Lin
Xianping Tao
Mamba
177
96
0
14 Mar 2024
Revealing the Parallel Multilingual Learning within Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yongyu Mu
Peinan Feng
Zhiquan Cao
Yuzhang Wu
Bei Li
...
Tong Xiao
Kai Song
Tongran Liu
Chunliang Zhang
Jingbo Zhu
191
1
0
14 Mar 2024
Language models scale reliably with over-training and on downstream tasks
International Conference on Learning Representations (ICLR), 2024
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM, ELM, LRM
341
75
0
13 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM, LLMAG
589
825
0
13 Mar 2024
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
International Conference on Machine Learning (ICML), 2024
Fangyun Wei
Xi Chen
Linzi Luo
ELM, ALM, LRM
191
14
0
12 Mar 2024
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
Quzhe Huang
Zhenwei An
Zhuang Nan
Mingxu Tao
Chen Zhang
...
Kun Xu
Kun Xu
Liwei Chen
Songfang Huang
Yansong Feng
MoE
225
55
0
12 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
434
639
0
08 Mar 2024
LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation
Weibin Liao
Yinghao Zhu
Xinyuan Wang
Chengwei Pan
Yasha Wang
Liantao Ma
Mamba
227
126
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM, LRM
829
764
0
07 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zinan Lin
A. Anandkumar
Yuandong Tian
385
333
0
06 Mar 2024
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Yiming Cui
Xin Yao
88
7
0
04 Mar 2024