ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2002.05202
  4. Cited By
GLU Variants Improve Transformer

GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
ArXivPDFHTML

Papers citing "GLU Variants Improve Transformer"

50 / 647 papers shown
Title
Optimizing Universal Lesion Segmentation: State Space Model-Guided
  Hierarchical Networks with Feature Importance Adjustment
Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment
Kazi Shahriar Sanjid
Md. Tanzim Hossain
Md. Shakib Shahariar Junayed
M. M. Uddin
Mamba
35
2
0
26 Apr 2024
Tele-FLM Technical Report
Tele-FLM Technical Report
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
30
3
0
25 Apr 2024
zkLLM: Zero Knowledge Proofs for Large Language Models
zkLLM: Zero Knowledge Proofs for Large Language Models
Haochen Sun
Jason Li
Hongyang Zhang
ALM
31
22
0
24 Apr 2024
Improving Dictionary Learning with Gated Sparse Autoencoders
Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan
Arthur Conmy
Lewis Smith
Tom Lieberum
Vikrant Varma
János Kramár
Rohin Shah
Neel Nanda
RALM
22
78
0
24 Apr 2024
EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery
  Classification
EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification
Wangdan Liao
Weidong Wang
20
4
0
23 Apr 2024
OpenELM: An Efficient Language Model Family with Open Training and
  Inference Framework
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Sachin Mehta
Mohammad Hossein Sekhavat
Qingqing Cao
Maxwell Horton
Yanzi Jin
...
Iman Mirzadeh
Mahyar Najibi
Dmitry Belenko
Peter Zatloukal
Mohammad Rastegari
OSLM
AIFin
38
50
0
22 Apr 2024
When Life gives you LLMs, make LLM-ADE: Large Language Models with
  Adaptive Data Engineering
When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering
Stephen Choi
William Gazeley
KELM
36
2
0
19 Apr 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language
  Models
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Aitor Ormazabal
Che Zheng
Cyprien de Masson dÁutume
Dani Yogatama
Deyu Fu
...
Yazheng Yang
Yi Tay
Yuqi Wang
Zhongkai Zhu
Zhihui Xie
LRM
VLM
ReLM
28
47
0
18 Apr 2024
Transformer tricks: Removing weights for skipless transformers
Transformer tricks: Removing weights for skipless transformers
Nils Graef
33
2
0
18 Apr 2024
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large
  Language Models
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models
Yushuo Chen
Tianyi Tang
Erge Xiang
Linjiang Li
Wayne Xin Zhao
Jing Wang
Yunpeng Chai
Ji-Rong Wen
11
1
0
17 Apr 2024
HumMUSS: Human Motion Understanding using State Space Models
HumMUSS: Human Motion Understanding using State Space Models
Arnab Kumar Mondal
Stefano Alletto
Denis Tome
26
4
0
16 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
38
5
0
16 Apr 2024
Balancing Speciality and Versatility: a Coarse to Fine Framework for
  Supervised Fine-tuning Large Language Model
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
Hengyuan Zhang
Yanru Wu
Dawei Li
Zacc Yang
Rui Zhao
Yong Jiang
Fei Tan
ALM
24
0
0
16 Apr 2024
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal
  LLM
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
Henry Peng Zou
Gavin Heqing Yu
Ziwei Fan
Dan Bu
Han Liu
Peng Dai
Dongmei Jia
Cornelia Caragea
16
14
0
13 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited
  Context Length
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
53
27
0
12 Apr 2024
Adapting LLaMA Decoder to Vision Transformer
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang
Wenqi Shao
Mengzhao Chen
Chengyue Wu
Yong Liu
Taiqiang Wu
Kaipeng Zhang
Songyang Zhang
Kai-xiang Chen
Ping Luo
MLLM
38
4
0
10 Apr 2024
From Protoscience to Epistemic Monoculture: How Benchmarking Set the
  Stage for the Deep Learning Revolution
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution
Bernard J. Koch
David Peterson
16
5
0
09 Apr 2024
MuPT: A Generative Symbolic Music Pretrained Transformer
MuPT: A Generative Symbolic Music Pretrained Transformer
Xingwei Qu
Yuelin Bai
Yi Ma
Ziya Zhou
Ka Man Lo
...
Xu Tan
Stephen W. Huang
Wenhu Chen
Jie Fu
Ge Zhang
52
10
0
09 Apr 2024
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Xinrun Du
Zhouliang Yu
Songyang Gao
Ding Pan
Yuyang Cheng
...
Tianyu Zheng
Xinchen Luo
Guorui Zhou
Wenhu Chen
Ge Zhang
48
17
0
05 Apr 2024
CantTalkAboutThis: Aligning Language Models to Stay on Topic in
  Dialogues
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Makesh Narsimhan Sreedhar
Traian Rebedea
Shaona Ghosh
Jiaqi Zeng
Christopher Parisien
ALM
27
4
0
04 Apr 2024
Sailor: Open Language Models for South-East Asia
Sailor: Open Language Models for South-East Asia
Longxu Dou
Qian Liu
Guangtao Zeng
Jia Guo
Jiahui Zhou
Wei Lu
Min-Bin Lin
LRM
32
7
0
04 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
28
24
0
02 Apr 2024
Accelerating Transformer Pre-training with 2:4 Sparsity
Accelerating Transformer Pre-training with 2:4 Sparsity
Yuezhou Hu
Kang Zhao
Weiyu Huang
Jianfei Chen
Jun Zhu
57
7
0
02 Apr 2024
Rewrite the Stars
Rewrite the Stars
Xu Ma
Xiyang Dai
Yue Bai
Yizhou Wang
Yun Fu
28
94
0
29 Mar 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and
  Channel Selection
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
39
31
0
29 Mar 2024
Jamba: A Hybrid Transformer-Mamba Language Model
Jamba: A Hybrid Transformer-Mamba Language Model
Opher Lieber
Barak Lenz
Hofit Bata
Gal Cohen
Jhonathan Osin
...
Nir Ratner
N. Rozen
Erez Shwartz
Mor Zusman
Y. Shoham
26
207
0
28 Mar 2024
Mechanistic Design and Scaling of Hybrid Architectures
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli
Armin W. Thomas
Eric N. D. Nguyen
Pragaash Ponnusamy
Bjorn Deiseroth
...
Brian Hie
Stefano Ermon
Christopher Ré
Ce Zhang
Stefano Massaroli
MoE
49
21
0
26 Mar 2024
Incorporating Exponential Smoothing into MLP: A Simple but Effective
  Sequence Model
Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
Jiqun Chu
Zuoquan Lin
AI4TS
25
2
0
26 Mar 2024
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate
  Spatiotemporal Forecasting
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Yujin Tang
Peijie Dong
Zhenheng Tang
Xiaowen Chu
Junwei Liang
Mamba
60
20
0
25 Mar 2024
KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable
  Adaptation
KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation
Xindi Luo
Zequn Sun
Jing-xin Zhao
Zhe Zhao
Wei Hu
KELM
24
3
0
22 Mar 2024
ChatGPT Alternative Solutions: Large Language Models Survey
ChatGPT Alternative Solutions: Large Language Models Survey
H. Alipour
Nick Pendar
Kohinoor Roy
LM&MA
27
4
0
21 Mar 2024
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
Hyungjun Oh
Kihong Kim
Jaemin Kim
Sungkyun Kim
Junyeol Lee
Du-Seong Chang
Jiwon Seo
36
28
0
15 Mar 2024
VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation
VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation
Mingya Zhang
Yue Yu
Limei Gu
Tingsheng Lin
Xianping Tao
Mamba
50
43
0
14 Mar 2024
Language models scale reliably with over-training and on downstream
  tasks
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
103
40
0
13 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Gemma Team Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
129
428
0
13 Mar 2024
Rethinking Generative Large Language Model Evaluation for Semantic
  Comprehension
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei
Xi Chen
Linzi Luo
ELM
ALM
LRM
30
7
0
12 Mar 2024
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
Quzhe Huang
Zhenwei An
Zhuang Nan
Mingxu Tao
Chen Zhang
...
Kun Xu
Kun Xu
Liwei Chen
Songfang Huang
Yansong Feng
MoE
37
26
0
12 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
36
288
0
08 Mar 2024
LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image
  Segmentation
LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation
Weibin Liao
Yinghao Zhu
Xinyuan Wang
Cehngwei Pan
Yasha Wang
Liantao Ma
Mamba
38
64
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
121
497
0
07 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu (Allen) Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
43
173
0
06 Mar 2024
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Yiming Cui
Xin Yao
20
4
0
04 Mar 2024
The Hidden Attention of Mamba Models
The Hidden Attention of Mamba Models
Ameen Ali
Itamar Zimerman
Lior Wolf
Mamba
37
58
0
03 Mar 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu
Jianlin Su
Bo-Wen Zhang
Chunhua Shen
MLLM
30
10
0
01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for
  Efficient Language Models
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
53
117
0
29 Feb 2024
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on
  Structure Prediction Tasks
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks
Rafael Josip Penić
Tin Vlasic
Roland G. Huber
Yue Wan
M. Šikić
AI4CE
22
27
0
29 Feb 2024
On the Challenges and Opportunities in Generative AI
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
Vincent Fortuin
56
17
0
28 Feb 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma
Hongyu Wang
Lingxiao Ma
Lei Wang
Wenhui Wang
Shaohan Huang
Lifeng Dong
Ruiping Wang
Jilong Xue
Furu Wei
MQ
45
203
0
27 Feb 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large
  Language Models
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
56
46
0
26 Feb 2024
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
Hang Guo
Jinmin Li
Tao Dai
Zhihao Ouyang
Xudong Ren
Shu-Tao Xia
Mamba
50
199
0
23 Feb 2024
Previous
123...789...111213
Next