GLU Variants Improve Transformer
Noam M. Shazeer · 12 February 2020 · arXiv:2002.05202

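The paper replaces the Transformer's position-wise feed-forward layer FFN(x) = f(xW1)W2 with gated variants of the form FFN_GLU(x) = (f(xW1) ⊙ xV)W2, where f is an activation such as GELU (GEGLU) or Swish (SwiGLU) and the bias terms are dropped. The snippet below is a minimal PyTorch sketch of the SwiGLU variant; the class and argument names (SwiGLUFFN, d_model, d_ff) are illustrative choices, not taken from the paper or from this page.

# Minimal sketch of a SwiGLU feed-forward block, one of the GLU variants from the paper.
# Class/argument names here are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Three bias-free projections, following the paper's FFN_SwiGLU formulation.
        self.w1 = nn.Linear(d_model, d_ff, bias=False)  # gated (Swish) branch
        self.v = nn.Linear(d_model, d_ff, bias=False)   # linear branch
        self.w2 = nn.Linear(d_ff, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # FFN_SwiGLU(x) = (Swish(x W1) * x V) W2
        return self.w2(F.silu(self.w1(x)) * self.v(x))

# Usage: a batch of 2 sequences of length 5 with model width 16.
ffn = SwiGLUFFN(d_model=16, d_ff=64)
out = ffn(torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])
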
Papers citing "GLU Variants Improve Transformer"
Showing 50 of 647 citing papers.

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Bruno Gavranovic, Paul Lessard, Andrew Dudzik, Tamara von Glehn, J. G. Araújo, Petar Velickovic
23 Feb 2024

Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li, Gangwei Jiang, Hong Xie, Linqi Song, Defu Lian, Ying Wei
LRM · 22 Feb 2024

Improving Language Understanding from Screenshots
Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen
VLM · 21 Feb 2024

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun
21 Feb 2024

Transformer tricks: Precomputing the first layer
Nils Graef
MoE · 20 Feb 2024

Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu, Zeyu Huang, Youcheng Huang, Jie Fu
KELM · 19 Feb 2024

Can Large Multimodal Models Uncover Deep Semantics Behind Images?
Yixin Yang, Zheng Li, Qingxiu Dong, Heming Xia, Zhifang Sui
VLM · 17 Feb 2024

PointMamba: A Simple State Space Model for Point Cloud Analysis
Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xinyu Wang, Xiang Bai
16 Feb 2024

Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust, Bowen Shi, Skyler Wang, Necati Cihan Camgöz, Jean Maillard
SLR · 14 Feb 2024

Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
14 Feb 2024

Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda
14 Feb 2024

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano
VLM, MLLM · 13 Feb 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models
Haizhong Zheng, Xiaoyan Bai, Xueshen Liu, Z. Morley Mao, Beidi Chen, Fan Lai, Atul Prakash
09 Feb 2024

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation
Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, Lei Li
Mamba · 07 Feb 2024

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun
06 Feb 2024

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
05 Feb 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini, Amirkeivan Mohtashami, F. Fleuret, Martin Jaggi
04 Feb 2024

Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo
AI4TS · 04 Feb 2024

Leveraging Continuously Differentiable Activation Functions for Learning in Quantized Noisy Environments
Vivswan Shah, Nathan Youngblood
04 Feb 2024

Learning Structure-Aware Representations of Dependent Types
Konstantinos Kogkalidis, Orestis Melkonian, Jean-Philippe Bernardy
NAI · 03 Feb 2024

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Bharat Runwal, Tejaswini Pedapati, Pin-Yu Chen
MoE · 02 Feb 2024

Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar
02 Feb 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024

OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ..., Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi
OSLM · 01 Feb 2024

BlackMamba: Mixture of Experts for State-Space Models
Quentin G. Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge
01 Feb 2024

LOCOST: State-Space Models for Long Document Abstractive Summarization
Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, Vincent Guigue, Alberto Lumbreras, Laure Soulier, Patrick Gallinari
31 Jan 2024

Weaver: Foundation Models for Creative Writing
Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, ..., Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang, Wangchunshu Zhou
30 Jan 2024

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa, Sophia Falk, Shiza Fatimah, Aniket Sen, N. D. Oliveira
30 Jan 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
MoE · 29 Jan 2024

Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization
Jianfei Xiao, Yancan Chen, Yimin Ou, Hanyi Yu, Kai Shu, Yiyong Xiao
ALM · 27 Jan 2024

The Case for Co-Designing Model Architectures with Hardware
Quentin G. Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, A. Shafi, Hari Subramoni, Dhabaleswar Panda
3DV · 25 Jan 2024

TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Gökçe Uludoğan, Zeynep Yirmibeşoğlu Balal, Furkan Akkurt, Melikşah Türker, Onur Güngör, S. Üsküdarlı
25 Jan 2024

A Survey of Deep Learning and Foundation Models for Time Series Forecasting
John A. Miller, Mohammed Aldosari, Farah Saeed, Nasid Habib Barna, Subas Rana, I. Arpinar, Ninghao Liu
AI4TS, AI4CE · 25 Jan 2024

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia DeSalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar
24 Jan 2024

In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
LRM, ReLM · 23 Jan 2024

MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao, Xinlin Ren, Yanwei Fu
22 Jan 2024

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Katherine Crowson, Stefan Andreas Baumann, Alex Birch, Tanishq Mathew Abraham, Daniel Z. Kaplan, Enrico Shippole
21 Jan 2024

A Study on Training and Developing Large Language Models for Behavior Tree Generation
Fu Li, Xueying Wang, Bin Li, Yunlong Wu, Yanzhen Wang, Xiaodong Yi
16 Jan 2024

Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh
MQ · 11 Jan 2024

xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, ..., Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Leo T. Song
11 Jan 2024

FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
Zirui Liu, Qingquan Song, Q. Xiao, Sathiya Keerthi Selvaraj, Rahul Mazumder, Aman Gupta, Xia Hu
08 Jan 2024

TeleChat Technical Report
Zhongjiang He, Zihan Wang, Xinzhan Liu, Shixuan Liu, Yitong Yao, ..., Zilu Huang, Sishi Xiong, Yuxiang Zhang, Chao Wang, Shuangyong Song
AI4MH, LRM, ALM · 08 Jan 2024

PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai, Xiyang Liao, Alessandro Suglia, Antonio Vergari
MLLM · 06 Jan 2024

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI: Xiao Bi, Deli Chen, Guanting Chen, ..., Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
LRM, ALM · 05 Jan 2024

Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Gabriel Lino Garcia, P. H. Paiola, Luis Henrique Morelli, Giovani Candido, Arnaldo Cândido Júnior, D. Jodas, Luis C. S. Afonso, I. R. Guilherme, B. Penteado, João Paulo Papa
05 Jan 2024

TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu
ALM, LRM · 04 Jan 2024

Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction
Yunpeng Qu, Zhilin Lu, Rui Zeng, Jintao Wang, Jian Wang
02 Jan 2024

DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu
VLM · 31 Dec 2023

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jacob P. Portes, Alex Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, D. Khudia, Jonathan Frankle
29 Dec 2023

Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian, Lijie Fan, Kaifeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola
28 Dec 2023