2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
HuggingFace (4 upvotes)
Papers citing "GLU Variants Improve Transformer"
50 / 904 papers shown
The Hidden Attention of Mamba Models
Ameen Ali
Itamar Zimerman
Lior Wolf
Mamba
514
92
0
03 Mar 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu
Jianlin Su
Bo Zhang
Chunhua Shen
MLLM
383
27
0
01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
271
190
0
29 Feb 2024
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks
Rafael Josip Penić
Tin Vlasic
Roland G. Huber
Yue Wan
M. Šikić
AI4CE
167
66
0
29 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
758
40
0
28 Feb 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma
Hongyu Wang
Lingxiao Ma
Lei Wang
Wenhui Wang
Shaohan Huang
Lifeng Dong
Ruiping Wang
Jilong Xue
Furu Wei
MQ
278
324
0
27 Feb 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
347
93
0
26 Feb 2024
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
Hang Guo
Jinmin Li
Tao Dai
Zhihao Ouyang
Xudong Ren
Shu-Tao Xia
Mamba
366
518
0
23 Feb 2024
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Bruno Gavranovic
Paul Lessard
Andrew Dudzik
Tamara von Glehn
J. G. Araújo
Petar Velickovic
297
15
0
23 Feb 2024
Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li
Gangwei Jiang
Hong Xie
Linqi Song
Defu Lian
Ying Wei
LRM
232
43
0
22 Feb 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
201
13
0
21 Feb 2024
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song
Xu Han
Zhengyan Zhang
Shengding Hu
Xiyu Shi
...
Chen Chen
Zhiyuan Liu
Guanglin Li
Tao Yang
Maosong Sun
370
40
0
21 Feb 2024
Transformer tricks: Precomputing the first layer
Nils Graef
MoE
136
5
0
20 Feb 2024
Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu
Zeyu Huang
Youcheng Huang
Jie Fu
KELM
178
6
0
19 Feb 2024
Can Large Multimodal Models Uncover Deep Semantics Behind Images?
Yixin Yang
Zheng Li
Qingxiu Dong
Heming Xia
Zhifang Sui
VLM
181
19
0
17 Feb 2024
PointMamba: A Simple State Space Model for Point Cloud Analysis
Dingkang Liang
Xin Zhou
Wei Xu
Xingkui Zhu
Zhikang Zou
Xiaoqing Ye
Xinyu Wang
Xiang Bai
437
199
0
16 Feb 2024
Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust
Bowen Shi
Skyler Wang
Necati Cihan Camgöz
Jean Maillard
SLR
249
36
0
14 Feb 2024
Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou
Uri Alon
Xinyun Chen
Xuezhi Wang
Rishabh Agarwal
Denny Zhou
286
65
0
14 Feb 2024
Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda
215
34
0
14 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLM
MLLM
197
14
0
13 Feb 2024
Learn To be Efficient: Build Structured Sparsity in Large Language Models
Haizhong Zheng
Xiaoyan Bai
Xueshen Liu
Z. Morley Mao
Beidi Chen
Fan Lai
Atul Prakash
280
23
0
09 Feb 2024
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation
Ziyang Wang
Jian-Qing Zheng
Yichi Zhang
Ge Cui
Lei Li
Mamba
329
228
0
07 Feb 2024
ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
248
46
0
06 Feb 2024
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
International Conference on Learning Representations (ICLR), 2024
Ji Qi
Ming Ding
Weihan Wang
Yushi Bai
Qingsong Lv
...
Bin Xu
Lei Hou
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
LRM
243
13
0
06 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
463
66
0
05 Feb 2024
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini
Amirkeivan Mohtashami
François Fleuret
Martin Jaggi
251
15
0
04 Feb 2024
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo
Chenghao Liu
Akshat Kumar
Caiming Xiong
Silvio Savarese
Doyen Sahoo
AI4TS
370
389
0
04 Feb 2024
Leveraging Continuously Differentiable Activation Functions for Learning in Quantized Noisy Environments
Vivswan Shah
Nathan Youngblood
357
3
0
04 Feb 2024
Learning Structure-Aware Representations of Dependent Types
Konstantinos Kogkalidis
Orestis Melkonian
Jean-Philippe Bernardy
NAI
185
3
0
03 Feb 2024
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Bharat Runwal
Tejaswini Pedapati
Pin-Yu Chen
MoE
410
8
0
02 Feb 2024
Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar
348
216
0
02 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
538
3
0
01 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
649
544
0
01 Feb 2024
BlackMamba: Mixture of Experts for State-Space Models
Quentin G. Anthony
Yury Tokpanov
Paolo Glorioso
Beren Millidge
164
34
0
01 Feb 2024
LOCOST: State-Space Models for Long Document Abstractive Summarization
Florian Le Bronnec
Song Duong
Mathieu Ravaut
Alexandre Allauzen
Nancy F. Chen
Vincent Guigue
Alberto Lumbreras
Laure Soulier
Patrick Gallinari
404
15
0
31 Jan 2024
Weaver: Foundation Models for Creative Writing
Tiannan Wang
Jiamin Chen
Qingrui Jia
Shuai Wang
Ruoyu Fang
...
Xiaohua Xu
Ningyu Zhang
Huajun Chen
Yuchen Eleanor Jiang
Wangchunshu Zhou
259
23
0
30 Jan 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
266
22
0
30 Jan 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
International Conference on Machine Learning (ICML), 2024
Fuzhao Xue
Zian Zheng
Yao Fu
Jinjie Ni
Zangwei Zheng
Wangchunshu Zhou
Yang You
MoE
289
155
0
29 Jan 2024
Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization
IEEE International Joint Conference on Neural Network (IJCNN), 2024
Jianfei Xiao
Yancan Chen
Yimin Ou
Hanyi Yu
Kai Shu
Yiyong Xiao
ALM
289
19
0
27 Jan 2024
The Case for Co-Designing Model Architectures with Hardware
International Conference on Parallel Processing (ICPP), 2024
Quentin G. Anthony
Jacob Hatef
Deepak Narayanan
Stella Biderman
Stas Bekman
Junqi Yin
Hari Subramoni
Dhabaleswar Panda
3DV
134
12
0
25 Jan 2024
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Gökçe Uludoğan
Zeynep Yirmibeşoğlu Balal
Furkan Akkurt
Melikşah Türker
Onur Gungor
S. Uskudarli
211
20
0
25 Jan 2024
A Survey of Deep Learning and Foundation Models for Time Series Forecasting
John A. Miller
Mohammed Aldosari
Farah Saeed
Nasid Habib Barna
Subas Rana
I. Arpinar
Ninghao Liu
AI4TS
AI4CE
270
49
0
25 Jan 2024
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Ke Ye
Heinrich Jiang
Afshin Rostamizadeh
Ayan Chakrabarti
Giulia DeSalvo
Jean-François Kagy
Lazaros Karydas
Gui Citovsky
Sanjiv Kumar
194
0
0
24 Jan 2024
In-Context Language Learning: Architectures and Algorithms
International Conference on Machine Learning (ICML), 2024
Ekin Akyürek
Bailin Wang
Yoon Kim
Jacob Andreas
LRM
ReLM
388
80
0
23 Jan 2024
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
International Conference on Learning Representations (ICLR), 2024
Chenjie Cao
Xinlin Ren
Yanwei Fu
239
51
0
22 Jan 2024
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
International Conference on Machine Learning (ICML), 2024
Katherine Crowson
Stefan Andreas Baumann
Alex Birch
Tanishq Mathew Abraham
Daniel Z. Kaplan
Enrico Shippole
337
80
0
21 Jan 2024
A Study on Training and Developing Large Language Models for Behavior Tree Generation
Fu Li
Xueying Wang
Bin Li
Yunlong Wu
Yanzhen Wang
Xiaodong Yi
255
10
0
16 Jan 2024
Extreme Compression of Large Language Models via Additive Quantization
International Conference on Machine Learning (ICML), 2024
Vage Egiazarian
Andrei Panferov
Denis Kuznedelev
Elias Frantar
Artem Babenko
Dan Alistarh
MQ
417
149
0
11 Jan 2024
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
bioRxiv (bioRxiv), 2024
Bo Chen
Xingyi Cheng
Pan Li
Yangli-ao Geng
Jing Gong
...
Chiming Liu
Aohan Zeng
Yuxiao Dong
Jie Tang
Leo T. Song
246
134
0
11 Jan 2024
FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
Zirui Liu
Qingquan Song
Q. Xiao
Sathiya Keerthi Selvaraj
Rahul Mazumder
Aman Gupta
Helen Zhou
166
7
0
08 Jan 2024