2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Papers citing "GLU Variants Improve Transformer"
50 / 904 papers shown
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
505
77
0
04 Nov 2024
Training Compute-Optimal Protein Language Models
bioRxiv, 2024
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
312
29
0
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
Jiansheng Wei
589
14
0
04 Nov 2024
Enhancing Glucose Level Prediction of ICU Patients through Hierarchical Modeling of Irregular Time-Series
Computational and Structural Biotechnology Journal (CSBJ), 2024
Hadi Mehdizavareh
Arijit Khan
Simon Lebech Cichosz
AI4TS
179
0
0
03 Nov 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
177
1
0
31 Oct 2024
Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
221
4
0
30 Oct 2024
GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems
Wilson Wongso
Hao Xue
Flora D. Salim
434
11
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
International Conference on Learning Representations (ICLR), 2024
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
396
20
0
28 Oct 2024
BanglaLlama: LLaMA for Bangla Language
Abdullah Khan Zehady
Shubhashis Roy Dipta
Naymul Islam
Safi Al Mamun
Santu Karmaker
ALM
255
1
0
28 Oct 2024
Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
349
2
0
25 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
494
18
0
25 Oct 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Neural Information Processing Systems (NeurIPS), 2024
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Liang Luo
MoE
193
9
0
24 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
154
2
0
24 Oct 2024
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
197
7
0
24 Oct 2024
Scaling up Masked Diffusion Models on Text
International Conference on Learning Representations (ICLR), 2024
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min Lin
Chongxuan Li
AI4CE
536
101
0
24 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Nicholas Walker
157
0
0
23 Oct 2024
PLDR-LLM: Large Language Model from Power Law Decoder Representations
Burc Gokden
145
2
0
22 Oct 2024
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset
ACM Multimedia Asia (MMAsia), 2024
Ruikun Zhang
Hao Yang
Yan Yang
Ying Fu
Liyuan Pan
329
10
0
21 Oct 2024
Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning
Arijit Das
140
2
0
21 Oct 2024
Comprehensive benchmarking of large language models for RNA secondary structure prediction
L. I. Zablocki
L. A. Bugnon
M. Gerard
L. Di Persia
G. Stegmayer
D. H. Milone
AI4TS
255
11
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
443
18
0
21 Oct 2024
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
International Conference on Learning Representations (ICLR), 2024
Thomas Robert
M. Safaryan
Ionut-Vlad Modoranu
Dan Alistarh
ODL
460
21
0
21 Oct 2024
Lossless KV Cache Compression to 2%
Zhen Yang
Jizong Han
Kan Wu
Ruobing Xie
An Wang
Xingwu Sun
Zhanhui Kang
VLM
MQ
203
5
0
20 Oct 2024
CompAct: Compressed Activations for Memory-Efficient LLM Training
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
309
6
0
20 Oct 2024
Quanta Video Restoration
European Conference on Computer Vision (ECCV), 2024
Prateek Chennuri
Yiheng Chi
Enze Jiang
G. M. Dilshan Godaliyadda
Abhiram Gnanasambandam
Hamid R. Sheikh
Istvan Gyongy
Stanley H. Chan
308
6
0
19 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
Lei Bai
289
9
0
17 Oct 2024
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Lingxiao Luo
Bingda Tang
Xuanzhong Chen
Rong Han
Ting Chen
VLM
262
14
0
16 Oct 2024
Neuron-based Personality Trait Induction in Large Language Models
Jia Deng
Tianyi Tang
Yanbin Yin
Wenhao Yang
Wayne Xin Zhao
Ji-Rong Wen
240
3
0
16 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
234
25
0
15 Oct 2024
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
International Conference on Learning Representations (ICLR), 2024
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
233
33
0
15 Oct 2024
Survey and Evaluation of Converging Architecture in LLMs based on Footsteps of Operations
IEEE Open Journal of the Computer Society (JOCS), 2024
Seongho Kim
Jihyun Moon
Juntaek Oh
Insu Choi
Joon-Sung Yang
162
0
0
15 Oct 2024
Rethinking Graph Transformer Architecture Design for Node Classification
Jiajun Zhou
Xuanze Chen
Chenxuan Xie
Yu Shanqing
Qi Xuan
Xiaoniu Yang
238
1
0
15 Oct 2024
Transfer Learning with Foundational Models for Time Series Forecasting using Low-Rank Adaptations
Information Fusion (Inf. Fusion), 2024
M. Germán-Morales
A. J. Rivera-Rivas
M. J. del Jesus Díaz
C. J. Carmona
AI4TS
AI4CE
736
7
0
15 Oct 2024
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
International Conference on Learning Representations (ICLR), 2024
Syeda Nahida Akter
Shrimai Prabhumoye
John Kamalu
S. Satheesh
Eric Nyberg
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
LRM
SyDa
ReLM
457
6
0
15 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Artificial Intelligence Applications and Innovations (AIAI), 2024
Syed Abdul Gaffar Shakhadri
Kruthika KR
Rakshit Aralimatti
VLM
191
4
0
15 Oct 2024
Liger Kernel: Efficient Triton Kernels for LLM Training
Pin-Lun Hsu
Ata Fatahibaarzi
Vignesh Kothapalli
Qingquan Song
Shao Tang
Sirou Zhu
Steven Shimizu
Shivam Sahni
Haowen Ning
Yanning Chen
488
97
0
14 Oct 2024
Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning
Yongxin Xu
Ruizhe Zhang
Xinke Jiang
Yujie Feng
Yuzhen Xiao
Xinyu Ma
Runchuan Zhu
Xu Chu
Junfeng Zhao
Yasha Wang
KELM
275
11
0
14 Oct 2024
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
488
85
0
14 Oct 2024
Diffusion Models Need Visual Priors for Image Generation
Xiaoyu Yue
Zidong Wang
Zeyu Lu
S. Sun
Meng Wei
Wanli Ouyang
Junlin Wu
Luping Zhou
VLM
284
7
0
11 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
381
27
0
10 Oct 2024
Upcycling Large Language Models into Mixture of Experts
Ethan He
Syeda Nahida Akter
R. Prenger
V. Korthikanti
Zijie Yan
Tong Liu
Shiqing Fan
Ashwath Aithal
Mohammad Shoeybi
Bryan Catanzaro
MoE
434
32
0
10 Oct 2024
Bilinear MLPs enable weight-based mechanistic interpretability
International Conference on Learning Representations (ICLR), 2024
Michael T. Pearce
Thomas Dooms
Alice Rigg
José Oramas
Lee Sharkey
236
16
0
10 Oct 2024
Pixtral 12B
Pravesh Agrawal
Szymon Antoniak
Emma Bou Hanna
Baptiste Bout
Devendra Singh Chaplot
...
Joachim Studnia
Sandeep Subramanian
Sagar Vaze
Thomas Wang
Sophia Yang
VLM
MLLM
272
113
0
09 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
232
2
0
08 Oct 2024
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kosuke Nishida
Kyosuke Nishida
Kuniko Saito
217
6
0
07 Oct 2024
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges
Anian Ruoss
J. Veness
Tim Genewein
523
7
0
07 Oct 2024
Differential Transformer
International Conference on Learning Representations (ICLR), 2024
Tianzhu Ye
Li Dong
Yuqing Xia
Yutao Sun
Yi Zhu
Gao Huang
Furu Wei
1.2K
0
0
07 Oct 2024
A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition
WISE, 2024
David-Gabriel Ion
Razvan-Alexandru Smadu
Dumitru-Clementin Cercel
Florin-Catalin Pop
Mihaela-Claudia Cercel
163
0
0
06 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li
Jianhang Hong
Yuanzhuo Wu
Snehal Adbol
Zonglin Li
MQ
240
2
0
06 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
633
49
0
06 Oct 2024