GLU Variants Improve Transformer
arXiv: 2002.05202
12 February 2020
Noam M. Shazeer
Papers citing "GLU Variants Improve Transformer" (50 of 904 shown)
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Megha Chakraborty
S.M. Towhidul Islam Tonmoy
S. M. Mehedi
Krish Sharma
Niyar R. Barman
...
Tanay Kumar
Vinija Jain
Vasu Sharma
Amit P. Sheth
Amitava Das
DeLMO
197
26
0
08 Oct 2023
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Vipula Rawte
Swagata Chakraborty
Agnibh Pathak
Anubhav Sarkar
S.M. Towhidul Islam Tonmoy
Vasu Sharma
Mikel Artetxe
Punit Daniel Simig
HILM
314
182
0
08 Oct 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
490
100
0
06 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Neural Information Processing Systems (NeurIPS), 2023
Filip Szatkowski
Eric Elmoznino
Younesse Kaddar
Simone Scardapane
MoE
265
12
0
06 Oct 2023
Predicting Emergent Abilities with Infinite Resolution Evaluation
International Conference on Learning Representations (ICLR), 2023
Shengding Hu
Xin Liu
Xu Han
Xinrong Zhang
Chaoqun He
...
Ning Ding
Zebin Ou
Guoyang Zeng
Zhiyuan Liu
Maosong Sun
ELM
LRM
290
26
0
05 Oct 2023
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
International Conference on Machine Learning (ICML), 2023
Praneeth Kacham
Vahab Mirrokni
Peilin Zhong
219
18
0
02 Oct 2023
Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need!
Mariana Lindo
Ana Sofia Santos
André Ferreira
Jianning Li
Gijs Luijten
...
Cornelius Deuschl
Johannes Haubold
Jens Kleesiek
Jan Egger
Victor Alves
LM&MA
275
2
0
29 Sep 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
797
3,067
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Albert Mohwald
249
26
0
28 Sep 2023
Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew
Shaltiel Shmidman
Avi Shmidman
Amir DN Cohen
Moshe Koppel
139
1
0
25 Sep 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal
Tejas Vaidhya
Irina Rish
359
26
0
25 Sep 2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Nolan Dey
Daria Soboleva
Faisal Al-Khateeb
Bowen Yang
Ribhu Pathria
...
Robert Myers
Jacob Robert Steeves
Natalia Vassilieva
Marvin Tom
Joel Hestness
MoE
246
18
0
20 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
437
69
0
19 Sep 2023
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
Science China Information Sciences (Sci China Inf Sci), 2023
Juntao Li
Zecheng Tang
Yuyang Ding
Pinzheng Wang
Pei Guo
...
Wenliang Chen
Guohong Fu
Qiaoming Zhu
Guodong Zhou
Hao Fei
370
8
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
803
923
0
19 Sep 2023
AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification
Abdelrahman Abdallah
Mahmoud Abdalla
Ibrahim Abdelhalim
Mohamed Elkasaby
Adam Jatowt
144
1
0
18 Sep 2023
XGen-7B Technical Report
Erik Nijkamp
Tian Xie
Hiroaki Hayashi
Bo Pang
Congying Xia
...
Chien-Sheng Wu
Silvio Savarese
Yingbo Zhou
Shafiq Joty
Caiming Xiong
ALM
216
15
0
07 Sep 2023
Language Models for Novelty Detection in System Call Traces
Quentin Fournier
Daniel Aloise
Leandro R. Costa
AI4TS
205
5
0
05 Sep 2023
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
...
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
SyDa
VLM
297
59
0
05 Sep 2023
LLM and Infrastructure as a Code use case
Thibault Chanus
Michael Aubertin
120
3
0
04 Sep 2023
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
Neha Sengupta
Sunil Kumar Sahu
Bokang Jia
Satheesh Katipomu
Jinyan Su
...
A. Jackson
Hector Xuguang Ren
Preslav Nakov
Timothy Baldwin
Eric P. Xing
LRM
381
61
0
30 Aug 2023
Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts
The European Symposium on Artificial Neural Networks (ESANN), 2023
Thanh Thi Nguyen
Campbell Wilson
Janis Dalins
116
35
0
28 Aug 2023
Aligning Language Models with Offline Learning from Human Feedback
Jian Hu
Li Tao
J. Yang
Chandler Zhou
ALM
OffRL
313
11
0
23 Aug 2023
Cabrita: closing the gap for foreign languages
Celio H. N. Larcher
Marcos Piau
Paulo Finardi
P. Gengo
P. Esposito
Vinicius Fernandes Caridá
CLL
108
35
0
23 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
625
32
0
23 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
320
12
0
20 Aug 2023
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Neural Information Processing Systems (NeurIPS), 2023
Minsoo Kim
Sihwa Lee
Jangwhan Lee
S. Hong
Duhyeuk Chang
Wonyong Sung
Jungwook Choi
MQ
157
22
0
13 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
275
3
0
07 Aug 2023
A Novel Convolutional Neural Network Architecture with a Continuous Symmetry
CAAI International Conference on Artificial Intelligence (ICCAI), 2023
Y. Liu
Han-Juan Shao
Bing Bai
AI4CE
332
3
0
03 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
8.2K
15,302
0
18 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Neural Information Processing Systems (NeurIPS), 2023
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
424
58
0
12 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
858
1,200
0
12 Jul 2023
ReLoRA: High-Rank Training Through Low-Rank Updates
International Conference on Learning Representations (ICLR), 2023
Vladislav Lialin
Namrata Shivagunde
Sherin Muckatira
Anna Rumshisky
BDL
513
178
0
11 Jul 2023
Self-supervised adversarial masking for 3D point cloud representation learning
Asian Conference on Intelligent Information and Database Systems (ACIIDS), 2023
Michal Szachniewicz
Wojciech Kozlowski
Michal Stypulkowski
Maciej Ziȩba
3DPC
160
2
0
11 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Automatic Speech Recognition & Understanding (ASRU), 2023
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
532
186
0
08 Jul 2023
Trainable Transformer in Transformer
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Sadhika Malladi
Mengzhou Xia
Sanjeev Arora
VLM
353
14
0
03 Jul 2023
Leveraging Cross-Utterance Context For ASR Decoding
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
191
1
0
29 Jun 2023
Reconstructing the Hemodynamic Response Function via a Bimodal Transformer
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Yoni Choukroun
Lior Golgher
P. Blinder
L. Wolf
MedIm
84
0
0
28 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
289
309
0
26 Jun 2023
Towards Stability of Autoregressive Neural Operators
Michael McCabe
P. Harrington
Shashank Subramanian
Jed Brown
AI4CE
413
35
0
18 Jun 2023
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
392
13
0
15 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
200
0
0
15 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
324
36
0
13 Jun 2023
Exposing Attention Glitches with Flip-Flop Language Modeling
Neural Information Processing Systems (NeurIPS), 2023
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
LRM
209
70
0
01 Jun 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Gen Luo
Weihao Ye
Tianhe Ren
Shen Chen
Xiaoshuai Sun
Rongrong Ji
VLM
MLLM
294
134
0
24 May 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
234
4
0
24 May 2023
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2023
Abhinav Jangda
Saeed Maleki
M. Dehnavi
Madan Musuvathi
Olli Saarikivi
156
9
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
585
154
0
18 May 2023
Less is More! A slim architecture for optimal language translation
Luca Herranz-Celotti
E. Rrapaj
96
0
0
18 May 2023
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
Alexander Moreno
Jonathan Mei
Luke Walters
223
0
0
15 May 2023