GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
arXiv: 2002.05202

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown
Exploring the Benefit of Activation Sparsity in Pre-training
International Conference on Machine Learning (ICML), 2024
Zhengyan Zhang
Chaojun Xiao
Qiujieli Qin
Yankai Lin
Zhiyuan Zeng
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
238
6
0
04 Oct 2024
Exploring the Limitations of Mamba in COPY and CoT Reasoning
Ruifeng Ren
Zhicong Li
Yong Liu
254
2
0
04 Oct 2024
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI
Ahmad Elawady
Gunjan Chhablani
Ram Ramrakhya
Karmesh Yadav
Dhruv Batra
Z. Kira
Andrew Szot
OffRL
344
2
0
03 Oct 2024
Selective Attention Improves Transformer
International Conference on Learning Representations (ICLR), 2024
Yaniv Leviathan
Matan Kalman
Yossi Matias
357
20
0
03 Oct 2024
Neutral Residues: Revisiting Adapters for Model Extension
Franck Signe Talla
Edouard Grave
367
2
0
03 Oct 2024
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
International Conference on Learning Representations (ICLR), 2024
Ulyana Piterbarg
Lerrel Pinto
Rob Fergus
SyDa
448
7
0
03 Oct 2024
FutureFill: Fast Generation from Convolutional Sequence Models
Naman Agarwal
Xinyi Chen
Evan Dogariu
Devan Shah
Daniel Suo
Vlad Feinberg
Elad Hazan
Peter L. Bartlett
AI4TS, MQ
260
5
0
02 Oct 2024
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
International Conference on Learning Representations (ICLR), 2024
Jiyeon Kim
Hyunji Lee
Hyowon Cho
Joel Jang
Hyeonbin Hwang
Seungpil Won
Youbin Ahn
Dohaeng Lee
Minjoon Seo
KELM
1.0K
15
0
02 Oct 2024
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Philipp Mondorf
Sondre Wold
Yun Xue
501
2
0
02 Oct 2024
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Xi Chen
Kaituo Feng
Changsheng Li
Xunhao Lai
Xiangyu Yue
Ye Yuan
Guoren Wang
308
31
0
02 Oct 2024
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
Yuandong Tian
440
4
0
02 Oct 2024
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Computer Vision and Pattern Recognition (CVPR), 2024
Xiao Wang
Fuling Wang
Yuehang Li
Qingchuan Ma
Shiao Wang
Bo Jiang
Chuanfu Li
Jin Tang
345
15
0
01 Oct 2024
End-to-end Piano Performance-MIDI to Score Conversion with Transformers
International Society for Music Information Retrieval Conference (ISMIR), 2024
T. Beyer
Angela Dai
218
9
0
30 Sep 2024
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Yejin Lee
Anna Y. Sun
Basil Hosmer
Bilge Acun
Can Balioglu
...
Ram Pasunuru
Scott Yih
Sravya Popuri
Xing Liu
Carole-Jean Wu
475
5
0
30 Sep 2024
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
Mehdi Ali
Michael Fromm
Klaudia Thellmann
Jan Ebert
Alexander Arno Weber
...
René Jäkel
Georg Rehm
Stefan Kesselheim
Joachim Kohler
Nicolas Flores-Herr
323
14
0
30 Sep 2024
Efficient Long-Form Speech Recognition for General Speech In-Context Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Hao Yen
Shaoshi Ling
Guoli Ye
164
0
0
29 Sep 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
290
483
0
27 Sep 2024
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents
International Conference on Asian Digital Libraries (ICADL), 2024
Emanuela Boros
Maud Ehrmann
240
0
0
25 Sep 2024
The Credibility Transformer
European Actuarial Journal (EAJ), 2024
Ronald Richman
Salvatore Scognamiglio
M. Wüthrich
207
6
0
25 Sep 2024
Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement
Guanlin Li
Ke Zhang
Ting Wang
Ming Li
Bin Zhao
Xuelong Li
187
4
0
25 Sep 2024
EuroLLM: Multilingual Language Models for Europe
Pedro Henrique Martins
Patrick Fernandes
Joao Alves
Nuno M. Guerreiro
Ricardo Rei
...
Pierre Colombo
Barry Haddow
José G. C. de Souza
Alexandra Birch
André F. T. Martins
228
81
0
24 Sep 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
183
3
0
24 Sep 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
International Conference on Learning Representations (ICLR), 2024
Xiaoming Shi
Shiyu Wang
Yuqi Nie
Dianqi Li
Zhou Ye
Qingsong Wen
Ming Jin
AI4TS
633
169
0
24 Sep 2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang
Chengming Zhang
Sihan Chen
Ang Li
Olatunji Ruwase
175
12
0
23 Sep 2024
Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional Information
Chun Xu
Mengmeng Wang
Yan Ren
Shaolin Zhu
219
10
0
23 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
238
18
0
19 Sep 2024
Mastering Chess with a Transformer Model
Daniel Monroe
The Leela Chess Zero Team
246
11
0
18 Sep 2024
Kolmogorov-Arnold Transformer
Xingyi Yang
Xinchao Wang
258
80
0
16 Sep 2024
Cross-modality image synthesis from TOF-MRA to CTA using diffusion-based models
Alexander Koch
O. U. Aydin
A. Hilbert
Jana Rieger
Satoru Tanioka
F. Ishida
Dietmar Frey
DiffM, MedIm
210
2
0
16 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
474
3
0
16 Sep 2024
Ruri: Japanese General Text Embeddings
Hayato Tsukagoshi
Ryohei Sasano
144
2
0
12 Sep 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2024
Yu Zhang
Aaron Courville
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
297
49
0
11 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
611
104
0
06 Sep 2024
Attention Heads of Large Language Models: A Survey
Patterns (Patterns), 2024
Zifan Zheng
Yezhaohui Wang
Yuxin Huang
Chenyang Xi
Junchi Yan
Bo Tang
Feiyu Xiong
Zhiyu Li
LRM
287
65
0
05 Sep 2024
CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Junhui He
Shangyu Wu
Weidong Wen
Chun Jason Xue
Qingan Li
99
8
0
02 Sep 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM, MLLM
303
198
0
29 Aug 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch
Qizhen Zhang
Acyr Locatelli
Sara Hooker
Ahmet Üstün
MoE
213
7
0
28 Aug 2024
SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors
Zhiqing Zhang
Tianyong Liu
Guojia Fan
Bin Li
Qianjin Feng
Shoujun Zhou
Mamba
228
2
0
28 Aug 2024
Flexible Control in Symbolic Music Generation via Musical Metadata
Sangjun Han
Jiwon Ham
Chaeeun Lee
Heejin Kim
Soojong Do
Sihyuk Yi
Jun Seo
Seoyoon Kim
Yountae Jung
Woohyung Lim
235
0
0
28 Aug 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Conference on Computer and Communications Security (CCS), 2024
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wei Dong
356
12
0
28 Aug 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline
Bin Cui
Zheng Liang
Yiding Sun
Da Pan
Zhuoran Zhang
...
Bingning Wang
Wentao Zhang
Jiaxin Mao
Guosheng Dong
Weipeng Chen
ALM
210
4
0
27 Aug 2024
CLLMFS: A Contrastive Learning enhanced Large Language Model Framework for Few-Shot Named Entity Recognition
European Conference on Artificial Intelligence (ECAI), 2024
Yafeng Zhang
Zilan Yu
Yuang Huang
Jing Tang
214
3
0
23 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
265
294
0
20 Aug 2024
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
International Conference on Pattern Recognition (ICPR), 2024
Muhammad Rafsan Kabir
Rafeed Mohammad Sultan
Ihsanul Haque Asif
Jawad Ibn Ahad
Fuad Rahman
Mohammad Ruhul Amin
Nabeel Mohammed
Shafin Rahman
LRM
190
7
0
20 Aug 2024
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Viraat Aryabumi
Yixuan Su
Raymond Ma
Adrien Morisot
Ivan Zhang
Acyr Locatelli
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
SyDa, AI4CE
274
39
0
20 Aug 2024
Performance Law of Large Language Models
Chuhan Wu
Ruiming Tang
LRM
301
7
0
19 Aug 2024
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction
Zhonghang Li
Long Xia
Lei Shi
Yong-mei Xu
D. Yin
Chao Huang
VLM, AI4TS, AI4CE
199
23
0
16 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan O. Arik
Tejas Nama
Tomas Pfister
188
4
0
13 Aug 2024
Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Andrei Catalin Coman
Christos Theodoropoulos
Marie-Francine Moens
James Henderson
458
0
0
13 Aug 2024
FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Haoran Sun
Renren Jin
Shaoyang Xu
Leiyu Pan
Supryadi
...
Lei Yang
Ling Shi
Juesi Xiao
Shaolin Zhu
Deyi Xiong
215
11
0
12 Aug 2024