ResearchTrend.AI
GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 647 papers shown
Selective Attention Improves Transformer
Yaniv Leviathan
Matan Kalman
Yossi Matias
49
8
0
03 Oct 2024
Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets
Yuandong Tian
52
0
0
02 Oct 2024
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Xi Chen
Kaituo Feng
Changsheng Li
Xunhao Lai
Xiangyu Yue
Ye Yuan
Guoren Wang
39
7
0
02 Oct 2024
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Jiyeon Kim
Hyunji Lee
Hyowon Cho
Joel Jang
Hyeonbin Hwang
Seungpil Won
Youbin Ahn
Dohaeng Lee
Minjoon Seo
KELM
82
3
0
02 Oct 2024
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Philipp Mondorf
Sondre Wold
Barbara Plank
34
0
0
02 Oct 2024
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang
Fuling Wang
Yuehang Li
Qingchuan Ma
Shiao Wang
Bo Jiang
Chuanfu Li
Jin Tang
29
2
0
01 Oct 2024
End-to-end Piano Performance-MIDI to Score Conversion with Transformers
T. Beyer
Angela Dai
30
0
0
30 Sep 2024
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
Mehdi Ali
Michael Fromm
Klaudia Thellmann
Jan Ebert
Alexander Arno Weber
...
René Jäkel
Georg Rehm
Stefan Kesselheim
Joachim Köhler
Nicolas Flores-Herr
64
6
0
30 Sep 2024
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Yejin Lee
Anna Y. Sun
Basil Hosmer
Bilge Acun
Can Balioglu
...
Ram Pasunuru
Scott Yih
Sravya Popuri
Xing Liu
Carole-Jean Wu
52
2
0
30 Sep 2024
Efficient Long-Form Speech Recognition for General Speech In-Context Learning
Hao Yen
Shaoshi Ling
Guoli Ye
21
0
0
29 Sep 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
34
152
0
27 Sep 2024
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents
Emanuela Boros
Maud Ehrmann
31
0
0
25 Sep 2024
The Credibility Transformer
Ronald Richman
Salvatore Scognamiglio
M. Wüthrich
21
1
0
25 Sep 2024
Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement
Guanlin Li
Ke Zhang
Ting Wang
Ming Li
Bin Zhao
Xuelong Li
14
0
0
25 Sep 2024
EuroLLM: Multilingual Language Models for Europe
Pedro Henrique Martins
Patrick Fernandes
Joao Alves
Nuno M. Guerreiro
Ricardo Rei
...
Pierre Colombo
Barry Haddow
José G. C. de Souza
Alexandra Birch
André F. T. Martins
29
16
0
24 Sep 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
26
0
0
24 Sep 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
X. Shi
Shiyu Wang
Yuqi Nie
Dianqi Li
Zhou Ye
Qingsong Wen
Ming Jin
AI4TS
34
26
0
24 Sep 2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang
Chengming Zhang
Zheyu Shen
Ang Li
Olatunji Ruwase
26
3
0
23 Sep 2024
Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional Information
Chun Xu
Mengmeng Wang
Yan Ren
Shaolin Zhu
13
1
0
23 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
25
7
0
19 Sep 2024
Mastering Chess with a Transformer Model
Daniel Monroe
The Leela Chess Zero Team
19
3
0
18 Sep 2024
Kolmogorov-Arnold Transformer
Xingyi Yang
Xinchao Wang
39
15
0
16 Sep 2024
Cross-modality image synthesis from TOF-MRA to CTA using diffusion-based models
Alexander Koch
O. U. Aydin
A. Hilbert
Jana Rieger
Satoru Tanioka
F. Ishida
Dietmar Frey
DiffM
MedIm
31
1
0
16 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
64
1
0
16 Sep 2024
Ruri: Japanese General Text Embeddings
Hayato Tsukagoshi
Ryohei Sasano
24
0
0
12 Sep 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang
Songlin Yang
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
60
16
0
11 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
48
51
0
06 Sep 2024
Attention Heads of Large Language Models: A Survey
Zifan Zheng
Yezhaohui Wang
Yuxin Huang
Shichao Song
Mingchuan Yang
Bo Tang
Feiyu Xiong
Zhiyu Li
LRM
52
21
0
05 Sep 2024
CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
Junhui He
Shangyu Wu
Weidong Wen
Chun Jason Xue
Qingan Li
21
5
0
02 Sep 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
45
88
0
29 Aug 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch
Qizhen Zhang
Acyr F. Locatelli
Sara Hooker
A. Ustun
MoE
50
1
0
28 Aug 2024
SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors
Zhiqing Zhang
Tianyong Liu
Guojia Fan
Bin Li
Qianjin Feng
Shoujun Zhou
Mamba
29
1
0
28 Aug 2024
Flexible Control in Symbolic Music Generation via Musical Metadata
Sangjun Han
Jiwon Ham
Chaeeun Lee
Heejin Kim
Soojong Do
Sihyuk Yi
Jun Seo
Seoyoon Kim
Yountae Jung
Woohyung Lim
35
0
0
28 Aug 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wenyuan Xu
32
6
0
28 Aug 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline
Guosheng Dong
Da Pan
Yiding Sun
Shusen Zhang
Zheng Liang
...
Bingning Wang
Wentao Zhang
Jiaxin Mao
Zenan Zhou
Weipeng Chen
ALM
38
2
0
27 Aug 2024
CLLMFS: A Contrastive Learning enhanced Large Language Model Framework for Few-Shot Named Entity Recognition
Yafeng Zhang
Zilan Yu
Yuang Huang
Jing Tang
28
2
0
23 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
28
147
0
20 Aug 2024
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
Muhammad Rafsan Kabir
Rafeed Mohammad Sultan
Ihsanul Haque Asif
Jawad Ibn Ahad
Fuad Rahman
Mohammad Ruhul Amin
Nabeel Mohammed
Shafin Rahman
LRM
33
2
0
20 Aug 2024
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Viraat Aryabumi
Yixuan Su
Raymond Ma
Adrien Morisot
Ivan Zhang
Acyr F. Locatelli
Marzieh Fadaee
A. Ustun
Sara Hooker
SyDa
AI4CE
40
18
0
20 Aug 2024
Performance Law of Large Language Models
Chuhan Wu
Ruiming Tang
LRM
40
2
0
19 Aug 2024
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction
Zhonghang Li
Long Xia
Lei Shi
Yong-mei Xu
Dawei Yin
Chao Huang
VLM
AI4TS
AI4CE
38
7
0
16 Aug 2024
Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors
Andrei Catalin Coman
Christos Theodoropoulos
Marie-Francine Moens
James Henderson
44
0
0
13 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan Ö. Arik
Tejas Nama
Tomas Pfister
39
1
0
13 Aug 2024
FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data
Haoran Sun
Renren Jin
Shaoyang Xu
Leiyu Pan
Supryadi
...
Lei Yang
Ling Shi
Juesi Xiao
Shaolin Zhu
Deyi Xiong
57
0
0
12 Aug 2024
Retrieval-augmented code completion for local projects using large language models
Marko Hostnik
Marko Robnik-Sikonja
RALM
27
0
0
09 Aug 2024
Diffusion Guided Language Modeling
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
36
6
0
08 Aug 2024
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Khai Le-Duc
Quy-Anh Dang
Tan-Hanh Pham
Truong Son-Hy
32
0
0
08 Aug 2024
EXAONE 3.0 7.8B Instruction Tuned Language Model
LG AI Research
Soyoung An
Kyunghoon Bae
Eunbi Choi
...
Boseong Seo
Sihoon Yang
Heuiyeen Yeen
Kyungjae Yoo
Hyeongu Yun
ELM
ALM
52
10
0
07 Aug 2024
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
Kohei Saijo
G. Wichern
François G. Germain
Zexu Pan
Jonathan Le Roux
35
7
0
06 Aug 2024
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization
Andoni I. Garmendia
Quentin Cappart
Josu Ceberio
A. Mendiburu
29
2
0
05 Aug 2024