ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
arXiv:2002.05202

Papers citing "GLU Variants Improve Transformer"

Showing 50 of 647 citing papers
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
45
0
0
21 Mar 2025
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani
Savas Ozkan
Sijun Cho
Hyowon Kim
Eunchung Noh
Jeongwon Min
Cho Jung Min
Dookun Park
Mete Ozay
38
0
0
20 Mar 2025
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
Daniel Haziza
Timothy Chou
Dhruv Choudhary
Luca Wehrstedt
Francisco Massa
Jiecao Yu
Geonhwa Jeong
Supriya Rao
Patrick Labatut
Jesse Cai
42
0
0
20 Mar 2025
Gene42: Long-Range Genomic Foundation Model With Dense Attention
Kirill Vishniakov
Boulbaba Ben Amor
Engin Tekin
Nancy A. ElNaker
Karthik Viswanathan
...
Tiago Magalhaes
Natalia Vassilieva
Dwarikanath Mahapatra
Marco Pimentel
and Shadab Khan
3DV
39
0
0
20 Mar 2025
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
Shuqi Lu
Haowei Lin
Lin Yao
Zhifeng Gao
Xiaohong Ji
W. Elwasif
Linfeng Zhang
Guolin Ke
43
0
0
20 Mar 2025
ACE: A Cardinality Estimator for Set-Valued Queries
Yufan Sheng
Xin Cao
Kaiqi Zhao
Yixiang Fang
Jianzhong Qi
Wenjie Zhang
Christian S. Jensen
55
0
0
19 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
59
1
0
18 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
G. Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
40
1
0
17 Mar 2025
Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li
Shihao Yuan
Haolin Wang
Qizhang Li
Ming-Yu Liu
Chen Xu
Guangming Shi
Wangmeng Zuo
56
0
0
17 Mar 2025
HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets
Lulu Ban
Tao Zhu
Xiangqing Lu
Qi Qiu
Wenyong Han
Shuangjian Li
L. Chen
Kevin I-Kai Wang
Mingxing Nie
Yaping Wan
59
0
0
16 Mar 2025
Text Compression for Efficient Language Generation
David Gu
Peter Belcak
Roger Wattenhofer
52
0
0
14 Mar 2025
Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models
Hongyang Wei
S. Liu
C. Yuan
L. Zhang
42
0
0
14 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
52
0
0
14 Mar 2025
Autoregressive Image Generation with Vision Full-view Prompt
Miaomiao Cai
G. Wang
Wei Li
Zhijun Tu
Hanting Chen
Shaohui Lin
Jie Hu
LRM
60
0
0
13 Mar 2025
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga
Wei Jie
Fan Wu
Vardan K. Voskanyan
Fateme Dinmohammadi
P. Brookes
Jingzhi Gong
Zheng Wang
38
0
0
13 Mar 2025
Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li
Jinyue Yang
Guoqi Li
Huan Wang
53
0
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
66
0
0
13 Mar 2025
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter
Yair Carmon
CLIP
42
0
0
11 Mar 2025
The Space Between: On Folding, Symmetries and Sampling
Michal Lewandowski
Bernhard Heinzl
Raphael Pisoni
Bernhard A. Moser
55
0
0
11 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
VLM
ObjD
72
1
0
10 Mar 2025
MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care
Jiaqing Zhang
Miguel Contreras
Jessica Sena
Andrea Davidson
Yuanfang Ren
...
T. Ozrazgat-Baslanti
Tyler J. Loftus
Subhash Nerella
A. Bihorac
Parisa Rashidi
49
0
0
10 Mar 2025
Small Vision-Language Models: A Survey on Compact Architectures and Techniques
Nitesh Patnaik
Navdeep Nayak
Himani Bansal Agrawal
Moinak Chinmoy Khamaru
Gourav Bal
Saishree Smaranika Panda
Rishi Raj
Vishal Meena
Kartheek Vadlamani
VLM
56
0
0
09 Mar 2025
Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior
Xianjie Liu
Keren Fu
Qijun Zhao
MDE
52
0
0
08 Mar 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
Yunfan Jiang
Ruohan Zhang
J. Wong
Chen Wang
Yanjie Ze
Hang Yin
Cem Gokmen
Shuran Song
Jiajun Wu
L. Fei-Fei
67
5
0
07 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
102
1
0
07 Mar 2025
Mixture of Experts Made Intrinsically Interpretable
Xingyi Yang
Constantin Venhoff
Ashkan Khakzar
Christian Schroeder de Witt
P. Dokania
Adel Bibi
Philip H. S. Torr
MoE
49
0
0
05 Mar 2025
SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture
Hocheol Lim
Hyein Cho
Jeonghoon Kim
67
0
0
04 Mar 2025
GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development
Leming Shen
Qiang Yang
Xinyu Huang
Zijing Ma
Yuanqing Zheng
27
1
0
02 Mar 2025
Proteina: Scaling Flow-based Protein Structure Generative Models
Tomas Geffner
Kieran Didi
Zuobai Zhang
Danny Reidenbach
Zhonglin Cao
...
Mario Geiger
Christian Dallago
E. Küçükbenli
Arash Vahdat
Karsten Kreis
DiffM
AI4CE
41
4
0
02 Mar 2025
Synthetic data enables context-aware bioacoustic sound event detection
Benjamin Hoffman
David Robinson
Marius Miron
V. Baglione
D. Canestrari
Damian Elias
Eva Trapote
Olivier Pietquin
32
0
0
01 Mar 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
G. Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
41
0
0
28 Feb 2025
Protein Structure Tokenization: Benchmarking and New Recipe
Xinyu Yuan
Zichen Wang
Marcus Collins
Huzefa Rangwala
36
0
0
28 Feb 2025
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
LRM
37
0
0
28 Feb 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura
Takuya Akiba
Kazuki Fujii
Yusuke Oda
Rio Yokota
Jun Suzuki
MoMe
MoE
75
1
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
88
1
0
26 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
69
2
0
26 Feb 2025
NeoBERT: A Next-Generation BERT
Lola Le Breton
Quentin Fournier
Mariam El Mezouar
Sarath Chandar
AI4TS
60
1
0
26 Feb 2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
54
0
0
25 Feb 2025
Dual Classification Head Self-training Network for Cross-scene Hyperspectral Image Classification
Rong Liu
Junye Liang
Jiaqi Yang
Jiang He
Peng Zhu
71
5
0
25 Feb 2025
Patient Trajectory Prediction: Integrating Clinical Notes with Transformers
Sifal Klioui
Sana Sellami
Youssef Trardi
71
0
0
25 Feb 2025
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
Maksim Zhdanov
Max Welling
Jan Willem van de Meent
AI4CE
42
1
0
24 Feb 2025
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps
Yen-Che Hsiao
Abhishek Dutta
LRM
ReLM
ELM
54
0
0
24 Feb 2025
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI
Syed Abdul Gaffar Shakhadri
Kruthika KR
Kartik Basavaraj Angadi
VLM
50
0
0
24 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
35
0
0
24 Feb 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
50
5
0
21 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu-Xi Cheng
KELM
75
3
0
19 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
B. Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Z. Wu
LM&MA
ELM
AI4MH
37
4
0
18 Feb 2025
Mixture of Attention Yields Accurate Results for Tabular Data
Xuechen Li
Yupeng Li
Jian Liu
Xiaolin Jin
Tian Yang
Xin Hu
47
0
0
18 Feb 2025
Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers
Niels R. Lorenzen
P. Jennum
Emmanuel Mignot
A. Brink-Kjaer
31
0
0
17 Feb 2025
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
37
0
0
17 Feb 2025