ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
arXiv: 2002.05202

Papers citing "GLU Variants Improve Transformer"

50 / 647 papers shown
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
22
0
0
14 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
32
0
0
13 May 2025
Circuit Partitioning Using Large Language Models for Quantum Compilation and Simulations
Pranav Sinha
Sumit Kumar Jha
Sunny Raj
31
0
0
12 May 2025
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
Guang Yan
Yuhui Zhang
Zimu Guo
Lutan Zhao
Xiaojun Chen
Chen Wang
Wenhao Wang
Dan Meng
Rui Hou
29
0
0
12 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Z. Qiu
Z. Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
20
0
0
10 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Y. Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
43
0
0
09 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
61
0
0
06 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
46
0
0
06 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
29
0
0
05 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
27
0
0
05 May 2025
Parameter-Efficient Transformer Embeddings
Henry Ndubuaku
Mouad Talhi
24
0
0
04 May 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri
Erland Hilman Fuadi
Alham Fikri Aji
31
0
0
29 Apr 2025
Blockbuster, Part 1: Block-level AI Operator Fusion
Ofer Dekel
14
0
0
29 Apr 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
47
1
0
28 Apr 2025
CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng
Hang Zhou
Jing Liao
Li Cheng
Wenbo Zhou
3DV
58
0
0
28 Apr 2025
Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities
Xi Fu
Wei-Bang Jiang
Yi Ding
Cuntai Guan
41
0
0
28 Apr 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
43
0
0
27 Apr 2025
SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations
Shuting Zhao
Linxin Bai
Liangjing Shao
Ye Zhang
Xinrong Chen
24
0
0
25 Apr 2025
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Hongyu Wang
Shuming Ma
Furu Wei
MQ
48
1
0
25 Apr 2025
GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning
Luu Quy Tung
Hoang Quoc Viet
Vo Trong Thu
LRM
27
0
0
23 Apr 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Fengze Liu
Weidong Zhou
Binbin Liu
Zhimiao Yu
Yifan Zhang
...
Yifeng Yu
Bingni Zhang
Xiaohuan Zhou
Taifeng Wang
Yong Cao
55
0
0
23 Apr 2025
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
36
0
0
23 Apr 2025
Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives
Anastasis Kratsios
Takashi Furuya
27
0
0
21 Apr 2025
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park
Dalton Jones
Matt Morse
Raghavv Goel
Mingu Lee
Chris Lott
22
0
0
21 Apr 2025
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
78
0
0
21 Apr 2025
Natural Fingerprints of Large Language Models
Teppei Suzuki
Ryokan Ri
Sho Takase
28
0
0
21 Apr 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
29
1
0
21 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
29
0
0
19 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
37
0
0
16 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
35
0
0
11 Apr 2025
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
ViT
27
0
0
11 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
79
2
0
10 Apr 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
38
106
0
10 Apr 2025
On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition
Adrian Cosma
Andy Catruna
Emilian Radoi
31
0
0
10 Apr 2025
A Novel Mamba-based Sequential Recommendation Method
Jun Yuan
Mamba
109
0
0
10 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL
3DPC
74
0
0
09 Apr 2025
Foundation Models for Time Series: A Survey
Siva Rama Krishna Kottapalli
Karthik Hubli
Sandeep Chandrashekhara
Garima Jain
Sunayana Hubli
Gayathri Botla
Ramesh Doddaiah
AI4TS
AI4CE
23
0
0
05 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
42
3
0
04 Apr 2025
Compositionality Unlocks Deep Interpretable Models
Thomas Dooms
Ward Gauderis
Geraint A. Wiggins
José Oramas
FAtt
CoGe
AI4CE
59
0
0
03 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
48
1
0
01 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
39
0
0
29 Mar 2025
GmNet: Revisiting Gating Mechanisms From A Frequency View
Yifan Wang
Xu Ma
Yitian Zhang
Zhongruo Wang
Sung-Cheol Kim
Vahid Mirjalili
Vidya Renganathan
Y. Fu
36
0
0
28 Mar 2025
Named Entity Recognition in Context
Colin Brisson
Ayoub Kahfy
Marc Bui
Frédéric Constant
54
0
0
26 Mar 2025
IgCraft: A versatile sequence generation framework for antibody discovery and engineering
Matthew Greenig
Haowen Zhao
Vladimir Radenkovic
Aubin Ramon
Pietro Sormanni
44
0
0
25 Mar 2025
Ab-initio simulation of excited-state potential energy surfaces with transferable deep quantum Monte Carlo
Zeno Schätzle
P. Szabó
Alice Cuzzocrea
Frank Noé
40
0
0
25 Mar 2025
Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu
Christina Giannoula
Muralidhar Andoorveedu
Qidong Su
Karttikeya Mangalam
Bojian Zheng
Gennady Pekhimenko
VLM
MoE
49
0
0
24 Mar 2025
Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning
Xiang Fang
S. Zhang
Hao Zhang
Tao Lu
Huabing Zhou
Jiayi Ma
Mamba
75
0
0
23 Mar 2025
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
Roberto Garcia
Jerry Liu
Daniel Sorvisto
Sabri Eyuboglu
90
0
0
23 Mar 2025
TRACE: Time SeRies PArameter EffiCient FinE-tuning
Yuze Li
Wei Zhu
AI4TS
73
0
0
21 Mar 2025
Variance Control via Weight Rescaling in LLM Pre-training
Louis Owen
Abhay Kumar
Nilabhra Roy Chowdhury
Fabian Güra
31
0
0
21 Mar 2025