Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GLU Variants Improve Transformer"
50 / 647 papers shown
Title
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
22
0
0
14 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
32
0
0
13 May 2025
Circuit Partitioning Using Large Language Models for Quantum Compilation and Simulations
Pranav Sinha
Sumit Kumar Jha
Sunny Raj
31
0
0
12 May 2025
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
Guang Yan
Yuhui Zhang
Zimu Guo
Lutan Zhao
Xiaojun Chen
Chen Wang
Wenhao Wang
Dan Meng
Rui Hou
29
0
0
12 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Z. Qiu
Z. Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
20
0
0
10 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Y. Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
43
0
0
09 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
61
0
0
06 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
46
0
0
06 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
29
0
0
05 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
27
0
0
05 May 2025
Parameter-Efficient Transformer Embeddings
Henry Ndubuaku
Mouad Talhi
24
0
0
04 May 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri
Erland Hilman Fuadi
Alham Fikri Aji
31
0
0
29 Apr 2025
Blockbuster, Part 1: Block-level AI Operator Fusion
Ofer Dekel
14
0
0
29 Apr 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
47
1
0
28 Apr 2025
CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng
Hang Zhou
Jing Liao
Li Cheng
Wenbo Zhou
3DV
58
0
0
28 Apr 2025
Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities
Xi Fu
Wei-Bang Jiang
Yi Ding
Cuntai Guan
41
0
0
28 Apr 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
43
0
0
27 Apr 2025
SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations
Shuting Zhao
Linxin Bai
Liangjing Shao
Ye Zhang
Xinrong Chen
24
0
0
25 Apr 2025
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Hongyu Wang
Shuming Ma
Furu Wei
MQ
48
1
0
25 Apr 2025
GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning
Luu Quy Tung
Hoang Quoc Viet
Vo Trong Thu
LRM
27
0
0
23 Apr 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Fengze Liu
Weidong Zhou
Binbin Liu
Zhimiao Yu
Yifan Zhang
...
Yifeng Yu
Bingni Zhang
Xiaohuan Zhou
Taifeng Wang
Yong Cao
55
0
0
23 Apr 2025
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
36
0
0
23 Apr 2025
Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives
Anastasis Kratsios
Takashi Furuya
27
0
0
21 Apr 2025
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park
Dalton Jones
Matt Morse
Raghavv Goel
Mingu Lee
Chris Lott
22
0
0
21 Apr 2025
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
78
0
0
21 Apr 2025
Natural Fingerprints of Large Language Models
Teppei Suzuki
Ryokan Ri
Sho Takase
28
0
0
21 Apr 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
29
1
0
21 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
29
0
0
19 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
37
0
0
16 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
35
0
0
11 Apr 2025
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
ViT
27
0
0
11 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
79
2
0
10 Apr 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
38
106
0
10 Apr 2025
On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition
Adrian Cosma
Andy Catruna
Emilian Radoi
31
0
0
10 Apr 2025
A Novel Mamba-based Sequential Recommendation Method
Jun Yuan
Mamba
109
0
0
10 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL
3DPC
74
0
0
09 Apr 2025
Foundation Models for Time Series: A Survey
Siva Rama Krishna Kottapalli
Karthik Hubli
Sandeep Chandrashekhara
Garima Jain
Sunayana Hubli
Gayathri Botla
Ramesh Doddaiah
AI4TS
AI4CE
23
0
0
05 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
42
3
0
04 Apr 2025
Compositionality Unlocks Deep Interpretable Models
Thomas Dooms
Ward Gauderis
Geraint A. Wiggins
José Oramas
FAtt
CoGe
AI4CE
59
0
0
03 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
48
1
0
01 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
39
0
0
29 Mar 2025
GmNet: Revisiting Gating Mechanisms From A Frequency View
Yifan Wang
Xu Ma
Yitian Zhang
Zhongruo Wang
Sung-Cheol Kim
Vahid Mirjalili
Vidya Renganathan
Y. Fu
36
0
0
28 Mar 2025
Named Entity Recognition in Context
Colin Brisson
Ayoub Kahfy
Marc Bui
Frédéric Constant
54
0
0
26 Mar 2025
IgCraft: A versatile sequence generation framework for antibody discovery and engineering
Matthew Greenig
Haowen Zhao
Vladimir Radenkovic
Aubin Ramon
Pietro Sormanni
44
0
0
25 Mar 2025
Ab-initio simulation of excited-state potential energy surfaces with transferable deep quantum Monte Carlo
Zeno Schätzle
P. Szabó
Alice Cuzzocrea
Frank Noé
40
0
0
25 Mar 2025
Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu
Christina Giannoula
Muralidhar Andoorveedu
Qidong Su
Karttikeya Mangalam
Bojian Zheng
Gennady Pekhimenko
VLM
MoE
49
0
0
24 Mar 2025
Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning
Xiang Fang
S. Zhang
Hao Zhang
Tao Lu
Huabing Zhou
Jiayi Ma
Mamba
75
0
0
23 Mar 2025
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
Roberto Garcia
Jerry Liu
Daniel Sorvisto
Sabri Eyuboglu
90
0
0
23 Mar 2025
TRACE: Time SeRies PArameter EffiCient FinE-tuning
Yuze Li
Wei Zhu
AI4TS
73
0
0
21 Mar 2025
Variance Control via Weight Rescaling in LLM Pre-training
Louis Owen
Abhay Kumar
Nilabhra Roy Chowdhury
Fabian Güra
31
0
0
21 Mar 2025
1
2
3
4
...
11
12
13
Next