2002.05202
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Papers citing
"GLU Variants Improve Transformer"
50 / 904 papers shown
ACE: A Cardinality Estimator for Set-Valued Queries
Proceedings of the VLDB Endowment (PVLDB), 2025
Yufan Sheng
Xin Cao
Kaiqi Zhao
Yixiang Fang
Jianzhong Qi
Wenjie Zhang
Christian S. Jensen
324
0
0
19 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
451
8
0
18 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
Günter Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
274
11
0
17 Mar 2025
Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li
Shihao Yuan
Haolin Wang
Qizhang Li
Ming-Yu Liu
Chen Xu
Guangming Shi
Wangmeng Zuo
300
4
0
17 Mar 2025
HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets
Lulu Ban
Tao Zhu
Xiangqing Lu
Qi Qiu
Wenyong Han
Shuangjian Li
L. Chen
Kevin I-Kai Wang
Mingxing Nie
Yaping Wan
418
2
0
16 Mar 2025
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen
Guoqiang Gong
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
VLM
410
14
0
14 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
409
4
0
14 Mar 2025
Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models
Hongyang Wei
Shixuan Liu
C. Yuan
Guang Dai
178
12
0
14 Mar 2025
Text Compression for Efficient Language Generation
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
David Gu
Peter Belcak
Roger Wattenhofer
242
1
0
14 Mar 2025
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga
Wei Jie
Fan Wu
Vardan K. Voskanyan
Fateme Dinmohammadi
P. Brookes
Jingzhi Gong
Zheng Wang
337
8
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
527
12
0
13 Mar 2025
Autoregressive Image Generation with Vision Full-view Prompt
Miaomiao Cai
G. Wang
Wei Li
Zhijun Tu
Hanting Chen
Shaohui Lin
Jie Hu
LRM
450
0
0
13 Mar 2025
Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li
Jinyue Yang
Guoqi Li
Huan Wang
273
7
0
13 Mar 2025
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter
Yair Carmon
CLIP
379
1
0
11 Mar 2025
The Space Between: On Folding, Symmetries and Sampling
Michal Lewandowski
Bernhard Heinzl
Raphael Pisoni
Bernhard A. Moser
245
0
0
11 Mar 2025
MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Jiaqing Zhang
Miguel Contreras
Jessica Sena
Andrea Davidson
Yuanfang Ren
...
T. Ozrazgat-Baslanti
Tyler J. Loftus
Subhash Nerella
A. Bihorac
Parisa Rashidi
356
1
0
10 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
VLM
ObjD
543
34
0
10 Mar 2025
Small Vision-Language Models: A Survey on Compact Architectures and Techniques
Nitesh Patnaik
Navdeep Nayak
Himani Bansal Agrawal
Moinak Chinmoy Khamaru
Gourav Bal
Saishree Smaranika Panda
Rishi Raj
Vishal Meena
Kartheek Vadlamani
VLM
268
3
0
09 Mar 2025
High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy
Xianjie Liu
Keren Fu
Qijun Zhao
MDE
564
0
0
08 Mar 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
Yunfan Jiang
Ruohan Zhang
J. Wong
Chen Wang
Yanjie Ze
Hang Yin
Cem Gokmen
Shuran Song
Jiajun Wu
L. Fei-Fei
367
29
0
07 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
1.1K
15
0
07 Mar 2025
Mixture of Experts Made Intrinsically Interpretable
Xingyi Yang
Constantin Venhoff
Ashkan Khakzar
Christian Schroeder de Witt
P. Dokania
Adel Bibi
Juil Sock
MoE
327
10
0
05 Mar 2025
SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture
Hocheol Lim
Hyein Cho
Jeonghoon Kim
242
1
0
04 Mar 2025
Proteina: Scaling Flow-based Protein Structure Generative Models
International Conference on Learning Representations (ICLR), 2025
Tomas Geffner
Kieran Didi
Zuobai Zhang
Danny Reidenbach
Zhonglin Cao
...
Mario Geiger
Christian Dallago
E. Küçükbenli
Arash Vahdat
Karsten Kreis
DiffM
AI4CE
305
54
0
02 Mar 2025
GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development
ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2025
Leming Shen
Qiang Yang
Xinyu Huang
Zijing Ma
Yuanqing Zheng
246
14
0
02 Mar 2025
Synthetic data enables context-aware bioacoustic sound event detection
Benjamin Hoffman
David Robinson
Marius Miron
V. Baglione
D. Canestrari
...
Eva Trapote
M. Cusimano
Masato Hagiwara
Olivier Pietquin
417
2
0
01 Mar 2025
Protein Structure Tokenization: Benchmarking and New Recipe
Xinyu Yuan
Zichen Wang
Marcus Collins
Huzefa Rangwala
234
6
0
28 Feb 2025
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
LRM
363
3
0
28 Feb 2025
Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
563
2
0
28 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
413
12
0
26 Feb 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
International Conference on Learning Representations (ICLR), 2025
Taishi Nakamura
Takuya Akiba
Kazuki Fujii
Yusuke Oda
Rio Yokota
Jun Suzuki
MoMe
MoE
336
8
0
26 Feb 2025
NeoBERT: A Next-Generation BERT
Lola Le Breton
Quentin Fournier
Mariam El Mezouar
John X. Morris
Sarath Chandar
AI4TS
348
8
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
MinHyung Lee
Shinbok Lee
Gaeun Seo
364
13
0
26 Feb 2025
Patient Trajectory Prediction: Integrating Clinical Notes with Transformers
Sifal Klioui
Sana Sellami
Youssef Trardi
295
0
0
25 Feb 2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
329
6
0
25 Feb 2025
Dual Classification Head Self-training Network for Cross-scene Hyperspectral Image Classification
Rong Liu
Junye Liang
Jiaqi Yang
Jiang He
Peng Zhu
273
6
0
25 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu Zhang
Gaojie Jin
Xianrui Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
358
6
0
24 Feb 2025
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
Maksim Zhdanov
Max Welling
Jan-Willem van de Meent
AI4CE
325
15
0
24 Feb 2025
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps
Yen-Che Hsiao
Abhishek Dutta
LRM
ReLM
ELM
255
1
0
24 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
649
3
0
24 Feb 2025
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI
Syed Abdul Gaffar Shakhadri
Kruthika KR
Kartik Basavaraj Angadi
VLM
186
0
0
24 Feb 2025
Predictive Modeling: BIM Command Recommendation Based on Large-scale Usage Logs
Advanced Engineering Informatics (AEI), 2025
Changyu Du
Zihan Deng
Stavros Nousias
André Borrmann
AI4CE
217
1
0
23 Feb 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
International Conference on Learning Representations (ICLR), 2025
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
336
22
0
21 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu Cheng
KELM
560
15
0
19 Feb 2025
Multi-branch of Attention Yields Accurate Results for Tabular Data
Xuechen Li
Yupeng Li
Jian Liu
Xiaolin Jin
Tian Yang
253
0
0
18 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA
ELM
AI4MH
384
32
0
18 Feb 2025
Understanding Silent Data Corruption in LLM Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
218
7
0
17 Feb 2025
Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers
Niels R. Lorenzen
P. Jennum
Emmanuel Mignot
A. Brink-Kjaer
208
0
0
17 Feb 2025
Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yilei Tu
Andrew Xue
Freda Shi
401
1
0
17 Feb 2025
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
1.1K
323
0
14 Feb 2025