arXiv:2403.07921
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
28 January 2025
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
Papers citing
"Merino: Entropy-driven Design for Generative Language Models on IoT Devices"
38 / 38 papers shown
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu
Qinghao Hu
Shang Yang
Haocheng Xi
Junyu Chen
Song Han
Han Cai
313
19
0
21 Aug 2025
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
392
129
0
06 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
500
1,740
0
03 Apr 2023
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
Computer Vision and Pattern Recognition (CVPR), 2023
Xuan Shen
Yaohua Wang
Ming Lin
Yi-Li Huang
Hao Tang
Xiuyu Sun
Yanzhi Wang
528
42
0
05 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
20.0K
19,109
0
27 Feb 2023
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
1.0K
4,591
0
02 May 2022
Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
209
57
0
23 Mar 2022
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Jack W. Rae
Sebastian Borgeaud
Trevor Cai
Katie Millican
Jordan Hoffmann
...
Jeff Stanway
L. Bennett
Demis Hassabis
Koray Kavukcuoglu
G. Irving
609
1,562
0
08 Dec 2021
MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection
International Conference on Machine Learning (ICML), 2021
Zhenhong Sun
Ming Lin
Xiuyu Sun
Zhiyu Tan
Hao Li
Rong Jin
405
40
0
26 Nov 2021
A Short Study on Compressing Decoder-Based Language Models
Tianda Li
Yassir El Mesbahi
I. Kobyzev
Ahmad Rashid
A. Mahmud
Nithin Anchuri
Habib Hajimolahoseini
Yang Liu
Mehdi Rezagholizadeh
309
30
0
16 Oct 2021
AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yichun Yin
Cheng Chen
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
VLM
204
52
0
29 Jul 2021
The Principles of Deep Learning Theory
Daniel A. Roberts
Sho Yaida
Boris Hanin
FaML
PINN
GNN
428
279
0
18 Jun 2021
NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Knowledge Discovery and Data Mining (KDD), 2021
Jin Xu
Xu Tan
Renqian Luo
Kaitao Song
Jian Li
Tao Qin
Tie-Yan Liu
MQ
182
92
0
30 May 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
1.1K
2,701
0
31 Dec 2020
Evaluating Efficient Performance Estimators of Neural Architectures
Neural Information Processing Systems (NeurIPS), 2020
Xuefei Ning
Changcheng Tang
Wenshuo Li
Zixuan Zhou
Shuang Liang
Huazhong Yang
Yu Wang
679
89
0
07 Aug 2020
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Jian Liu
Leyang Cui
Hanmeng Liu
Dandan Huang
Yile Wang
Yue Zhang
RALM
293
445
0
16 Jul 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
317
284
0
28 May 2020
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
2.3K
55,939
0
28 May 2020
ReZero is All You Need: Fast Convergence at Large Depth
Conference on Uncertainty in Artificial Intelligence (UAI), 2020
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
497
362
0
10 Mar 2020
Transformers without Tears: Improving the Normalization of Self-Attention
International Workshop on Spoken Language Translation (IWSLT), 2019
Toan Q. Nguyen
Julian Salazar
370
260
0
14 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
3.4K
9,369
0
02 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
1.5K
7,332
0
26 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Findings of the Association for Computational Linguistics: EMNLP (Findings), 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
738
2,269
0
23 Sep 2019
PubMedQA: A Dataset for Biomedical Research Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
1.0K
1,468
0
13 Sep 2019
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Biao Zhang
Ivan Titov
Rico Sennrich
242
119
0
29 Aug 2019
Once-for-All: Train One Network and Specialize it for Efficient Deployment
International Conference on Learning Representations (ICLR), 2019
Han Cai
Chuang Gan
Tianzhe Wang
Zhekai Zhang
Song Han
OOD
654
1,498
0
26 Aug 2019
Patient Knowledge Distillation for BERT Model Compression
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
415
942
0
25 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
6.0K
28,988
0
26 Jul 2019
Learning Deep Transformer Models for Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Yang Li
Lidia S. Chao
348
759
0
05 Jun 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
871
3,831
0
19 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Neural Information Processing Systems (NeurIPS), 2019
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
914
2,704
0
02 May 2019
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Han Cai
Ligeng Zhu
Song Han
740
2,023
0
02 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
3.1K
112,182
0
11 Oct 2018
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
1.3K
2,240
0
08 Sep 2018
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
8.2K
171,167
0
12 Jun 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
4.1K
224,064
0
10 Dec 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
934
23,444
0
09 Mar 2015
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Interspeech (Interspeech), 2013
Ciprian Chelba
Tomas Mikolov
M. Schuster
Qi Ge
T. Brants
P. Koehn
T. Robinson
710
1,168
0
11 Dec 2013