Merino: Entropy-driven Design for Generative Language Models on IoT Devices

AAAI Conference on Artificial Intelligence (AAAI), 2024
28 January 2025
Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang
arXiv (abs) · PDF · HTML · GitHub

Papers citing "Merino: Entropy-driven Design for Generative Language Models on IoT Devices"

38 papers
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai
313 · 19 · 0 · 21 Aug 2025

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey, Gurpreet Gosal, Zhiming (Charles) Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
MoE, LRM
392 · 129 · 0 · 06 Apr 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, ..., USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
500 · 1,740 · 0 · 03 Apr 2023

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
Computer Vision and Pattern Recognition (CVPR), 2023
Xuan Shen, Yaohua Wang, Ming Lin, Yi-Li Huang, Hao Tang, Xiuyu Sun, Yanzhi Wang
528 · 42 · 0 · 05 Mar 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
ALM, PILM
20.0K · 19,109 · 0 · 27 Feb 2023

OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, ..., Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
VLM, OSLM, AI4CE
1.0K · 4,591 · 0 · 02 May 2022

Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji
ViT
209 · 57 · 0 · 23 Mar 2022

Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, ..., Jeff Stanway, L. Bennett, Demis Hassabis, Koray Kavukcuoglu, G. Irving
609 · 1,562 · 0 · 08 Dec 2021

MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection
International Conference on Machine Learning (ICML), 2021
Zhenhong Sun, Ming Lin, Xiuyu Sun, Zhiyu Tan, Hao Li, Rong Jin
405 · 40 · 0 · 26 Nov 2021

A Short Study on Compressing Decoder-Based Language Models
Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
309 · 30 · 0 · 16 Oct 2021

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
VLM
204 · 52 · 0 · 29 Jul 2021

The Principles of Deep Learning Theory
Daniel A. Roberts, Sho Yaida, Boris Hanin
FaML, PINN, GNN
428 · 279 · 0 · 18 Jun 2021

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Knowledge Discovery and Data Mining (KDD), 2021
Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu
MQ
182 · 92 · 0 · 30 May 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
1.1K · 2,701 · 0 · 31 Dec 2020

Evaluating Efficient Performance Estimators of Neural Architectures
Neural Information Processing Systems (NeurIPS), 2020
Xuefei Ning, Changcheng Tang, Wenshuo Li, Zixuan Zhou, Shuang Liang, Huazhong Yang, Yu Wang
679 · 89 · 0 · 07 Aug 2020

LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang
RALM
293 · 445 · 0 · 16 Jul 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
317 · 284 · 0 · 28 May 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
2.3K · 55,939 · 0 · 28 May 2020

ReZero is All You Need: Fast Convergence at Large Depth
Conference on Uncertainty in Artificial Intelligence (UAI), 2020
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley
AI4CE
497 · 362 · 0 · 10 Mar 2020

Transformers without Tears: Improving the Normalization of Self-Attention
International Workshop on Spoken Language Translation (IWSLT), 2019
Toan Q. Nguyen, Julian Salazar
370 · 260 · 0 · 14 Oct 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
3.4K · 9,369 · 0 · 02 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2019
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL, AIMat
1.5K · 7,332 · 0 · 26 Sep 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Findings, 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
VLM
738 · 2,269 · 0 · 23 Sep 2019

PubMedQA: A Dataset for Biomedical Research Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W. Cohen, Xinghua Lu
1.0K · 1,468 · 0 · 13 Sep 2019

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Biao Zhang, Ivan Titov, Rico Sennrich
242 · 119 · 0 · 29 Aug 2019

Once-for-All: Train One Network and Specialize it for Efficient Deployment
International Conference on Learning Representations (ICLR), 2019
Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
OOD
654 · 1,498 · 0 · 26 Aug 2019

Patient Knowledge Distillation for BERT Model Compression
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
415 · 942 · 0 · 25 Aug 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat
6.0K · 28,988 · 0 · 26 Jul 2019

Learning Deep Transformer Models for Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Yang Li, Lidia S. Chao
348 · 759 · 0 · 05 Jun 2019

HellaSwag: Can a Machine Really Finish Your Sentence?
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi
871 · 3,831 · 0 · 19 May 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Neural Information Processing Systems (NeurIPS), 2019
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
914 · 2,704 · 0 · 02 May 2019

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Han Cai, Ligeng Zhu, Song Han
740 · 2,023 · 0 · 02 Dec 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg
3.1K · 112,182 · 0 · 11 Oct 2018

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal
1.3K · 2,240 · 0 · 08 Sep 2018

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
8.2K · 171,167 · 0 · 12 Jun 2017

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
MedIm
4.1K · 224,064 · 0 · 10 Dec 2015

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
FedML
934 · 23,444 · 0 · 09 Mar 2015

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Interspeech, 2013
Ciprian Chelba, Tomas Mikolov, M. Schuster, Qi Ge, T. Brants, P. Koehn, T. Robinson
710 · 1,168 · 0 · 11 Dec 2013