ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.06905
  4. Cited By
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

13 December 2021
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
Yuanzhong Xu
M. Krikun
Yanqi Zhou
Adams Wei Yu
Orhan Firat
Barret Zoph
Liam Fedus
Maarten Bosma
Zongwei Zhou
Tao Wang
Yu Emma Wang
Kellie Webster
Marie Pellat
Kevin Robinson
Kathy Meier-Hellstern
Toju Duke
Lucas Dixon
Kun Zhang
Quoc V. Le
Yonghui Wu
Z. Chen
Claire Cui
    ALM
    MoE
ArXivPDFHTML

Papers citing "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"

50 / 141 papers shown
Title
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
Quang H. Nguyen
Duy C. Hoang
Juliette Decugis
Saurav Manchanda
Nitesh V. Chawla
Khoa D. Doan
Khoa D. Doan
37
6
0
15 Jul 2024
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts
  Approach
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
Haoyu Han
Juanhui Li
Wei Huang
Xianfeng Tang
Hanqing Lu
Chen Luo
Hui Liu
Jiliang Tang
38
5
0
05 Jun 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
Ensembling Diffusion Models via Adaptive Feature Aggregation
Cong Wang
Kuan Tian
Yonghang Guan
Jun Zhang
Zhiwei Jiang
Fei Shen
Xiao Han
34
5
0
27 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
34
28
0
18 May 2024
From Matching to Generation: A Survey on Generative Information Retrieval
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
67
46
0
23 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive
  Review and Analysis of Paradigms and Fine-Tuning Strategies
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
35
7
0
13 Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen
Zhen Guo
Tianle Cai
Zengyi Qin
MoE
ALM
33
26
0
11 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
37
62
0
25 Mar 2024
Balanced Data Sampling for Language Model Training with Clustering
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
29
8
0
22 Feb 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
63
20
0
10 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
120
362
0
09 Feb 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large
  Models
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
31
16
0
29 Jan 2024
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
TinyGSM: achieving >80% on GSM8k with small language models
TinyGSM: achieving >80% on GSM8k with small language models
Bingbin Liu
Sébastien Bubeck
Ronen Eldan
Janardhan Kulkarni
Yuanzhi Li
Anh Nguyen
Rachel A. Ward
Yi Zhang
ALM
19
47
0
14 Dec 2023
Enhancing Molecular Property Prediction via Mixture of Collaborative
  Experts
Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
Xu Yao
Shuang Liang
Songqiao Han
Hailiang Huang
19
4
0
06 Dec 2023
Mixture of Weak & Strong Experts on Graphs
Mixture of Weak & Strong Experts on Graphs
Hanqing Zeng
Hanjia Lyu
Diyi Hu
Yinglong Xia
Jiebo Luo
23
3
0
09 Nov 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
15
80
0
31 Oct 2023
Image Clustering Conditioned on Text Criteria
Image Clustering Conditioned on Text Criteria
Sehyun Kwon
Jaeseung Park
Minkyu Kim
Jaewoong Cho
Ernest K. Ryu
Kangwook Lee
VLM
34
11
0
27 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of
  Experts
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen
Pedram Akbarian
TrungTin Nguyen
Nhat Ho
23
10
0
22 Oct 2023
Farzi Data: Autoregressive Data Distillation
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
19
3
0
15 Oct 2023
Diversifying the Mixture-of-Experts Representation for Language Models
  with Orthogonal Optimizer
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
34
7
0
15 Oct 2023
Adaptivity and Modularity for Efficient Generalization Over Task
  Complexity
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Samira Abnar
Omid Saremi
Laurent Dinh
Shantel Wilson
Miguel Angel Bautista
...
Vimal Thilak
Etai Littwin
Jiatao Gu
Josh Susskind
Samy Bengio
32
5
0
13 Oct 2023
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Boxin Wang
Wei Ping
Lawrence C. McAfee
Peng-Tao Xu
Bo Li
M. Shoeybi
Bryan Catanzaro
RALM
16
45
0
11 Oct 2023
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language
  Models
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
Ahmad Faiz
S. Kaneda
Ruhan Wang
Rita Osi
Parteek Sharma
Fan Chen
Lei Jiang
23
56
0
25 Sep 2023
Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
Dustin Wright
Christian Igel
Gabrielle Samuel
Raghavendra Selvan
27
15
0
05 Sep 2023
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Yihua Zhang
Ruisi Cai
Tianlong Chen
Guanhua Zhang
Huan Zhang
Pin-Yu Chen
Shiyu Chang
Zhangyang Wang
Sijia Liu
MoE
AAML
OOD
30
16
0
19 Aug 2023
Large Language Models and Foundation Models in Smart Agriculture:
  Basics, Opportunities, and Challenges
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
25
3
0
13 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
26
1
0
31 Jul 2023
Generator-Retriever-Generator Approach for Open-Domain Question
  Answering
Generator-Retriever-Generator Approach for Open-Domain Question Answering
Abdelrahman Abdallah
Adam Jatowt
RALM
20
10
0
21 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
88
10,977
0
18 Jul 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
44
187
0
29 May 2023
Pre-training Multi-task Contrastive Learning Models for Scientific
  Literature Understanding
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding
Yu Zhang
Hao Cheng
Zhihong Shen
Xiaodong Liu
Yejiang Wang
Jianfeng Gao
19
13
0
23 May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang
Yang Yang
Jiahao Liu
Jingang Wang
Yunsen Xian
Benyou Wang
Dawei Song
MoE
27
19
0
20 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through
  the Lens of Verification and Validation
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
27
81
0
19 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
31
174
0
17 May 2023
PaLM 2 Technical Report
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
62
1,142
0
17 May 2023
Language Models can Solve Computer Tasks
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG
LM&Ro
40
338
0
30 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
23
23
0
29 Mar 2023
On the Creativity of Large Language Models
On the Creativity of Large Language Models
Giorgio Franceschelli
Mirco Musolesi
69
51
0
27 Mar 2023
Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable
  Reward Function
Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function
A. B. Siddique
M. H. Maqbool
Kshitija Taywade
H. Foroosh
24
12
0
24 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on
  Tasks and Challenges
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
24
4
0
04 Mar 2023
Massively Multilingual Shallow Fusion with Large Language Models
Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu
Tara N. Sainath
Bo-wen Li
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Z. Chen
Trevor Strohman
27
13
0
17 Feb 2023
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on
  Production AI Platform
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang
Lansong Diao
Siyu Wang
Zongyan Cao
Yiliang Gu
Chang Si
Ziji Shi
Zhen Zheng
Chuan Wu
W. Lin
AI4CE
22
4
0
16 Feb 2023
Symbolic Discovery of Optimization Algorithms
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
50
350
0
13 Feb 2023
Multipath agents for modular multitask ML systems
Multipath agents for modular multitask ML systems
Andrea Gesmundo
18
1
0
06 Feb 2023
Learning a Fourier Transform for Linear Relative Positional Encodings in
  Transformers
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
K. Choromanski
Shanda Li
Valerii Likhosherstov
Kumar Avinava Dubey
Shengjie Luo
Di He
Yiming Yang
Tamás Sarlós
Thomas Weingarten
Adrian Weller
23
8
0
03 Feb 2023
Lego-MT: Learning Detachable Models for Massively Multilingual Machine
  Translation
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Fei Yuan
Yinquan Lu
Wenhao Zhu
Lingpeng Kong
Lei Li
Yu Qiao
Jingjing Xu
MoE
25
22
0
20 Dec 2022
Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language
  Models
Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models
Jingjing Xu
Qingxiu Dong
Hongyi Liu
Lei Li
ALM
LRM
13
1
0
20 Dec 2022
Memory-efficient NLLB-200: Language-specific Expert Pruning of a
  Massively Multilingual Machine Translation Model
Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model
Yeskendir Koishekenov
Alexandre Berard
Vassilina Nikoulina
MoE
27
29
0
19 Dec 2022
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual
  Machine Translation
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad
Anna Y. Sun
Shruti Bhosale
MoE
46
8
0
15 Dec 2022
Previous
123
Next