Efficient Large Scale Language Modeling with Mixtures of Experts

20 December 2021
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
Sam Shleifer
Xi Victoria Lin
Jingfei Du
Srini Iyer
Ramakanth Pasunuru
Giridhar Anantharaman
Xian Li
Shuohui Chen
H. Akın
Mandeep Baines
Louis Martin
Xing Zhou
Punit Singh Koura
Brian O'Horo
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
    MoE

Papers citing "Efficient Large Scale Language Modeling with Mixtures of Experts"

50 / 145 papers shown
Backdoor Attacks Against Patch-based Mixture of Experts
Cedric Chan
Jona te Lintelo
S. Picek
AAML
MoE
96
0
0
03 May 2025
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Tam Minh Nguyen
Ngoc N. Tran
Khai Nguyen
Richard G. Baraniuk
MoE
59
0
0
01 May 2025
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
77
0
0
26 Apr 2025
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu
Zijie Yan
Xin Yao
Tong Liu
V. Korthikanti
...
Jiajie Yao
Chandler Zhou
David Wu
Xipeng Li
J. Yang
MoE
56
0
0
21 Apr 2025
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Beichen Huang
Yueming Yuan
Zelei Shao
Minjia Zhang
MQ
MoE
37
0
0
03 Apr 2025
SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
Jiaxin Zhang
Z. Li
Wendi Cui
Kamalika Das
Bradley Malin
Sricharan Kumar
41
0
0
13 Mar 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Z. Wang
MoE
95
0
0
06 Mar 2025
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim
Gyouk Chu
Eunho Yang
MoE
54
0
0
18 Feb 2025
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
Hanfei Yu
Xingqi Cui
H. M. Zhang
H. Wang
Hao Wang
MoE
54
0
0
07 Feb 2025
Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction
Kritarth Prasad
Mohammadi Zaki
Pratik Rakesh Singh
Pankaj Wasnik
31
0
0
28 Jan 2025
HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages
Aman Chaturvedi
Daniel Nichols
Siddharth Singh
A. Bhatele
82
1
0
19 Dec 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Shaohan Yu
Pan Deng
Yu Zhao
J. Liu
Ziáng Wang
MoE
131
0
0
30 Nov 2024
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Xiaonan Nie
Qibin Liu
Fangcheng Fu
Shenhan Zhu
Xupeng Miao
X. Li
Y. Zhang
Shouda Liu
Bin Cui
MoE
21
1
0
13 Nov 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
28
4
0
24 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
30
3
0
15 Oct 2024
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
48
4
0
09 Oct 2024
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang
Chaojun Xiao
Qiujieli Qin
Yankai Lin
Zhiyuan Zeng
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
58
3
0
04 Oct 2024
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
33
0
0
04 Oct 2024
Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction
Ziyang Wu
Fan Liu
Jindong Han
Yuxuan Liang
Hao Liu
18
2
0
24 Sep 2024
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
51
36
0
24 Sep 2024
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
Maryam Akhavan Aghdam
Hongpeng Jin
Yanzhao Wu
MoE
16
3
0
10 Sep 2024
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Sungmin Yun
Kwanhee Kyung
Juhwan Cho
Jaewan Choi
Jongmin Kim
Byeongho Kim
Sukhan Lee
Kyomin Sohn
Jung Ho Ahn
MoE
36
5
0
02 Sep 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Xi Victoria Lin
Akshat Shrivastava
Liang Luo
Srinivasan Iyer
Mike Lewis
Gargi Ghosh
Luke Zettlemoyer
Armen Aghajanyan
MoE
38
20
0
31 Jul 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Yixiao Wang
Yifei Zhang
Mingxiao Huo
Ran Tian
Xiang Zhang
...
Chenfeng Xu
Pengliang Ji
Wei Zhan
Mingyu Ding
M. Tomizuka
MoE
36
18
0
01 Jul 2024
Towards Comprehensive Preference Data Collection for Reward Modeling
Yulan Hu
Qingyang Li
Sheng Ouyang
Ge Chen
Kaihui Chen
Lijun Mei
Xucheng Ye
Fuzheng Zhang
Yong Liu
SyDa
32
4
0
24 Jun 2024
A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments
Ruirui Zhang
Xingze Wu
Yifei Zou
Zhenzhen Xie
Peng Li
Xiuzhen Cheng
Dongxiao Yu
FedML
21
0
0
19 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
62
5
0
17 Jun 2024
Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction
Wenzhao Jiang
Jindong Han
Hao Liu
Tao Tao
Naiqiang Tan
Hui Xiong
MoE
29
8
0
14 Jun 2024
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Tianwen Wei
Bo Zhu
Liang Zhao
Cheng Cheng
Biye Li
...
Yutuan Ma
Rui Hu
Shuicheng Yan
Han Fang
Yahui Zhou
MoE
41
24
0
03 Jun 2024
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong
Mengzhou Xia
Danqi Chen
Mike Lewis
MoE
49
15
0
06 May 2024
Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning
A. Elshabrawy
Yongxin Huang
Iryna Gurevych
Alham Fikri Aji
27
0
0
19 Apr 2024
MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection
Chenqi Kong
Anwei Luo
Song Xia
Yi Yu
Haoliang Li
Zengwei Zheng
Shiqi Wang
Alex C. Kot
MoE
31
5
0
12 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
48
7
0
07 Apr 2024
DiPaCo: Distributed Path Composition
Arthur Douillard
Qixuang Feng
Andrei A. Rusu
A. Kuncoro
Yani Donchev
Rachita Chhaparia
Ionel Gog
Marc'Aurelio Ranzato
Jiajun Shen
Arthur Szlam
MoE
35
2
0
15 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
103
40
0
13 Mar 2024
A Question-centric Multi-experts Contrastive Learning Framework for Improving the Accuracy and Interpretability of Deep Sequential Knowledge Tracing Models
Hengyuan Zhang
Zitao Liu
Chenming Shang
Dawei Li
Yong Jiang
AI4Ed
39
8
0
12 Mar 2024
Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach
Zhen Tan
Jie Peng
Tianlong Chen
Huan Liu
16
6
0
08 Mar 2024
How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider Transformer Models
Xin Lu
Yanyan Zhao
Bing Qin
31
0
0
04 Mar 2024
Not All Layers of LLMs Are Necessary During Inference
Siqi Fan
Xin Jiang
Xiang Li
Xuying Meng
Peng Han
Shuo Shang
Aixin Sun
Yequan Wang
Zhongyuan Wang
44
32
0
04 Mar 2024
Vanilla Transformers are Transfer Capability Teachers
Xin Lu
Yanyan Zhao
Bing Qin
MoE
28
0
0
04 Mar 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Jundong Li
Lu Cheng
Huan Liu
SyDa
42
46
0
21 Feb 2024
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Hao Zhao
Zihan Qiu
Huijia Wu
Zili Wang
Zhaofeng He
Jie Fu
MoE
30
9
0
20 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
36
46
0
15 Feb 2024
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
20
31
0
06 Feb 2024
Plan-Grounded Large Language Models for Dual Goal Conversational Settings
Diogo Glória-Silva
Rafael Ferreira
Diogo Tavares
David Semedo
João Magalhães
LLMAG
23
4
0
01 Feb 2024
Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain
Gavin Mischler
Yinghao Aaron Li
Stephan Bickel
A. Mehta
N. Mesgarani
17
23
0
31 Jan 2024
Checkmating One, by Using Many: Combining Mixture of Experts with MCTS to Improve in Chess
Felix Helfenstein
Jannis Blüml
Johannes Czech
Kristian Kersting
16
0
0
30 Jan 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue
Zian Zheng
Yao Fu
Jinjie Ni
Zangwei Zheng
Wangchunshu Zhou
Yang You
MoE
15
87
0
29 Jan 2024
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
Leyang Xue
Yao Fu
Zhan Lu
Luo Mai
Mahesh Marina
MoE
8
4
0
25 Jan 2024
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
Jinghan Yao
Quentin G. Anthony
A. Shafi
Hari Subramoni
Dhabaleswar K. Panda
MoE
18
13
0
16 Jan 2024