Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model

19 December 2022
Yeskendir Koishekenov, Alexandre Berard, Vassilina Nikoulina
MoE

Papers citing "Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model"

24 papers shown

Team ACK at SemEval-2025 Task 2: Beyond Word-for-Word Machine Translation for English-Korean Pairs
Daniel Lee, Harsh Sharma, Jieun Han, Sunny Jeong, Alice H. Oh, Vered Shwartz
46 · 0 · 0 · 29 Apr 2025

MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Yuanlin Duan, Wenqi Jia, Miao Yin, Yu Cheng, Bo Yuan
MoE
71 · 4 · 0 · 01 Nov 2024

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie, Zhi Zhang, Ding Zhou, Cong Xie, Ziang Song, Xin Liu, Yanzhi Wang, Xue Lin, An Xu
LLMAG
30 · 3 · 0 · 15 Oct 2024

Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing
Weichuan Wang, Zhaoyi Li, Defu Lian, Chen Ma, Linqi Song, Ying Wei
46 · 5 · 0 · 09 Oct 2024

Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi
MoE
36 · 3 · 0 · 08 Oct 2024

Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation
Jiyoon Myung, Jihyeon Park, Jungki Son, Kyungro Lee, Joohyung Han
21 · 0 · 0 · 01 Oct 2024

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Daniel F Campos, Z. Yao, Yuxiong He
18 · 2 · 0 · 10 Sep 2024

Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation
Nadezhda Chirkova, Vassilina Nikoulina, Jean-Luc Meunier, Alexandre Berard
MoE
32 · 0 · 0 · 01 Jul 2024

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang
MoE
52 · 5 · 0 · 01 Jul 2024

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Mohammed Nowaz Rabbani Chowdhury, Meng Wang, K. E. Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers
MoE
24 · 4 · 0 · 26 May 2024

From LLM to NMT: Advancing Low-Resource Machine Translation with Claude
Maxim Enis, Mark Hopkins
25 · 38 · 0 · 22 Apr 2024

Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
30 · 7 · 0 · 13 Apr 2024

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu
LRM
47 · 36 · 0 · 07 Apr 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo-Wen Zhang, Junchi Yan, Hongsheng Li
MoE
27 · 25 · 0 · 22 Feb 2024

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE
Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu
MoE, MoMe
15 · 4 · 0 · 17 Feb 2024

Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Zhuoyuan Mao, Yen Yu
ALM
15 · 2 · 0 · 11 Jan 2024

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar, Dan Alistarh
MQ, MoE
19 · 24 · 0 · 25 Oct 2023

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li, Zhenyu (Allen) Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
MoMe
21 · 33 · 0 · 02 Oct 2023

ParameterNet: Parameters Are All You Need
Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu
VLM, AI4CE
11 · 25 · 0 · 26 Jun 2023

Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets
Richard Tarbell, Kim-Kwang Raymond Choo, Glenn Dietrich, Anthony Rios
11 · 9 · 0 · 22 Mar 2023

Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers
William B. Held, Diyi Yang
VLM
30 · 5 · 0 · 11 Oct 2022

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
119 · 105 · 0 · 24 Sep 2021

Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
MoE
94 · 83 · 0 · 22 Sep 2021

Carbon Emissions and Large Neural Network Training
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
AI4CE
239 · 642 · 0 · 21 Apr 2021