Unlocking Emergent Modularity in Large Language Models
17 October 2023
Zihan Qiu, Zeyu Huang, Jie Fu
arXiv:2310.10908

Papers citing "Unlocking Emergent Modularity in Large Language Models" (9 of 9 shown):
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
Z. Qiu, Zeyu Huang, Bo Zheng, Kaiyue Wen, Z. Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin
MoE · 21 Jan 2025
Mixture of Hidden-Dimensions Transformer
Yilong Chen, Junyuan Shang, Zhengyu Zhang, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua-Hong Wu, Haifeng Wang
MoE · 07 Dec 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang, Zihan Qiu, Zili Wang, Edoardo M. Ponti, Ivan Titov
25 Sep 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
MoE · 13 Aug 2024
Adaptive Computation Modules: Granular Conditional Computation for Efficient Inference
Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane
15 Dec 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Filip Szatkowski, Eric Elmoznino, Younesse Kaddar, Simone Scardapane
MoE · 06 Oct 2023
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
MoE · 11 Oct 2022
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi
24 May 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 20 Apr 2018