ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

15 October 2023
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
MoE

Papers citing "Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer" (12 papers shown)
• FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks
  Wenjing Xiao, Wenhao Song, Miaojiang Chen, Ruikun Luo, Min Chen · MoE · 29 Apr 2025
• Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
  Andy Zhou · MoMe · 13 Mar 2025
• Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction
  Jiexin Wang, Yiju Guo, Bing-Huang Su · 3DH · 03 Jan 2025
• Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection
  Yaning Zhang, Qiufu Li, Zitong Yu, L. Shen · ViT · 31 Dec 2024
• From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
  Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge · 18 Dec 2023
• Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
  Gonzalo Martínez, Javier Conde, Elena Merino-Gómez, Beatriz Bermúdez-Margaretto, José Alberto Hernández, Pedro Reviriego, Marc Brysbaert · ELM · 23 Oct 2023
• AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
  Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shi-Yong Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao · 01 Mar 2023
• E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
  Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao · 30 May 2022
• Mixture-of-Experts with Expert Choice Routing
  Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon · MoE · 18 Feb 2022
• Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning
  Zixuan Ke, Bing-Quan Liu, Nianzu Ma, Hu Xu, Lei Shu · CLL · 05 Dec 2021
• Understanding and Improving Lexical Choice in Non-Autoregressive Translation
  Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu · 29 Dec 2020
• GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018