Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.09762
Cited By
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
15 October 2023
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer"
12 / 12 papers shown
Title
FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks
Wenjing Xiao
Wenhao Song
Miaojiang Chen
Ruikun Luo
Min Chen
MoE
42
0
0
29 Apr 2025
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
Andy Zhou
MoMe
87
0
0
13 Mar 2025
Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction
Jiexin Wang
Yiju Guo
Bing-Huang Su
3DH
45
0
0
03 Jan 2025
Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection
Yaning Zhang
Qiufu Li
Zitong Yu
L. Shen
ViT
40
3
0
31 Dec 2024
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
79
46
0
18 Dec 2023
Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
Gonzalo Martínez
Javier Conde
Elena Merino-Gómez
Beatriz Bermúdez-Margaretto
José Alberto Hernández
Pedro Reviriego
Marc Brysbaert
ELM
18
1
0
23 Oct 2023
AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
Hao Sun
Li Shen
Qihuang Zhong
Liang Ding
Shi-Yong Chen
Jingwei Sun
Jing Li
Guangzhong Sun
Dacheng Tao
41
31
0
01 Mar 2023
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
29
26
0
30 May 2022
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
145
323
0
18 Feb 2022
Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning
Zixuan Ke
Bing-Quan Liu
Nianzu Ma
Hu Xu
Lei Shu
CLL
165
121
0
05 Dec 2021
Understanding and Improving Lexical Choice in Non-Autoregressive Translation
Liang Ding
Longyue Wang
Xuebo Liu
Derek F. Wong
Dacheng Tao
Zhaopeng Tu
91
76
0
29 Dec 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1