ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1701.06538
  4. Cited By
Outrageously Large Neural Networks: The Sparsely-Gated
  Mixture-of-Experts Layer

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
    MoE
ArXivPDFHTML

Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"

50 / 495 papers shown
Title
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence
Noah Mamie
Susie Xi Rao
LLMAG
AI4CE
51
0
0
07 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
103
2
0
07 Mar 2025
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Kanghui Ning
Zijie Pan
Yu Liu
Yushan Jiang
J. Zhang
Kashif Rasul
Anderson Schneider
Lintao Ma
Yuriy Nevmyvaka
Dongjin Song
VLM
AI4TS
57
1
0
06 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
58
3
0
06 Mar 2025
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection
Enkhtogtokh Togootogtokh
Christian Klasen
MedIm
60
0
0
05 Mar 2025
Shazam: Unifying Multiple Foundation Models for Advanced Computational Pathology
Wenhui Lei
Anqi Li
Yusheng Tan
Hanyu Chen
Xiaofan Zhang
34
0
0
02 Mar 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Dinesh Manocha
MoE
53
0
0
27 Feb 2025
Similarity-Distance-Magnitude Universal Verification
Similarity-Distance-Magnitude Universal Verification
Allen Schmaltz
UQCV
AAML
149
0
0
27 Feb 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura
Takuya Akiba
Kazuki Fujii
Yusuke Oda
Rio Yokota
Jun Suzuki
MoMe
MoE
94
1
0
26 Feb 2025
CAMEx: Curvature-aware Merging of Experts
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
R. Teo
T. Nguyen
Linh Duy Tran
MoMe
104
2
0
26 Feb 2025
Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
Yunyang Li
Zaishuo Xia
Lin Huang
Xinran Wei
Han Yang
...
Zun Wang
Chang-Shu Liu
Jia Zhang
Bin Shao
Mark B. Gerstein
77
0
0
26 Feb 2025
MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing
MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing
Matvey Skripkin
Elizaveta Goncharova
Dmitrii Tarasov
Andrey Kuznetsov
67
0
0
24 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur
Anastasis Kratsios
Florian Krach
Yannick Limmer
Jacob-Junqi Tian
John Willes
Blanka Horvath
Frank Rudzicz
MoE
53
0
0
24 Feb 2025
Yes, Q-learning Helps Offline In-Context RL
Yes, Q-learning Helps Offline In-Context RL
Denis Tarasov
Alexander Nikulin
Ilya Zisman
Albina Klepach
Andrei Polubarov
Nikita Lyubaykin
Alexander Derevyagin
Igor Kiselev
Vladislav Kurenkov
OffRL
OnRL
175
0
0
24 Feb 2025
MoMa: A Modular Deep Learning Framework for Material Property Prediction
MoMa: A Modular Deep Learning Framework for Material Property Prediction
Botian Wang
Y. Ouyang
Yaohui Li
Yuhui Wang
Haorui Cui
Jianbing Zhang
Xiaonan Wang
Wei-Ying Ma
Hao Zhou
49
0
0
21 Feb 2025
Neural Attention Search
Neural Attention Search
Difan Deng
Marius Lindauer
93
0
0
21 Feb 2025
Tight Clusters Make Specialized Experts
Tight Clusters Make Specialized Experts
Stefan K. Nielsen
R. Teo
Laziz U. Abdullaev
Tan M. Nguyen
MoE
66
2
0
21 Feb 2025
Stacking as Accelerated Gradient Descent
Stacking as Accelerated Gradient Descent
Naman Agarwal
Pranjal Awasthi
Satyen Kale
Eric Zhao
ODL
73
2
0
20 Feb 2025
Theory on Mixture-of-Experts in Continual Learning
Theory on Mixture-of-Experts in Continual Learning
Hongbo Li
Sen-Fon Lin
Lingjie Duan
Yingbin Liang
Ness B. Shroff
MoE
MoMe
CLL
153
14
0
20 Feb 2025
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu
Yunhao Gou
Kai Chen
Lanqing Hong
Jiahui Gao
...
Yu Zhang
Zhenguo Li
Xin Jiang
Qiang Liu
James T. Kwok
MoE
101
9
0
20 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu-Xi Cheng
KELM
75
3
0
19 Feb 2025
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang
Haris Šikić
Lothar Thiele
O. Saukh
59
0
0
17 Feb 2025
Linear Mode Connectivity in Differentiable Tree Ensembles
Linear Mode Connectivity in Differentiable Tree Ensembles
Ryuichi Kanoh
M. Sugiyama
72
1
0
17 Feb 2025
Skill Expansion and Composition in Parameter Space
Skill Expansion and Composition in Parameter Space
Tenglong Liu
J. Li
Yinan Zheng
Haoyi Niu
Yixing Lan
Xin Xu
Xianyuan Zhan
58
4
0
09 Feb 2025
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang
Yuegui Huang
Zicong Hong
Yufeng Lyu
Wuhui Chen
Yue Yu
Fan Yu
Zibin Zheng
MoE
48
0
0
09 Feb 2025
Importance Sampling via Score-based Generative Models
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm
DiffM
138
1
0
07 Feb 2025
Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models
Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models
Rui Cai
Chao Wang
Qianyi Cai
Dazhong Shen
Hui Xiong
RALM
85
0
0
06 Feb 2025
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
Haibo Tong
Zhaoyang Wang
Zhengzhang Chen
Haonian Ji
Shi Qiu
...
Peng Xia
Mingyu Ding
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
VGen
104
2
0
03 Feb 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou
Giannis Karamanolakis
Victor Soto
Anna Rumshisky
Mayank Kulkarni
Furong Huang
Wei Ai
Jianhua Lu
MoMe
106
0
0
03 Feb 2025
Scaling Embedding Layers in Language Models
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
82
0
0
03 Feb 2025
Position: AI Scaling: From Up to Down and Out
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
88
1
0
02 Feb 2025
Multilingual State Space Models for Structured Question Answering in Indic Languages
Multilingual State Space Models for Structured Question Answering in Indic Languages
A. Vats
Rahul Raja
Mrinal Mathur
Vinija Jain
Aman Chadha
70
1
0
01 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
54
4
0
29 Jan 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
39
5
0
28 Jan 2025
CSAOT: Cooperative Multi-Agent System for Active Object Tracking
CSAOT: Cooperative Multi-Agent System for Active Object Tracking
Hy Nguyen
Bao Pham
Hung Du
Srikanth Thudumu
Rajesh Vasa
K. Mouzakis
42
1
0
23 Jan 2025
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection
Xuzhi Zhang
Zhuangzhuang Ye
GuoPing Zhao
Jianing Wang
Xiaohong Su
29
0
0
21 Jan 2025
Modality Interactive Mixture-of-Experts for Fake News Detection
Modality Interactive Mixture-of-Experts for Fake News Detection
Yifan Liu
Y. Liu
Zehan Li
Ruichen Yao
Yang Zhang
Dong Wang
MoE
36
0
0
21 Jan 2025
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Xueyan Li
Xinyan Chen
Yazhe Niu
Shuai Hu
Yu Liu
OffRL
65
3
0
17 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
105
18
0
17 Jan 2025
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Hanwen Zhong
Jiaxin Chen
Yutong Zhang
Di Huang
Yunhong Wang
MoE
42
0
0
12 Jan 2025
Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning
Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning
Zhongyi Zhou
Yaxin Peng
Pin Yi
Minjie Zhu
Chaomin Shen
136
0
0
04 Jan 2025
Generate to Discriminate: Expert Routing for Continual Learning
Generate to Discriminate: Expert Routing for Continual Learning
Yewon Byun
Sanket Vaibhav Mehta
Saurabh Garg
Emma Strubell
Michael Oberst
Bryan Wilder
Zachary Chase Lipton
78
0
0
31 Dec 2024
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Q. He
Jinlong Peng
P. Xu
Boyuan Jiang
Xiaobin Hu
...
Yong Liu
Yuping Wang
Chengjie Wang
Xiaomeng Li
Jianwei Zhang
DiffM
122
1
0
04 Dec 2024
Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond
Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond
Loukas Ilias
George Doukas
Vangelis Lamprou
Christos Ntanos
D. Askounis
MoE
77
1
0
04 Dec 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Shaohan Yu
Pan Deng
Yu Zhao
J. Liu
Ziáng Wang
MoE
186
0
0
30 Nov 2024
Task Singular Vectors: Reducing Task Interference in Model Merging
Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo
Donato Crisostomi
Maria Sofia Bucarelli
Simone Scardapane
Fabrizio Silvestri
Emanuele Rodolà
MoMe
87
9
0
26 Nov 2024
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
83
1
0
23 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yalin Wang
Gangshan Wu
Tong He
Limin Wang
103
2
0
21 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
200
3
0
20 Nov 2024
Collective Model Intelligence Requires Compatible Specialization
Collective Model Intelligence Requires Compatible Specialization
Jyothish Pari
Samy Jelassi
Pulkit Agrawal
MoMe
51
1
0
04 Nov 2024
Previous
12345...8910
Next