Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization

DeepSeek-R1, the largest open-source Mixture-of-Experts (MoE) model, has demonstrated reasoning capabilities comparable to proprietary frontier models. Prior research has explored expert routing in MoE models, but findings suggest that expert selection is often token-dependent rather than semantically driven. Given DeepSeek-R1's enhanced reasoning abilities, we investigate whether its routing mechanism exhibits greater semantic specialization than previous MoE models. To explore this, we conduct two key experiments: (1) a word sense disambiguation task, where we examine expert activation patterns for words with differing senses, and (2) a cognitive reasoning analysis, where we assess DeepSeek-R1's structured thought process in the interactive task setting of DiscoveryWorld. We conclude that DeepSeek-R1's routing mechanism is more semantically aware than those of earlier MoE models and that the model engages in structured cognitive processes.
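To make the first experiment concrete, the following is a minimal sketch (not the authors' code) of how expert routing could be traced for a polysemous word: PyTorch forward hooks record the top-k experts each MoE router selects, and the expert sets chosen for the target token under two sense contexts are compared. The module-name keyword "gate", the TOP_K value, and the assumed router output shape of [num_tokens, num_experts] are all assumptions about a DeepSeek-style MoE implementation, not details taken from the paper.

```python
# Hypothetical sketch: trace which experts an MoE router selects for a target
# word in two different sense contexts, then measure how much the sets overlap.
# Assumes router ("gate") modules emit per-token expert logits of shape
# [num_tokens, num_experts]; module names and TOP_K are assumptions.
import torch
from collections import defaultdict

TOP_K = 8                          # assumed number of experts routed per token
routing_trace = defaultdict(list)  # layer name -> list of top-k expert-id tensors

def make_hook(layer_name):
    def hook(module, inputs, output):
        # Assume the router output is (or starts with) per-token expert logits.
        logits = output[0] if isinstance(output, tuple) else output
        topk = torch.topk(logits, k=TOP_K, dim=-1).indices  # [num_tokens, TOP_K]
        routing_trace[layer_name].append(topk.detach().cpu())
    return hook

def register_router_hooks(model, router_keyword="gate"):
    # Attach a hook to every module whose name suggests it is an MoE router.
    handles = []
    for name, module in model.named_modules():
        if router_keyword in name.lower():
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

def expert_overlap(trace_a, trace_b):
    # Jaccard overlap of the expert sets chosen for the final token position,
    # averaged over the MoE layers the two traces share.
    scores = []
    for layer in trace_a.keys() & trace_b.keys():
        a = set(trace_a[layer][-1][-1].tolist())  # last forward pass, last token
        b = set(trace_b[layer][-1][-1].tolist())
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores) if scores else 0.0
```

As a usage illustration, one could run the model on "the bank of the river" and "the bank approved the loan", collect one routing trace per sentence (clearing and copying routing_trace between runs), and call expert_overlap on the two traces; a low overlap for the token "bank" would indicate sense-dependent rather than purely token-dependent routing.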
@article{olson2025_2502.10928,
  title   = {Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization},
  author  = {Matthew Lyle Olson and Neale Ratzlaff and Musashi Hinck and Man Luo and Sungduk Yu and Chendi Xue and Vasudev Lal},
  journal = {arXiv preprint arXiv:2502.10928},
  year    = {2025}
}