DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

7 May 2024
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
Bo Liu
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Daya Guo
Dejian Yang
Deli Chen
Dongjie Ji
Erhang Li
Fangyun Lin
Fuli Luo
Guangbo Hao
Guanting Chen
Guowei Li
Hai-Tao Zhang
Hanwei Xu
Hao-Yu Yang
Haowei Zhang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Li
Hui Qu
J. L. Cai
Jian Liang
Jianzhong Guo
Jiaqi Ni
Jiashi Li
Jin Chen
Jingyang Yuan
Junjie Qiu
Junxiao Song
Kai Dong
Kaige Gao
Kang Guan
Lean Wang
Lecong Zhang
Lei Xu
Leyi Xia
Liang Zhao
Liyue Zhang
Meng Li
Miaojun Wang
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Mingming Li
Ning Tian
Panpan Huang
Peiyi Wang
Peng Zhang
Qihao Zhu
Qinyu Chen
Qiushi Du
R. J. Chen
R. L. Jin
Ruiqi Ge
Ruizhe Pan
Runxin Xu
Ruyi Chen
S. S. Li
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shaoqing Wu
Shengfeng Ye
Shirong Ma
Shiyu Wang
Shuang Zhou
Shuiping Yu
Shunfeng Zhou
Size Zheng
Tao Wang
Tian Pei
Tian Yuan
Tianyu Sun
W. L. Xiao
Wangding Zeng
Wei An
Wen Liu
Wenfeng Liang
Wenjun Gao
Wentao Zhang
X. Q. Li
Xiangyue Jin
Xianzu Wang
Xiao Bi
Xiaodong Liu
Xiaohan Wang
Xiaojin Shen
Xiaokang Chen
Xiaosha Chen
Xiaotao Nie
Xiaowen Sun
Xiaoxiang Wang
Xin Liu
Xin Xie
Xingkai Yu
Xinnan Song
Xinyi Zhou
Xinyu Yang
Xuan Lu
Xuecheng Su
Ying Wu
Y. K. Li
Y. X. Wei
Y. X. Zhu
Yanhong Xu
Yanping Huang
Yao Li
Yao-Min Zhao
Yaofeng Sun
Yaohui Li
Yaohui Wang
Yi Zheng
Yichao Zhang
Yiliang Xiong
Yilong Zhao
Ying He
Ying Tang
Yishi Piao
Yixin Dong
Yixuan Tan
Yiyuan Liu
Yongji Wang
Yongqiang Guo
Yuchen Zhu
Yuduan Wang
Yuheng Zou
Yukun Zha
Yunxian Ma
Yuting Yan
Yuxiang You
Yuxuan Liu
Z. Z. Ren
Zehui Ren
Zhangli Sha
Zhe Fu
Zhen Huang
Zhen Zhang
Zhenda Xie
Zhewen Hao
Zhihong Shao
Zhiniu Wen
Zhipeng Xu
Zhongyu Zhang
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
    MoE

Papers citing "DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model"

50 / 74 papers shown
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
57
0
0
09 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Y. Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
37
0
0
09 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
56
0
0
06 May 2025
Beyond the model: Key differentiators in large language models and multi-agent services
Muskaan Goyal
Pranav Bhasin
LLMAG
ELM
34
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
52
0
0
04 May 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Xing Hu
Zhixuan Chen
Dawei Yang
Zukang Xu
Chen Xu
Zhihang Yuan
Sifan Zhou
Jiangyong Yu
MoE
MQ
25
0
0
02 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
K. Zhang
Lizhuang Ma
J. Wang
J. Wang
W. Zhang
MQ
51
0
0
01 May 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
71
0
0
29 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
X. Li
MLLM
68
0
0
29 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
37
0
0
21 Apr 2025
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu
Zijie Yan
Xin Yao
Tong Liu
V. Korthikanti
...
Jiajie Yao
Chandler Zhou
David Wu
Xipeng Li
J. Yang
MoE
52
0
0
21 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
34
0
0
16 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
53
0
0
09 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
39
1
0
03 Apr 2025
Cognitive Memory in Large Language Models
Lianlei Shan
Shixian Luo
Zezhou Zhu
Yu Yuan
Yong Wu
LLMAG
KELM
60
1
0
03 Apr 2025
Rethinking industrial artificial intelligence: a unified foundation framework
Jay Lee
Hanqi Su
AI4CE
36
1
0
02 Apr 2025
TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
Junnan Zhu
Min Xiao
Yining Wang
Feifei Zhai
Yu Zhou
Chengqing Zong
55
0
0
19 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
H. Zhang
Jun Wang
64
0
0
15 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
46
0
0
14 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
47
0
0
14 Mar 2025
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges
Xiaoxiao Liu
Qingying Xiao
Junying Chen
Xiangyi Feng
Xiangbo Wu
...
Xiang Wan
Jian Chang
Guangjun Yu
Yan Hu
Benyou Wang
LM&MA
LRM
56
0
0
11 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
C. Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
100
2
0
07 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
55
1
0
07 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
46
0
0
06 Mar 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Z. Wang
MoE
50
0
0
06 Mar 2025
PanguIR Technical Report for NTCIR-18 AEOLLM Task
Lang Mei
Chong Chen
Jiaxin Mao
ALM
45
1
0
04 Mar 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Y. Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
33
0
0
28 Feb 2025
END: Early Noise Dropping for Efficient and Effective Context Denoising
Hongye Jin
Pei Chen
Jingfeng Yang
Z. Wang
Meng-Long Jiang
...
X. Zhang
Zheng Li
Tianyi Liu
Huasheng Li
Bing Yin
54
0
0
26 Feb 2025
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering
L. Liu
Shilei Liu
Yujin Yuan
Y. Zhang
Bencheng Yan
...
Di Wang
Wenbo Su
Pengjie Wang
Jian Xu
Bo Zheng
42
1
0
26 Feb 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
82
0
0
26 Feb 2025
An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law
Songpei Xu
Shijia Wang
Da Guo
Xianwen Guo
Qiang Xiao
Fangjian Li
Chuanjiang Luo
71
0
0
17 Feb 2025
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
Dylan Zhang
Justin Wang
Tianran Sun
36
0
0
17 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE
AI4CE
54
0
0
13 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
68
1
0
02 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
81
2
0
01 Feb 2025
StringLLM: Understanding the String Processing Capability of Large Language Models
Xilong Wang
Hao Fu
Jindong Wang
Neil Zhenqiang Gong
49
0
0
28 Jan 2025
Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation
Junhong Lian
Xiang Ao
Xinyu Liu
Yang Liu
Qing He
32
0
0
21 Jan 2025
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Xueyan Li
Xinyan Chen
Yazhe Niu
Shuai Hu
Yu Liu
OffRL
53
3
0
17 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
62
8
0
11 Jan 2025
Scaling Laws for Floating Point Quantization Training
X. Sun
Shuaipeng Li
Ruobing Xie
Weidong Han
Kan Wu
...
Yangyu Tao
Zhanhui Kang
C. Xu
Di Wang
Jie Jiang
MQ
AIFin
53
0
0
05 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
76
207
0
03 Jan 2025
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Shanghaoran Quan
Jiaxi Yang
Bowen Yu
Bo Zheng
Dayiheng Liu
...
Zeyu Cui
Yang Fan
Y. Zhang
Binyuan Hui
Junyang Lin
ALM
ELM
LRM
64
13
0
02 Jan 2025
BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters
Ting Bai
Jiazheng Kang
Jiayang Fan
AI4CE
29
2
0
28 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
J. Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
111
6
0
05 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
89
5
0
04 Dec 2024
Yi-Lightning Technical Report
01.AI
Alan Wake
Albert Wang
Bei Chen
...
Yuxuan Sha
Zhaodong Yan
Zhiyuan Liu
Zirui Zhang
Zonghong Dai
OSLM
94
3
0
02 Dec 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Y. Wang
Gangshan Wu
Tong He
Limin Wang
83
2
0
21 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
37
0
0
18 Nov 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Wenlei Bao
Size Zheng
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
78
16
0
28 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
44
3
0
24 Oct 2024