Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.17276
Cited By
PanGu-
π
π
π
: Enhancing Language Model Architectures via Nonlinearity Compensation
27 December 2023
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
Ying Nie
Xutao Wang
Hailin Hu
Zheyuan Bai
Yun Wang
Fangcheng Liu
Zhicheng Liu
Jianyuan Guo
Sinan Zeng
Yinchen Zhang
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation"
14 / 14 papers shown
Title
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li
Xinghao Chen
Han Shu
Yehui Tang
Yunhe Wang
MQ
26
2
0
17 Jun 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li
Zhijie Sun
Xuan He
Li Zeng
Yi Lin
Entong Li
Binfan Zheng
Rongqian Zhao
Xin Chen
MoE
17
11
0
25 Jan 2024
DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning
Wei Chen
Qiushi Wang
Zefei Long
Xianyin Zhang
Zhongtian Lu
...
Siyuan Wang
Jiarong Xu
Xiang Bai
Xuanjing Huang
Zhongyu Wei
68
43
0
23 Oct 2023
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Xiaozhe Ren
Pingyi Zhou
Xinfan Meng
Xinjing Huang
Yadao Wang
...
Jiansheng Wei
Xin Jiang
Teng Su
Qun Liu
Jun Yao
ALM
MoE
53
59
0
20 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
CSL: A Large-scale Chinese Scientific Literature Dataset
Yudong Li
Yuqing Zhang
Zhe Zhao
Lin-cheng Shen
Weijie Liu
Weiquan Mao
Hui Zhang
AILaw
121
50
0
12 Sep 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
127
94
0
01 Jul 2022
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
186
1,148
0
05 Oct 2021
CMT: Convolutional Neural Networks Meet Vision Transformers
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Chunjing Xu
Yunhe Wang
Chang Xu
ViT
325
500
0
13 Jul 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
239
2,554
0
04 May 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
282
1,490
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
249
1,982
0
28 Jul 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
1