ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.12369
  4. Cited By
PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language
  Models with Auto-parallel Computation

PanGu-ααα: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

26 April 2021
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
Zhiwei Wang
Xin Jiang
ZhenZhang Yang
Kaisheng Wang
Xiaoda Zhang
Chen Li
Ziyan Gong
Yifan Yao
Xinjing Huang
Jun Wang
Jianfeng Yu
Qiwei Guo
Yue Yu
Yan Zhang
Jin Wang
Heng Tao
Dasen Yan
Z. Yi
Fang Peng
Fan Jiang
Han Zhang
Lingfeng Deng
Yehong Zhang
Zhengping Lin
Chao Zhang
Shaojie Zhang
Mingyue Guo
Shanzhi Gu
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
    ALM
    MoE
    AI4CE
ArXivPDFHTML

Papers citing "PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation"

33 / 33 papers shown
Title
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
60
0
0
05 May 2025
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
Layba Fiaz
Munief Hassan Tahir
Sana Shams
Sarmad Hussain
49
0
0
24 Feb 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
114
13
0
20 Jan 2025
Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine
Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine
Yishen Liu
Shengda Luo
Zishao Zhong
Tongtong Wu
J. Zhang
Peiyao Ou
Yong Liang
Liang Liu
Hudan Pan
LM&MA
33
0
0
05 Jan 2025
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
57
15
0
06 Oct 2024
How Does Code Pretraining Affect Language Model Task Performance?
How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty
Sjoerd van Steenkiste
Tal Linzen
55
7
0
06 Sep 2024
The Mosaic Memory of Large Language Models
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
39
3
0
24 May 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation
  Benchmark for Chinese Large Language Models
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
14
0
0
19 Mar 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li
Zhijie Sun
Xuan He
Li Zeng
Yi Lin
Entong Li
Binfan Zheng
Rongqian Zhao
Xin Chen
MoE
25
11
0
25 Jan 2024
CValues: Measuring the Values of Chinese Large Language Models from
  Safety to Responsibility
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
23
72
0
19 Jul 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
16
13
0
22 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through
  the Lens of Verification and Validation
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
27
81
0
19 May 2023
Safety Analysis in the Era of Large Language Models: A Case Study of
  STPA using ChatGPT
Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT
Yi Qi
Xingyu Zhao
Siddartha Khastgir
Xiaowei Huang
19
14
0
03 Apr 2023
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual
  Benchmarking on HumanEval-X
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
Qinkai Zheng
Xiao Xia
Xu Zou
Yuxiao Dong
Shanshan Wang
...
Andi Wang
Yang Li
Teng Su
Zhilin Yang
Jie Tang
ELM
ALM
SyDa
50
314
0
30 Mar 2023
Understanding BLOOM: An empirical study on diverse NLP tasks
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
17
3
0
27 Nov 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
PromptTTS: Controllable Text-to-Speech with Text Descriptions
Zhifang Guo
Yichong Leng
Yihan Wu
Sheng Zhao
Xuejiao Tan
DiffM
11
88
0
22 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
54
2,297
0
09 Nov 2022
Changes from Classical Statistics to Modern Statistics and Data Science
Changes from Classical Statistics to Modern Statistics and Data Science
Kai Zhang
Shan-Yu Liu
M. Xiong
21
0
0
30 Oct 2022
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
Fenia Christopoulou
Gerasimos Lampouras
Milan Gritta
Guchun Zhang
Yinpeng Guo
...
Guangtai Liang
Jia Wei
Xin Jiang
Qianxiang Wang
Qun Liu
ELM
SyDa
ALM
24
74
0
22 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
18
55
0
05 Jul 2022
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language
  Understanding and Generation
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
29
27
0
30 May 2022
mGPT: Few-Shot Learners Go Multilingual
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
14
148
0
15 Apr 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
16
797
0
14 Apr 2022
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with
  Large-Scale Pre-Training
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
Yuxian Gu
Jiaxin Wen
Hao-Lun Sun
Yi Song
Pei Ke
...
Zheng Zhang
Jianzhu Yao
Lei Liu
Xiaoyan Zhu
Minlie Huang
19
55
0
17 Mar 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A
  Large-Scale Generative Language Model
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
20
728
0
28 Jan 2022
Black-Box Tuning for Language-Model-as-a-Service
Black-Box Tuning for Language-Model-as-a-Service
Tianxiang Sun
Yunfan Shao
Hong Qian
Xuanjing Huang
Xipeng Qiu
VLM
39
255
0
10 Jan 2022
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training
  for Language Understanding and Generation
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang
Yu Sun
Yang Xiang
Zhihua Wu
Siyu Ding
...
Tian Wu
Wei Zeng
Ge Li
Wen Gao
Haifeng Wang
ELM
26
76
0
23 Dec 2021
On Transferability of Prompt Tuning for Natural Language Processing
On Transferability of Prompt Tuning for Natural Language Processing
Yusheng Su
Xiaozhi Wang
Yujia Qin
Chi-Min Chan
Yankai Lin
...
Peng Li
Juanzi Li
Lei Hou
Maosong Sun
Jie Zhou
AAML
VLM
11
98
0
12 Nov 2021
bert2BERT: Towards Reusable Pretrained Language Models
bert2BERT: Towards Reusable Pretrained Language Models
Cheng Chen
Yichun Yin
Lifeng Shang
Xin Jiang
Yujia Qin
Fengyu Wang
Zhi Wang
Xiao Chen
Zhiyuan Liu
Qun Liu
VLM
9
59
0
14 Oct 2021
Towards Efficient Post-training Quantization of Pre-trained Language
  Models
Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai
Lu Hou
Lifeng Shang
Xin Jiang
Irwin King
M. Lyu
MQ
56
47
0
30 Sep 2021
What Changes Can Large-scale Language Models Bring? Intensive Study on
  HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim
Hyoungseok Kim
Sang-Woo Lee
Gichang Lee
Donghyun Kwak
...
Jaewook Kang
Inho Kang
Jung-Woo Ha
W. Park
Nako Sung
VLM
241
121
0
10 Sep 2021
Pre-Trained Models: Past, Present and Future
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
24
807
0
14 Jun 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019
1