ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.01373
  4. Cited By
Pythia: A Suite for Analyzing Large Language Models Across Training and
  Scaling

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

3 April 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
Eric Hallahan
Mohammad Aflah Khan
Shivanshu Purohit
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
ArXivPDFHTML

Papers citing "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"

21 / 171 papers shown
Title
A Simple and Effective Pruning Approach for Large Language Models
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
25
350
0
20 Jun 2023
LEACE: Perfect linear concept erasure in closed form
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
39
102
0
06 Jun 2023
LLM-powered Data Augmentation for Enhanced Cross-lingual Performance
LLM-powered Data Augmentation for Enhanced Cross-lingual Performance
Chenxi Whitehouse
Monojit Choudhury
Alham Fikri Aji
SyDa
LRM
14
68
0
23 May 2023
Enhancing Chat Language Models by Scaling High-quality Instructional
  Conversations
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
11
474
0
23 May 2023
RWKV: Reinventing RNNs for the Transformer Era
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
43
550
0
22 May 2023
Smaller Language Models are Better Black-box Machine-Generated Text
  Detectors
Smaller Language Models are Better Black-box Machine-Generated Text Detectors
Niloofar Mireshghallah
Justus Mattern
Sicun Gao
Reza Shokri
Taylor Berg-Kirkpatrick
DeLMO
6
46
0
17 May 2023
A Language Model of Java Methods with Train/Test Deduplication
A Language Model of Java Methods with Train/Test Deduplication
Chia-Yi Su
Aakash Bansal
Vijayanta Jain
S. Ghanavati
Collin McMillan
SyDa
VLM
9
8
0
15 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
43
63
0
10 May 2023
Scaling Transformer to 1M tokens and beyond with RMT
Scaling Transformer to 1M tokens and beyond with RMT
Aydar Bulatov
Yuri Kuratov
Yermek Kapushev
Mikhail Burtsev
LRM
11
86
0
19 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
8
41
0
17 Apr 2023
What Language Model to Train if You Have One Million GPU Hours?
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
212
103
0
27 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
A Systematic Evaluation of Large Language Models of Code
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
188
624
0
26 Feb 2022
A Systematic Study of Bias Amplification
A Systematic Study of Bias Amplification
Melissa Hall
L. V. D. van der Maaten
Laura Gustafson
Maxwell Jones
Aaron B. Adcock
83
69
0
27 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Stepmothers are mean and academics are pretentious: What do pretrained
  language models learn about you?
Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?
Rochelle Choenni
Ekaterina Shutova
R. Rooij
71
29
0
21 Sep 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
Previous
1234