ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.01373
  4. Cited By
Pythia: A Suite for Analyzing Large Language Models Across Training and
  Scaling

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

3 April 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
Eric Hallahan
Mohammad Aflah Khan
Shivanshu Purohit
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
ArXivPDFHTML

Papers citing "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"

50 / 171 papers shown
Title
Understanding and Minimising Outlier Features in Neural Network Training
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
21
3
0
29 May 2024
Scaling Laws for Discriminative Classification in Large Language Models
Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte
Fatemeh Tahmasbi
Ming Li
Thomas Markovich
20
2
0
24 May 2024
Understanding the differences in Foundation Models: Attention, State
  Space Models, and Recurrent Neural Networks
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber
Carmen Amo Alonso
A. Didier
M. Zeilinger
Antonio Orvieto
AAML
39
7
0
24 May 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
A. Laio
Marco Baroni
101
11
0
24 May 2024
Bayesian WeakS-to-Strong from Text Classification to Generation
Bayesian WeakS-to-Strong from Text Classification to Generation
Ziyun Cui
Ziyang Zhang
Wen Wu
Wen Wu
Chao Zhang
25
1
0
24 May 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
43
2
0
22 May 2024
A Multi-Perspective Analysis of Memorization in Large Language Models
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
28
1
0
19 May 2024
A Systematic Evaluation of Large Language Models for Natural Language
  Generation Tasks
A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks
Xuanfan Ni
Piji Li
ELM
LRM
21
8
0
16 May 2024
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee
Jae Oh Woo
Juree Seok
Parisa Hassanzadeh
Wooseok Jang
...
Hankyu Moon
Wenjun Hu
Yeong-Dae Kwon
Taehee Lee
Seungjai Min
40
2
0
10 May 2024
BMRetriever: Tuning Large Language Models as Better Biomedical Text
  Retrievers
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
Ran Xu
Wenqi Shi
Yue Yu
Yuchen Zhuang
Yanqiao Zhu
M. D. Wang
Joyce C. Ho
Chao Zhang
Carl Yang
LM&MA
40
19
0
29 Apr 2024
Temporal Scaling Law for Large Language Models
Temporal Scaling Law for Large Language Models
Yizhe Xiong
Xiansheng Chen
Xin Ye
Hui Chen
Zijia Lin
Haoran Lian
Zhenpeng Su
Jianwei Niu
Guiguang Ding
28
9
0
27 Apr 2024
Embarrassingly Simple Unsupervised Aspect Based Sentiment Tuple
  Extraction
Embarrassingly Simple Unsupervised Aspect Based Sentiment Tuple Extraction
Kevin Scaria
Abyn Scaria
Ben Scaria
CoGe
16
0
0
21 Apr 2024
Learn Your Reference Model for Real Good Alignment
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski
Boris Shaposhnikov
Alexey Malakhov
Nikita Surnachev
Yaroslav Aksenov
Ian Maksimov
Nikita Balagansky
Daniil Gavrilov
OffRL
45
25
0
15 Apr 2024
PRobELM: Plausibility Ranking Evaluation for Language Models
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan
Chenxi Whitehouse
Eric Chamoun
Rami Aly
Andreas Vlachos
68
4
0
04 Apr 2024
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models
Jingyang Zhang
Jingwei Sun
Eric C. Yeats
Ouyang Yang
Martin Kuo
Jianyi Zhang
Hao Frank Yang
Hai Li
29
41
0
03 Apr 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen
Zhicheng Liu
Xutao Wang
Yuchuan Tian
Yunhe Wang
VLM
16
5
0
29 Mar 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
35
58
0
25 Mar 2024
AIOS: LLM Agent Operating System
AIOS: LLM Agent Operating System
Kai Mei
Zelong Li
Wujiang Xu
Wenyue Hua
Mingyu Jin
Yongfeng Zhang
Shuyuan Xu
Ruosong Ye
Yingqiang Ge
Yongfeng Zhang
LLMAG
23
17
0
25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
55
46
0
23 Mar 2024
SaulLM-7B: A pioneering Large Language Model for Law
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
28
63
0
06 Mar 2024
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem
Omar Mahmoud
Niloofar Mireshghallah
Hyunwoo J. Kim
Yulia Tsvetkov
Yejin Choi
Sherif Saad
Santu Rana
44
18
0
05 Mar 2024
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos
Joao Silva
Luís Gomes
João Rodrigues
António Branco
28
10
0
29 Feb 2024
COBIAS: Contextual Reliability in Bias Assessment
COBIAS: Contextual Reliability in Bias Assessment
Priyanshul Govil
Hemang Jain
Vamshi Bonagiri
Aman Chadha
Ponnurangam Kumaraguru
Manas Gaur
S. Dey
27
2
0
22 Feb 2024
Analysing The Impact of Sequence Composition on Language Model
  Pre-Training
Analysing The Impact of Sequence Composition on Language Model Pre-Training
Yu Zhao
Yuanbin Qu
Konrad Staniszewski
Szymon Tworkowski
Wei Liu
Piotr Milo's
Yuxiang Wu
Pasquale Minervini
14
13
0
21 Feb 2024
FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via
  Large Language Models
FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via Large Language Models
Ruiyang Qin
Yuting Hu
Zheyu Yan
Jinjun Xiong
Ahmed Abbasi
Yiyu Shi
6
5
0
09 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
107
347
0
09 Feb 2024
CroissantLLM: A Truly Bilingual French-English Language Model
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
C´eline Hudelot
Pierre Colombo
39
33
0
01 Feb 2024
Knowledge Fusion of Large Language Models
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
14
61
0
19 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations
  of Language Models
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
14
86
0
11 Jan 2024
Learn or Recall? Revisiting Incremental Learning with Pre-trained
  Language Models
Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models
Junhao Zheng
Shengjie Qiu
Qianli Ma
13
9
0
13 Dec 2023
DYAD: A Descriptive Yet Abjuring Density efficient approximation to
  linear neural network layers
DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers
S. Chandy
Varun Gangal
Yi Yang
Gabriel Maggiotti
18
0
0
11 Dec 2023
LLM360: Towards Fully Transparent Open-Source LLMs
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
W. Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric P. Xing
27
68
0
11 Dec 2023
Efficient Online Data Mixing For Language Model Pre-Training
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak
Liangming Pan
Colin Raffel
W. Wang
15
32
0
05 Dec 2023
Controlled Text Generation via Language Model Arithmetic
Controlled Text Generation via Language Model Arithmetic
Jasper Dekoninck
Marc Fischer
Luca Beurer-Kellner
Martin Vechev
24
36
0
24 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger
  Models with Self-Consistency Training
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
14
7
0
23 Nov 2023
Prompt have evil twins
Prompt have evil twins
Rimon Melamed
Lucas H. McCabe
T. Wakhare
Yejin Kim
H. H. Huang
Enric Boix-Adsera
8
3
0
13 Nov 2023
Pre-training LLMs using human-like development data corpus
Pre-training LLMs using human-like development data corpus
Khushi Bhardwaj
Raj Sanjay Shah
Sashank Varma
14
6
0
08 Nov 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
54
103
0
26 Oct 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with
  Diverse Divergence Constraints
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang
Yibo Jiang
Yuguang Yang
Han Liu
Yuxin Chen
11
81
0
28 Sep 2023
Knowledgeable In-Context Tuning: Exploring and Exploiting Factual
  Knowledge for In-Context Learning
Knowledgeable In-Context Tuning: Exploring and Exploiting Factual Knowledge for In-Context Learning
J. Wang
Chengyu Wang
Chuanqi Tan
Jun Huang
Ming Gao
KELM
21
4
0
26 Sep 2023
Baichuan 2: Open Large-scale Language Models
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
27
678
0
19 Sep 2023
Knowledge-tuning Large Language Models with Structured Medical Knowledge
  Bases for Reliable Response Generation in Chinese
Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese
Hao Wang
Sendong Zhao
Zewen Qiang
Zijian Li
Nuwa Xi
...
Haoqiang Guo
Yuhan Chen
Haoming Xu
Bing Qin
Ting Liu
LM&MA
AI4MH
11
16
0
08 Sep 2023
From Base to Conversational: Japanese Instruction Dataset and Tuning
  Large Language Models
From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models
Masahiro Suzuki
Masanori Hirano
Hiroki Sakaji
26
6
0
07 Sep 2023
Benchmarks for Detecting Measurement Tampering
Benchmarks for Detecting Measurement Tampering
Fabien Roger
Ryan Greenblatt
Max Nadeau
Buck Shlegeris
Nate Thomas
19
2
0
29 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
50
3
0
28 Aug 2023
Situated Natural Language Explanations
Situated Natural Language Explanations
Zining Zhu
Hao Jiang
Jingfeng Yang
Sreyashi Nag
Chao Zhang
Jie Huang
Yifan Gao
Frank Rudzicz
Bing Yin
LRM
22
1
0
27 Aug 2023
Continual Pre-Training of Large Language Models: How to (re)warm your
  model?
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
KELM
13
98
0
08 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
12
3
0
07 Aug 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
15
5
0
17 Jul 2023
Layer-wise Linear Mode Connectivity
Layer-wise Linear Mode Connectivity
Linara Adilova
Maksym Andriushchenko
Michael Kamp
Asja Fischer
Martin Jaggi
FedML
FAtt
MoMe
18
15
0
13 Jul 2023
Previous
1234
Next