Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.01373
Cited By
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
3 April 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
Eric Hallahan
Mohammad Aflah Khan
Shivanshu Purohit
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"
50 / 171 papers shown
Title
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
38
3
0
24 Oct 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
74
4
0
23 Oct 2024
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su
Xing Wu
Zijia Lin
Yizhe Xiong
Minxuan Lv
Guangyuan Ma
Hui Chen
Songlin Hu
Guiguang Ding
MoE
26
2
0
21 Oct 2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
29
5
0
16 Oct 2024
Reconstruction of Differentially Private Text Sanitization via Large Language Models
Shuchao Pang
Zhigang Lu
H. Wang
Peng Fu
Yongbin Zhou
Minhui Xue
AAML
39
4
0
16 Oct 2024
Scaling laws for post-training quantized large language models
Zifei Xu
Alexander Lan
W. Yazar
T. Webb
Sayeh Sharify
Xin Eric Wang
MQ
13
0
0
15 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
29
13
0
15 Oct 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park
Chanwoong Yoon
Jungwoo Park
Donghyeon Lee
Minbyul Jeong
Jaewoo Kang
KELM
45
1
0
13 Oct 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
20
0
0
12 Oct 2024
ELICIT: LLM Augmentation via External In-Context Capability
Futing Wang
Jianhao Yan
Yue Zhang
Tao Lin
35
0
0
12 Oct 2024
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim
Yang Li
Evangelia Spiliopoulou
Jie Ma
Miguel Ballesteros
William Yang Wang
MIALM
85
3
2
10 Oct 2024
Fine-tuning can Help Detect Pretraining Data from Large Language Models
H. Zhang
Songxin Zhang
Bingyi Jing
Hongxin Wei
31
0
0
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
78
0
0
09 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
43
0
0
07 Oct 2024
Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information
Yongheng Zhang
Qiguang Chen
Jingxuan Zhou
Peng Wang
Jiasheng Si
Jin Wang
Wenpeng Lu
Libo Qin
LRM
36
3
0
06 Oct 2024
Mitigating Memorization In Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Nathaniel Hudson
Caleb Geniesse
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
KELM
MU
40
0
0
03 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
67
7
0
03 Oct 2024
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Jiyeon Kim
Hyunji Lee
Hyowon Cho
Joel Jang
Hyeonbin Hwang
Seungpil Won
Youbin Ahn
Dohaeng Lee
Minjoon Seo
KELM
46
2
0
02 Oct 2024
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Tung-Yu Wu
Pei-Yu Lo
ReLM
LRM
40
2
0
02 Oct 2024
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee
Thomas Jiralerspong
Lei Yu
Yoshua Bengio
Emily Cheng
CoGe
75
0
0
02 Oct 2024
Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
Jie Zhang
Debeshee Das
Gautam Kamath
Florian Tramèr
MIALM
MIACV
207
16
1
29 Sep 2024
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
Weichao Zhang
Ruqing Zhang
Jiafeng Guo
Maarten de Rijke
Yixing Fan
Xueqi Cheng
20
7
0
23 Sep 2024
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
Junjie Wen
Y. X. Zhu
Jinming Li
Minjie Zhu
Kun Wu
...
Ran Cheng
Chaomin Shen
Yaxin Peng
Feifei Feng
Jian Tang
LM&Ro
56
41
0
19 Sep 2024
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
Tianle Gu
Kexin Huang
Ruilin Luo
Yuanqi Yao
Yujiu Yang
Yan Teng
Yingchun Wang
MU
15
4
0
18 Sep 2024
KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
Neel Rajani
Lilli Kiessling
Aleksandr Ogaltsov
Claus Lang
ALM
9
0
0
13 Sep 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao
Yixuan Li
81
5
0
13 Sep 2024
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
Atilla Akkus
Mingjie Li
Junjie Chu
Junjie Chu
Michael Backes
Sinem Sav
Sinem Sav
SILM
SyDa
28
1
0
12 Sep 2024
Improving Pretraining Data Using Perplexity Correlations
Tristan Thrush
Christopher Potts
Tatsunori Hashimoto
32
17
0
09 Sep 2024
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Cheng Wang
Yiwei Wang
Bryan Hooi
Yujun Cai
Nanyun Peng
Kai-Wei Chang
35
2
0
05 Sep 2024
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi
Yadollah Yaghoobzadeh
30
0
0
21 Aug 2024
CogLM: Tracking Cognitive Development of Large Language Models
Xinglin Wang
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
ELM
47
0
0
17 Aug 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
38
4
0
05 Aug 2024
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data
Mingshu Li
18
3
0
01 Aug 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
M. Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
27
36
0
19 Jul 2024
Transformer Layers as Painters
Qi Sun
Marc Pickett
Aakash Kumar Nain
Llion Jones
AI4CE
29
13
0
12 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
47
6
1
10 Jul 2024
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi
Ori Yoran
Jonathan Berant
Mor Geva
HILM
38
8
0
08 Jul 2024
How Effective are State Space Models for Machine Translation?
Hugo Pitorro
Pavlo Vasylenko
Marcos Vinícius Treviso
André F. T. Martins
Mamba
30
2
0
07 Jul 2024
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
29
8
0
05 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen
Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
28
17
0
01 Jul 2024
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth
Alvin Deng
Kyle O'Brien
Jyothir S V
Mohammad Aflah Khan
...
Jacob Ray Fuehne
Stella Biderman
Tracy Ke
Katherine Lee
Naomi Saphra
49
12
0
25 Jun 2024
Memorizing Documents with Guidance in Large Language Models
Bumjin Park
Jaesik Choi
KELM
RALM
16
1
0
23 Jun 2024
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Debeshee Das
Jie Zhang
Florian Tramèr
MIALM
59
28
1
23 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
58
12
0
20 Jun 2024
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Hoyeon Chang
Jinho Park
Seonghyeon Ye
Sohee Yang
Youngkyung Seo
Du-Seong Chang
Minjoon Seo
KELM
23
30
0
17 Jun 2024
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li
Xinghao Chen
Han Shu
Yehui Tang
Yunhe Wang
MQ
26
2
0
17 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
43
12
0
13 Jun 2024
REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy
Haw-Shiuan Chang
Nanyun Peng
Mohit Bansal
Anil Ramakrishna
Tagyoung Chung
HILM
25
2
0
11 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania
Siddharth Singh
Shwai He
S. Feizi
A. Bhatele
19
13
0
04 Jun 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
27
12
0
29 May 2024
Previous
1
2
3
4
Next