arXiv:2409.04556
How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty, Sjoerd van Steenkiste, Tal Linzen. 6 September 2024.
Papers citing "How Does Code Pretraining Affect Language Model Task Performance?" (48 papers)

Empirical Study of Code Large Language Models for Binary Security Patch Detection
Qingyuan Li, Binchang Li, Cuiyun Gao, Shuzheng Gao, Zongjie Li. 07 Sep 2025.

Don't throw the baby out with the bathwater: How and why deep learning for ARC
Jack Cole, Mohamed Osman. 17 Jun 2025.

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Pierre-Carl Langlais, Carlos Rosas Hinostroza, Mattia Nee, Catherine Arnett, Pavel Chizhov, Eliot Jones, Irène Girard, David Mach, Anastasia Stasenko, Ivan P. Yamshchikov. 02 Jun 2025.

Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data
Shaoxiong Ji, Zihao Li, Jaakko Paavola, Indraneil Paul, Hengyu Luo. 31 May 2025.

Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning
Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton Van Den Hengel, Damien Teney. 28 May 2025.

Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
Yoichi Ishibashi, Taro Yano, Masafumi Oyamada. 15 May 2025.

Trillion 7B Technical Report
Sungjun Han, Juyoung Suk, Suyeong An, Hyungguk Kim, Kyuseok Kim, Wonsuk Yang, Seungtaek Choi, Jamin Shin. 21 Apr 2025.

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao, Alexandru Meterez, Sham Kakade, Cengiz Pehlevan, Samy Jelassi, Eran Malach. 10 Apr 2025.

Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
Zihao Li, Shaoxiong Ji, Hengyu Luo, Jörg Tiedemann. 05 Apr 2025.

Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, ..., Siyang Song, Carolin (Haas) Lawrence, Aditi Raghunathan, Kiril Gashteovski, Graham Neubig. 05 Mar 2025.

General Intelligence Requires Reward-based Pretraining
Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal. 26 Feb 2025.

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen. 26 Feb 2025.

IPO: Your Language Model is Secretly a Preference Classifier
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shivank Garg, Ayush Singh, Shweta Singh, Paras Chopra. 22 Feb 2025.

Privacy-Preserving Dataset Combination
Keren Fuentes, Mimee Xu, Irene Chen. 09 Feb 2025.

Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation
Safeyah Khaled Alshemali, Daniel Bauer, Yuval Marton. 19 Oct 2024.

Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations
Matthias Lindemann, Alexander Koller, Ivan Titov. 05 Jul 2024.

CodeGemma: Open Code Models Based on Gemma
CodeGemma Team: Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, ..., Ale Jakse Hartman, Bin Ni, Kathy Korevec, Kelly Schaefer, Scott Huffman. 17 Jun 2024.

Code Pretraining Improves Entity Tracking Abilities of Language Models
Najoung Kim, Sebastian Schuster, Shubham Toshniwal. 31 May 2024.

CogBench: a large language model walks into a psychology lab
Julian Coda-Forno, Marcel Binz, Jane X. Wang, Eric Schulz. 28 Feb 2024.

OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ..., Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi. 01 Feb 2024.

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Michael Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, ..., Hanna Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo. 31 Jan 2024.

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, ..., Xu Cao, Xingyao Wang, Yiquan Wang, Chenhui Xu, Chengxiang Zhai. 01 Jan 2024.

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen. 13 Nov 2023.

The Impact of Depth on Compositional Generalization in Transformer Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Daniel H Garrette, Tal Linzen. 30 Oct 2023.

At Which Training Stage Does Code Data Help LLMs Reasoning?
International Conference on Learning Representations (ICLR), 2023
Xiaogang Jia, Yue Liu, Yue Yu, Yuanliang Zhang, Yu Jiang, Changjian Wang, Shanshan Li. 28 Sep 2023.

Code Llama: Open Foundation Models for Code
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, ..., Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. 24 Aug 2023.

A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, ..., Denny Zhou, Jason W. Wei, Kevin Robinson, David M. Mimno, Daphne Ippolito. 22 May 2023.

Exploring the Curious Case of Code Prompts
Li Zhang, Liam Dugan, Hainiu Xu, Chris Callison-Burch. 26 Apr 2023.

Injecting structural hints: Using language models to study inductive biases in language learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Isabel Papadimitriou, Dan Jurafsky. 25 Apr 2023.

Beyond the C: Retargetable Decompilation using Neural Machine Translation
Iman Hosseini, Brendan Dolan-Gavitt. 17 Dec 2022.

Complementary Explanations for Effective In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Xi Ye, Srini Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, Ramakanth Pasunuru. 25 Nov 2022.

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
International Conference on Learning Representations (ICLR), 2022
Denny Zhou, Nathanael Scharli, Le Hou, Jason W. Wei, Nathan Scales, ..., Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed H. Chi. 21 May 2022.

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning
Neural Information Processing Systems (NeurIPS), 2022
Xi Ye, Greg Durrett. 06 May 2022.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
International Conference on Machine Learning (ICML), 2022
Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel. 12 Apr 2022.

Scaling Up Models and Data with t5x and seqio
Journal of machine learning research (JMLR), 2022
Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, ..., Brennan Saeta, Ryan Sepassi, A. Spiridonov, Joshua Newlan, Andrea Gesmundo. 31 Mar 2022.

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre. 29 Mar 2022.

Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
Findings of the ACL, 2022
Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster. 17 Mar 2022.

Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022.

Improving Compositional Generalization with Latent Structure and Data Augmentation
Linlu Qiu, Peter Shaw, Panupong Pasupat, Pawel Krzysztof Nowak, Tal Linzen, Fei Sha, Kristina Toutanova. 14 Dec 2021.

Examining Zero-Shot Vulnerability Repair with Large Language Models
IEEE Symposium on Security and Privacy (IEEE S&P), 2021
Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, Brendan Dolan-Gavitt. 03 Dec 2021.

Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. 03 Sep 2021.

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba. 07 Jul 2021.

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi-Lun Liao, ..., Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian. 26 Apr 2021.

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy. 31 Dec 2020.

COGS: A Compositional Generalization Challenge Based on Semantic Interpretation
Najoung Kim, Tal Linzen. 12 Oct 2020.

Unsupervised Translation of Programming Languages
Marie-Anne Lachaux, Baptiste Roziere, L. Chanussot, Guillaume Lample. 05 Jun 2020.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Journal of machine learning research (JMLR), 2019
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 23 Oct 2019.

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017.