arXiv:2305.07759
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
12 May 2023
Ronen Eldan
Yuanzhi Li
SyDa
LRM
Papers citing "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?"
37 / 37 papers shown
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
42
0
0
09 May 2025
Demystifying optimized prompts in language models
Rimon Melamed
Lucas H. McCabe
H. H. Huang
37
0
0
04 May 2025
Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models
Xuhui Jiang
Shengjie Ma
Chengjin Xu
Cehao Yang
Liyu Zhang
Jian Guo
SyDa
28
0
0
02 May 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
57
0
0
29 Apr 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
63
0
0
25 Apr 2025
Looking beyond the next token
Abitha Thankaraj
Yiding Jiang
J. Zico Kolter
Yonatan Bisk
LRM
51
1
0
15 Apr 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
46
1
0
24 Feb 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton Razzhigaev
Matvey Mikhalchuk
Temurbek Rahmatullaev
Elizaveta Goncharova
Polina Druzhinina
Ivan V. Oseledets
Andrey Kuznetsov
57
1
0
20 Feb 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao
Tina Behnia
V. Vakilian
Christos Thrampoulidis
55
8
0
20 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching
Jiyuan Ren
Zhaocheng Du
Zhihao Wen
Qinglin Jia
Sunhao Dai
Chuhan Wu
Zhenhua Dong
SyDa
75
0
0
09 Feb 2025
Training Bilingual LMs with Data Constraints in the Targeted Language
Skyler Seto
Maartje ter Hoeve
He Bai
Natalie Schluter
David Grangier
74
0
0
20 Nov 2024
Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models
Minki Kang
Sung Ju Hwang
Gibbeum Lee
Jaewoong Cho
KELM
32
0
0
01 Nov 2024
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin
Andrei Chertkov
Ivan V. Oseledets
36
1
0
23 Oct 2024
Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan
Rylan Schaeffer
Apratim Dey
Matthias Gerstgrasser
Rafael Rafailov
D. Donoho
Sanmi Koyejo
45
11
0
22 Oct 2024
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter
Shrimai Prabhumoye
John Kamalu
S. Satheesh
Eric Nyberg
M. Patwary
M. Shoeybi
Bryan Catanzaro
LRM
SyDa
ReLM
98
1
0
15 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
44
4
0
09 Oct 2024
Collapsed Language Models Promote Fairness
Jingxuan Xu
Wuyang Chen
Linyi Li
Yao Zhao
Yunchao Wei
39
0
0
06 Oct 2024
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
51
36
0
24 Sep 2024
Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs
Guillermo Marco
Luz Rello
Julio Gonzalo
LM&MA
ALM
39
6
0
17 Sep 2024
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
37
0
0
02 Sep 2024
An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication
Yunmeng Li
Jun Suzuki
Makoto Morishita
Kaori Abe
Kentaro Inui
53
1
0
28 Aug 2024
Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs
Tamzeed Mahfuz
Satak Kumar Dey
Ruwad Naswan
Hasnaen Adil
Khondker Salman Sayeed
Haz Sameen Shahgir
29
0
0
29 Jun 2024
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation
Ali Malik
Stephen Mayhew
Chris Piech
K. Bicknell
24
3
0
05 Jun 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
42
6
0
24 May 2024
Memory Mosaics
Jianyu Zhang
Niklas Nolte
Ranajoy Sadhukhan
Beidi Chen
Léon Bottou
VLM
54
3
0
10 May 2024
LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing
Zeyang Ma
A. Chen
Dong Jae Kim
Tse-Hsun Chen
Shaowei Wang
27
44
0
27 Apr 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
41
79
0
26 Mar 2024
Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks
Ben Fauber
11
2
0
08 Feb 2024
TinyGSM: achieving >80% on GSM8k with small language models
Bingbin Liu
Sébastien Bubeck
Ronen Eldan
Janardhan Kulkarni
Yuanzhi Li
Anh Nguyen
Rachel A. Ward
Yi Zhang
ALM
19
47
0
14 Dec 2023
DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers
S. Chandy
Varun Gangal
Yi Yang
Gabriel Maggiotti
25
0
0
11 Dec 2023
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Giovanni Monea
Maxime Peyrard
Martin Josifoski
Vishrav Chaudhary
Jason Eisner
Emre Kiciman
Hamid Palangi
Barun Patra
Robert West
KELM
47
12
0
04 Dec 2023
Steering Language Generation: Harnessing Contrastive Expert Guidance and Negative Prompting for Coherent and Diverse Synthetic Data Generation
Charles O'Neill
Y. Ting 丁
I. Ciucă
Jack Miller
Thang Bui
SyDa
29
1
0
15 Aug 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
24
6
0
17 Jul 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
230
2,989
0
22 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuanzhi Li
Andrej Risteski
107
61
0
07 Mar 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020