Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

1 April 2024
Matthias Gerstgrasser
Rylan Schaeffer
Apratim Dey
Rafael Rafailov
Henry Sleight
John Hughes
Tomasz Korbak
Rajashree Agrawal
Dhruv Pai
Andrey Gromov
Daniel A. Roberts
Diyi Yang
D. Donoho
Oluwasanmi Koyejo

Papers citing "Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data"

40 / 40 papers shown
Self-Consuming Generative Models with Adversarially Curated Data
Xiukun Wei
Xueru Zhang
WIGM
39
0
0
14 May 2025
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Kyle Lampinen
Arslan Chaudhry
Stephanie Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Razvan Pascanu
Murray Shanahan
James L. McClelland
46
0
0
01 May 2025
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Caia Costello
Simon Guo
Anna Goldie
Azalia Mirhoseini
ReLM
SyDa
LRM
111
1
0
25 Apr 2025
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns
Letitia Parcalabescu
Stephan Wäldchen
Michael Barlow
Gregor Ziegltrum
Volker Stampa
Bastian Harren
Björn Deiseroth
SyDa
41
0
0
24 Apr 2025
$\texttt{Complex-Edit}$: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark
S. Yang
Mude Hui
Bingchen Zhao
Yuyin Zhou
Nataniel Ruiz
Cihang Xie
CoGe
73
0
0
17 Apr 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
70
0
0
29 Mar 2025
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Mihai Nadas
Laura Diosan
Andreea Tomescu
SyDa
72
0
0
18 Mar 2025
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
44
0
0
08 Mar 2025
Position: Model Collapse Does Not Mean What You Think
Rylan Schaeffer
Joshua Kazdan
Alvan Caleb Arulandu
Sanmi Koyejo
71
0
0
05 Mar 2025
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
Shi Fu
Yingjie Wang
Yuzhu Chen
Xinmei Tian
Dacheng Tao
53
1
0
26 Feb 2025
Machine-generated text detection prevents language model collapse
George Drayson
Emine Yilmaz
Vasileios Lampos
DeLMO
62
0
0
21 Feb 2025
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Kareem Amin
Sara Babakniya
Alex Bie
Weiwei Kong
Umar Syed
Sergei Vassilvitskii
70
1
0
13 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
117
4
0
06 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
83
4
0
03 Feb 2025
Rate of Model Collapse in Recursive Training
A. Suresh
A. Thangaraj
Aditya Nanda Kishore Khandavally
SyDa
27
5
0
23 Dec 2024
A Review of Fairness and A Practical Guide to Selecting Context-Appropriate Fairness Metrics in Machine Learning
Caleb J. S. Barr
Olivia Erdelyi
Paul D. Docherty
Randolph C. Grace
FaML
70
0
0
10 Nov 2024
One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity
Sonia K. Murthy
Tomer Ullman
Jennifer Hu
ALM
43
11
0
07 Nov 2024
Universality of the $π^2/6$ Pathway in Avoiding Model Collapse
Apratim Dey
D. Donoho
58
5
0
30 Oct 2024
Intention Is All You Need
Advait Sarkar
34
2
0
24 Oct 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Elyas Obbad
Iddah Mlauzi
Brando Miranda
Rylan Schaeffer
Kamal Obbad
Suhana Bedi
Sanmi Koyejo
CVBM
53
0
0
23 Oct 2024
Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan
Rylan Schaeffer
Apratim Dey
Matthias Gerstgrasser
Rafael Rafailov
D. Donoho
Sanmi Koyejo
53
11
0
22 Oct 2024
Bias Amplification: Large Language Models as Increasingly Biased Media
Ze Wang
Zekun Wu
Jeremy Zhang
Navya Jain
Xin Guan
Skylar Lu
Saloni Gupta
Adriano Soares Koshiyama
39
0
0
19 Oct 2024
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Aymane El Firdoussi
M. Seddik
Soufiane Hayou
Réda Alami
Ahmed Alzubaidi
Hakim Hacid
28
1
0
11 Oct 2024
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu
Lingyong Yan
Zihan Wang
Dawei Yin
Pengjie Ren
Maarten de Rijke
Z. Z. Ren
63
6
0
10 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Yuan Li
Pengfei Liu
VLM
48
68
0
08 Oct 2024
Self-Improving Diffusion Models with Synthetic Data
Sina Alemohammad
Ahmed Imtiaz Humayun
S. Agarwal
John Collomosse
Richard G. Baraniuk
33
11
0
29 Aug 2024
Self-Directed Synthetic Dialogues and Revisions Technical Report
Nathan Lambert
Hailey Schoelkopf
Aaron Gokaslan
Luca Soldaini
Valentina Pyatkin
Louis Castricato
SyDa
45
3
0
25 Jul 2024
A survey on the impact of AI-based recommenders on human behaviours: methodologies, outcomes and future directions
Luca Pappalardo
Emanuele Ferragina
Salvatore Citraro
Giuliano Cornacchia
M. Nanni
...
D. Gambetta
Giovanni Mauro
Virginia Morini
Valentina Pansanella
D. Pedreschi
48
9
0
29 Jun 2024
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Xin Su
Man Luo
Kris W Pan
Tien Pei Chou
Vasudev Lal
Phillip Howard
53
3
0
28 Jun 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Amrith Rajagopal Setlur
Saurabh Garg
Xinyang Geng
Naman Garg
Virginia Smith
Aviral Kumar
42
45
0
20 Jun 2024
From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models
Eleni Nisioti
Claire Glanois
Elias Najarro
Andrew Dai
Elliot Meyerson
J. Pedersen
Laetitia Teodorescu
Conor F. Hayes
Shyam Sudhakaran
Sebastian Risi
AI4CE
LM&Ro
51
3
0
14 Jun 2024
Understanding Hallucinations in Diffusion Models through Mode Interpolation
Sumukh K. Aithal
Pratyush Maini
Zachary Chase Lipton
J. Zico Kolter
DiffM
40
19
0
13 Jun 2024
Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it?
Tanguy Lefort
Antoine Affouard
Benjamin Charlier
J. Lombardo
Mathias Chouet
Hervé Goëau
Joseph Salmon
P. Bonnet
Alexis Joly
37
0
0
05 Jun 2024
Linguistic Collapse: Neural Collapse in (Large) Language Models
Robert Wu
Vardan Papyan
48
12
0
28 May 2024
Meanings and Feelings of Large Language Models: Observability of Latent States in Generative AI
Tian Yu Liu
Stefano Soatto
Matteo Marchi
Pratik Chaudhari
Paulo Tabuada
AI4CE
38
2
0
22 May 2024
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
VLM
46
9
0
12 Mar 2024
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Wenhao Wang
Yi Yang
VGen
DiffM
33
32
0
10 Mar 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Wenlin Yao
Lu Cheng
Huan Liu
SyDa
56
50
0
21 Feb 2024
Model Collapse Demystified: The Case of Regression
Elvis Dohmatob
Yunzhen Feng
Julia Kempe
39
32
0
12 Feb 2024
Data Feedback Loops: Model-driven Amplification of Dataset Biases
Rohan Taori
Tatsunori B. Hashimoto
74
43
0
08 Sep 2022