arXiv: 2406.20094
Scaling Synthetic Data Creation with 1,000,000,000 Personas

28 June 2024
Tao Ge
Xin Chan
Dian Yu
Haitao Mi
Dong Yu
    SyDa

Papers citing "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

50 / 80 papers shown
Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map
Alessandro Simoni
Francesco Pelosin
29
0
0
06 May 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
50
0
0
29 Apr 2025
HyPerAlign: Hypotheses-driven Personalized Alignment
Cristina Garbacea
Chenhao Tan
39
0
0
29 Apr 2025
Improving LLM Personas via Rationalization with Psychological Scaffolds
Brihi Joshi
Xiang Ren
Swabha Swayamdipta
Rik Koncel-Kedziorski
Tim Paek
68
0
0
25 Apr 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
Y. Wang
Chuhan Wu
Xinyi Dai
Yan Xu
...
Y. Wang
Xin Jiang
Lifeng Shang
R. Tang
W. Wang
17
0
0
22 Apr 2025
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang
Zhuoqun Hao
Y. Cho
B. Li
Yuan Yuan
Sihao Chen
Lyle Ungar
Camillo J. Taylor
Dan Roth
21
0
0
19 Apr 2025
MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation
Haris Riaz
Sourav Sanjukta Bhabesh
Vinayak Arannil
Miguel Ballesteros
Graham Horwood
SyDa
33
0
0
17 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
M. Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
44
0
1
15 Apr 2025
OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution
Lucio La Cava
Andrea Tagarelli
DeLMO
VLM
46
0
0
15 Apr 2025
SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders
Ashmi Banerjee
Adithi Satish
Fitri Nur Aisyah
Wolfgang Wörndl
Yashar Deldjoo
AI4TS
26
0
0
12 Apr 2025
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Akshara Prabhakar
Z. Liu
Weiran Yao
Jianguo Zhang
Ming Zhu
...
Juan Carlos Niebles
Shelby Heinecke
H. Wang
S.
Caiming Xiong
VGen
64
1
0
04 Apr 2025
We Need Improved Data Curation and Attribution in AI for Scientific Discovery
Mara Graziani
Antonio Foncubierta
Dimitrios Christofidellis
Irina Espejo Morales
Malina Molnar
Marvin Alberts
Matteo Manica
Jannis Born
29
0
0
03 Apr 2025
Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use
Nicholas Roth
Christopher Hidey
Lucas Spangher
William Arnold
Chang Ye
Nick Masiewicki
Jinoo Baek
Peter Grabowski
Eugene Ie
LLMAG
40
0
0
29 Mar 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
57
0
0
29 Mar 2025
Generating Synthetic Oracle Datasets to Analyze Noise Impact: A Study on Building Function Classification Using Tweets
Shanshan Bai
Anna Kruspe
X. X. Zhu
41
0
0
28 Mar 2025
Scaling Laws of Synthetic Data for Language Models
Zeyu Qin
Qingxiu Dong
Xingxing Zhang
Li Dong
Xiaolong Huang
...
Hany Awadalla
Yi R. Fung
Weizhu Chen
Minhao Cheng
Furu Wei
SyDa
63
1
0
25 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
50
0
0
24 Mar 2025
ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach
Reem Gody
Mahmoud Goudy
Ahmed Tawfik
SyDa
44
0
0
21 Mar 2025
LLM Generated Persona is a Promise with a Catch
Ang Li
Haozhe Chen
Hongseok Namkoong
Tianyi Peng
40
1
0
18 Mar 2025
Not All Personas Are Worth It: Culture-Reflective Persona Data Augmentation
Ji-Eun Han
Yoonseok Heo
37
0
0
17 Mar 2025
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
Qin Liu
Wenxuan Zhou
Nan Xu
James Y. Huang
Fei-Yue Wang
Sheng Zhang
Hoifung Poon
M. Chen
LLMAG
ReLM
AI4Cl
LRM
75
1
0
17 Mar 2025
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang
Changxing Ding
Wentao Tan
Junhong Wang
Jin Tao
Xiangmin Xu
45
1
0
13 Mar 2025
Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations
Ishani Mondal
Jack W. Stokes
S. Jauhar
Longqi Yang
Mengting Wan
Xiaofeng Xu
Xia Song
Jennifer Neville
36
0
0
11 Mar 2025
Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning
Xintong Li
Jalend Bantupalli
Ria Dharmani
Yuwei Zhang
Jingbo Shang
38
1
0
10 Mar 2025
VizTrust: A Visual Analytics Tool for Capturing User Trust Dynamics in Human-AI Communication
Xin Wang
Stephanie Tulk Jesso
Sadamori Kojaku
David M Neyens
Min Sun Kim
34
0
0
10 Mar 2025
Language Model Personalization via Reward Factorization
Idan Shenfeld
Felix Faltings
Pulkit Agrawal
Aldo Pacchiano
36
1
0
08 Mar 2025
TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records
Hejie Cui
Alyssa Unell
Bowen Chen
Jason Alan Fries
Emily Alsentzer
Sanmi Koyejo
N. Shah
65
0
0
06 Mar 2025
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
Rui Ye
Shuo Tang
Rui Ge
Yaxin Du
Zhenfei Yin
S. Chen
Jing Shao
LLMAG
74
1
0
05 Mar 2025
Adaptively evaluating models with task elicitation
Davis Brown
Prithvi Balehannina
Helen Jin
Shreya Havaldar
Hamed Hassani
Eric Wong
ALM
ELM
75
0
0
03 Mar 2025
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data
Juntao Tan
Liangwei Yang
Zuxin Liu
Z. Liu
Rithesh Murthy
...
Shirley Kokane
S.
H. Wang
Caiming Xiong
Shelby Heinecke
SyDa
28
1
0
28 Feb 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed
Mingmeng Geng
Michalis Vazirgiannis
Guokan Shang
47
1
0
27 Feb 2025
FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response
Mollie Shichman
C. Bonial
Austin Blodgett
Taylor Hudson
Francis Ferraro
Rachel Rudinger
SyDa
56
0
0
25 Feb 2025
PersonaMath: Boosting Mathematical Reasoning via Persona-Driven Data Augmentation
Jing Luo
Longze Chen
Run Luo
Liang Zhu
Chang Ao
...
A. Argha
Hamid Alinejad-Rokny
Chengming Li
Shiwen Ni
Min Yang
SyDa
AIMat
75
0
0
24 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
45
0
0
23 Feb 2025
Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases
Rena Gao
Xuetong Wu
Tatsuki Kuribayashi
Mingrui Ye
Siya Qi
Carsten Roever
Y. Liu
Zheng Yuan
Jey Han Lau
72
0
0
21 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Y. Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
54
1
0
21 Feb 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Rui Ye
Xiaowen Dong
Y. Wang
Yanfeng Wang
S. Chen
SyDa
LLMAG
98
3
0
21 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq R. Joty
Furu Wei
LRM
87
8
0
17 Feb 2025
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
Bryan L. M. de Oliveira
Luana G. B. Martins
Bruno Brandão
L. Melo
ELM
64
1
0
17 Feb 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
35
0
0
17 Feb 2025
From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations
Shenghan Wu
Yang Deng
Yimo Zhu
W. Hsu
M. Lee
46
2
0
17 Feb 2025
FinMTEB: Finance Massive Text Embedding Benchmark
Yixuan Tang
Yi Yang
AIFin
37
0
0
16 Feb 2025
LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
X. U. Wang
Y. Zhang
Jiayi Shi
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
59
2
0
02 Feb 2025
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas
Xiaoyang Wang
H. Zhang
Tao Ge
W. Yu
Dian Yu
Dong Yu
AI4CE
26
2
0
28 Jan 2025
Aligning Instruction Tuning with Pre-training
Yiming Liang
Tianyu Zheng
Xinrun Du
Ge Zhang
J. Liu
...
Zhaoxiang Zhang
Wenhao Huang
Jiajun Zhang
Xiang Yue
Jiajun Zhang
66
1
0
16 Jan 2025
Map2Text: New Content Generation from Low-Dimensional Visualizations
Xingjian Zhang
Ziyang Xiong
Shixuan Liu
Yutong Xie
Tolga Ergen
Dongsub Shim
Hua Xu
Honglak Lee
Qiaozhu Mei
33
0
0
24 Dec 2024
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
Saptarshi Sengupta
Kristal Curtis
Akshay Mallipeddi
Abhinav Mathur
Joseph Ross
Liang Gou
LLMAG
SyDa
86
1
0
28 Nov 2024
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Xu Ouyang
Tao Ge
Thomas Hartvigsen
Zhisong Zhang
Haitao Mi
Dong Yu
MQ
80
1
0
26 Nov 2024
Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions
Yutao Hou
Yajing Luo
Zhiwen Ruan
H. Wang
Weifeng Ge
Y. Chen
Guanhua Chen
ELM
34
0
0
15 Nov 2024
Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi
Jonathan Herzig
Idan Szpektor
Yonatan Belinkov
HILM
46
2
0
29 Oct 2024