ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.15895
  4. Cited By
Large Language Model as Attributed Training Data Generator: A Tale of
  Diversity and Bias

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

28 June 2023
Yue Yu
Yuchen Zhuang
Jieyu Zhang
Yu Meng
Alexander Ratner
Ranjay Krishna
Jiaming Shen
Chao Zhang
    ALM
ArXivPDFHTML

Papers citing "Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias"

34 / 34 papers shown
Title
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
Viacheslav Vasilev
V. Arkhipkin
Julia Agafonova
Tatiana Nikulina
Evelina Mironova
Alisa Shichanina
Nikolai Gerasimenko
Mikhail Shoytov
Denis Dimitrov
43
0
0
07 May 2025
Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models
Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models
Abdelkarim El-Hajjami
Camille Salinesi
SyDa
32
0
0
06 May 2025
An Illusion of Progress? Assessing the Current State of Web Agents
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
92
4
1
02 Apr 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
59
3
0
24 Feb 2025
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
Shenglai Zeng
Jiankun Zhang
Pengfei He
J. Ren
Tianqi Zheng
Hanqing Lu
Han Xu
Hui Liu
Yue Xing
Jiliang Tang
132
9
0
21 Feb 2025
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
Dylan Zhang
Justin Wang
Tianran Sun
36
0
0
17 Feb 2025
Measuring Diversity in Synthetic Datasets
Measuring Diversity in Synthetic Datasets
Yuchang Zhu
Huizhe Zhang
Bingzhe Wu
Jintang Li
Zibin Zheng
Peilin Zhao
Liang Chen
Yatao Bian
95
0
0
12 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching
Few-shot LLM Synthetic Data with Distribution Matching
Jiyuan Ren
Zhaocheng Du
Zhihao Wen
Qinglin Jia
Sunhao Dai
Chuhan Wu
Zhenhua Dong
SyDa
75
0
0
09 Feb 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
64
12
0
28 Jan 2025
Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
Ekaterina Artemova
Akim Tsvigun
Dominik Schlechtweg
Natalia Fedorova
Konstantin Chernyshev
Sergei Tilga
Boris Obmoroshev
SyDa
VLM
95
0
0
28 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Qingyang Wu
Ying Xu
Tingsong Xiao
Yunze Xiao
Yitong Li
...
Yichi Zhang
Shanghai Zhong
Yuwei Zhang
Wei Lu
Yifan Yang
73
1
0
17 Jan 2025
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
Huawen Feng
Pu Zhao
Qingfeng Sun
Can Xu
Fangkai Yang
...
Qianli Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
AAML
ALM
62
0
0
23 Dec 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Montessori-Instruct: Generate Influential Training Data Tailored for
  Student Learning
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li
Zichun Yu
Chenyan Xiong
SyDa
27
1
0
18 Oct 2024
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel
Jiacheng Zhu
Justin Qiu
Zachary Horvitz
Marianna Apidianaki
Kathleen McKeown
Chris Callison-Burch
63
3
0
16 Oct 2024
Mitigating Propensity Bias of Large Language Models for Recommender Systems
Mitigating Propensity Bias of Large Language Models for Recommender Systems
Guixian Zhang
Guan Yuan
Debo Cheng
Lin Liu
Jiuyong Li
Shichao Zhang
39
2
0
30 Sep 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Zheng Hui
Zhaoxiao Guo
Hang Zhao
Juanyong Duan
Congrui Huang
25
6
0
23 Sep 2024
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic
  CheckLists
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
Raoyuan Zhao
Abdullatif Köksal
Yihong Liu
Leonie Weissweiler
Anna Korhonen
Hinrich Schütze
SyDa
33
1
0
30 Aug 2024
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
SyDa
55
4
0
22 Jul 2024
Leveraging Large Language Models for Integrated
  Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
32
9
0
05 Jul 2024
When Search Engine Services meet Large Language Models: Visions and
  Challenges
When Search Engine Services meet Large Language Models: Visions and Challenges
Haoyi Xiong
Jiang Bian
Yuchen Li
Xuhong Li
Mengnan Du
Shuaiqiang Wang
Dawei Yin
Sumi Helal
43
28
0
28 Jun 2024
1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge
  Aggregators?
1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?
Yue Huang
Chenrui Fan
Yuan Li
Siyuan Wu
Tianyi Zhou
Xiangliang Zhang
Lichao Sun
53
3
0
20 Jun 2024
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction
  Tuning
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning
Jiaqi Li
Yixuan Tang
Yi Yang
34
5
0
14 Jun 2024
TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Yunyi Zhang
Ruozhen Yang
Xueqiang Xu
Rui Li
Jinfeng Xiao
Jiaming Shen
Jiawei Han
35
10
0
29 Feb 2024
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
Wenlong Deng
Blair Chen
Beidi Zhao
Chiyu Zhang
Xiaoxiao Li
Christos Thrampoulidis
31
0
0
22 Feb 2024
LLM360: Towards Fully Transparent Open-Source LLMs
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
W. Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric P. Xing
38
68
0
11 Dec 2023
When does In-context Learning Fall Short and Why? A Study on
  Specification-Heavy Tasks
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Y. Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
24
27
0
15 Nov 2023
Bias Testing and Mitigation in LLM-based Code Generation
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
33
20
0
03 Sep 2023
Instruction Tuning with GPT-4
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
579
0
06 Apr 2023
Mixture of Soft Prompts for Controllable Data Generation
Mixture of Soft Prompts for Controllable Data Generation
Derek Chen
Celine Lee
Yunan Lu
Domenic Rosati
Zhou Yu
109
22
0
02 Mar 2023
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
Jiacheng Ye
Jiahui Gao
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
VLM
71
69
0
22 Oct 2022
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
Jiahui Gao
Renjie Pi
Yong Lin
Hang Xu
Jiacheng Ye
Zhiyong Wu
Weizhong Zhang
Xiaodan Liang
Zhenguo Li
Lingpeng Kong
SyDa
VLM
57
45
0
25 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
RAFT: A Real-World Few-Shot Text Classification Benchmark
RAFT: A Real-World Few-Shot Text Classification Benchmark
Neel Alex
Eli Lifland
Lewis Tunstall
A. Thakur
Pegah Maham
...
Carolyn Ashurst
Paul Sedille
A. Carlier
M. Noetel
Andreas Stuhlmuller
RALM
173
56
0
28 Sep 2021
1