2202.04538
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Neural Information Processing Systems (NeurIPS), 2022
9 February 2022
Yu Meng
Jiaxin Huang
Yu Zhang
Jiawei Han
SyDa
Papers citing
"Generating Training Data with Language Models: Towards Zero-Shot Language Understanding"
50 / 175 papers shown
An Interpretability-Guided Framework for Responsible Synthetic Data Generation in Emotional Text
Paula Joy B. Martinez
Jose Marie Antonio Miñoza
Sebastian C. Ibañez
142
0
0
20 Nov 2025
State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
Taja Kuzman Pungeršek
Peter Rupnik
Ivan Porupski
Vuk Dinić
Nikola Ljubesic
104
0
0
11 Nov 2025
Who Is the Story About? Protagonist Entity Recognition in News
Jorge Gabín
M. E. Ares
Javier Parapar
251
0
0
10 Nov 2025
Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
Guiyao Tie
Zenghui Yuan
Zeli Zhao
Chaoran Hu
Tianhe Gu
...
Ming Jin
Qingsong Wen
Lixing Chen
P. Zhou
Lichao Sun
KELM
ReLM
LRM
257
1
0
17 Oct 2025
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang
Qingqing Ye
Xuan Liu
Yanyun Wang
Jianliang Xu
Haibo Hu
217
1
0
27 Sep 2025
Evaluating LLMs Without Oracle Feedback: Agentic Annotation Evaluation Through Unsupervised Consistency Signals
Cheng Chen
Haiyan Yin
Ivor Tsang
148
1
0
10 Sep 2025
M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models
Zexuan Li
Hongliang Dai
Piji Li
131
0
0
09 Sep 2025
Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework
Ilias Driouich
Hongliu Cao
Eoin Thomas
88
1
0
26 Aug 2025
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Ping Yu
Jack Lanchantin
Tianlu Wang
Weizhe Yuan
O. Yu. Golovneva
I. Kulikov
Sainbayar Sukhbaatar
Jason Weston
Jing Xu
SyDa
ReLM
LRM
284
12
0
31 Jul 2025
EvolveSearch: An Iterative Self-Evolving Search Agent
Dingchu Zhang
Yida Zhao
Jialong Wu
Baixuan Li
Wenbiao Yin
...
Yong Jiang
Yufeng Li
Kewei Tu
Pengjun Xie
Fei Huang
LLMAG
KELM
244
24
0
28 May 2025
EAVIT: Efficient and Accurate Human Value Identification from Text data via LLMs
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Wenhao Zhu
Yuhang Xie
Guojie Song
Xin Zhang
235
1
0
19 May 2025
AndroidGen: Building an Android Language Agent under Data Scarcity
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Zifei Shan
Shanghang Zhang
Yuxiao Dong
Jie Tang
LLMAG
320
5
0
27 Apr 2025
Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks
Yang Liu
Bingjie Yan
Tianyuan Zou
Jianqing Zhang
Zixuan Gu
...
Jiajian Li
Xiaozhou Ye
Ye Ouyang
Qiang Yang
Yanzhe Zhang
ALM
1.0K
3
0
24 Apr 2025
Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition
Frances Yung
Varsha Suresh
Zaynab Reza
Mansoor Ahmad
Vera Demberg
363
1
0
26 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Computer Vision and Pattern Recognition (CVPR), 2025
Haoxin Li
Boyang Li
CoGe
684
4
0
03 Mar 2025
Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation
Xin Liu
Zheng Zhang
Jingxin Nie
243
2
0
26 Feb 2025
Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yen-Ju Lu
Ting-Yao Hu
H. Koppula
Hadi Pouransari
Jen-Hao Rick Chang
...
Xiang Kong
Qi Zhu
Simon Wang
Oncel Tuzel
Raviteja Vemulapalli
250
4
0
24 Feb 2025
Synthetic Text Generation for Training Large Language Models via Gradient Matching
Dang Nguyen
Zeman Li
M. Bateni
Vahab Mirrokni
Meisam Razaviyayn
Baharan Mirzasoleiman
406
5
0
24 Feb 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
446
3
0
24 Feb 2025
SNaRe: Domain-aware Data Generation for Low-Resource Event Detection
Tanmay Parekh
Yuxuan Dong
Lucas Bandarkar
Artin Kim
I-Hung Hsu
Kai-Wei Chang
Nanyun Peng
379
0
0
24 Feb 2025
Synthetic vs. Gold: The Role of LLM Generated Labels and Data in Cyberbullying Detection
Arefeh Kazemi
Sri Balaaji Natarajan Kalaivendan
Joachim Wagner
Hamza Qadeer
Kanishk Verma
Brian Davis
641
4
0
21 Feb 2025
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
Shenglai Zeng
Jiankun Zhang
Pengfei He
J. Ren
Tianqi Zheng
Hanqing Lu
Han Xu
Hui Liu
Yue Xing
Shucheng Zhou
419
24
0
21 Feb 2025
A Survey of Text Classification Under Class Distribution Shift
Adriana Valentina Costache
Silviu Florin Gheorghe
Eduard Poesina
Paul Irofti
Radu Tudor Ionescu
OOD
VLM
313
1
0
18 Feb 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
387
33
0
28 Jan 2025
Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration
Thomas Walshe
S. Moon
Chunyang Xiao
Yawwani Gunawardana
Fran Silavong
256
5
0
21 Jan 2025
Bridging the Fairness Gap: Enhancing Pre-trained Models with LLM-Generated Sentences
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Liu Yu
Ludie Guo
Ping Kuang
Fan Zhou
252
4
0
12 Jan 2025
JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2024
Takuro Fujii
Satoru Katsumata
202
0
0
09 Dec 2024
Curriculum-style Data Augmentation for LLM-based Metaphor Detection
Kaidi Jia
Yanxia Wu
Rongsheng Li
226
2
0
04 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
1.3K
1
0
01 Dec 2024
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
IEEE Access (IEEE Access), 2024
Taja Kuzman
Nikola Ljubesic
294
4
0
29 Nov 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
338
0
0
13 Nov 2024
Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs
International Journal of Data Science and Analysis (JDSA), 2024
Shan Zhong
Jiahao Zeng
Yongxin Yu
Bohong Lin
349
3
0
09 Nov 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Neural Information Processing Systems (NeurIPS), 2024
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
251
7
0
28 Oct 2024
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
International Conference on Learning Representations (ICLR), 2024
Hsun-Yu Kuo
Yin-Hsiang Liao
Yu-Chieh Chao
Wei-Yun Ma
Pu-Jen Cheng
SyDa
329
6
0
28 Oct 2024
LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples
Huiyu Wu
Diego Klabjan
FedML
295
2
0
24 Oct 2024
Self-calibration for Language Model Quantization and Pruning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
1.0K
2
0
22 Oct 2024
A Little Human Data Goes A Long Way
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Dhananjay Ashok
Jonathan May
SyDa
526
6
0
17 Oct 2024
Personalized Visual Instruction Tuning
International Conference on Learning Representations (ICLR), 2024
Renjie Pi
Jianshu Zhang
Tianyang Han
Jipeng Zhang
Boyao Wang
Tong Zhang
MLLM
212
13
0
09 Oct 2024
Generating Synthetic Datasets for Few-shot Prompt Tuning
Xu Guo
Zilin Du
Boyang Li
Chunyan Miao
206
2
0
08 Oct 2024
Generate then Refine: Data Augmentation for Zero-shot Intent Detection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
I-Fan Lin
Faegheh Hasibi
Suzan Verberne
VLM
248
7
0
02 Oct 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zheng Hui
Zhaoxiao Guo
Hang Zhao
Juanyong Duan
Congrui Huang
372
15
0
23 Sep 2024
ControlMath: Controllable Data Generation Promotes Math Generalist Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Polydoros Giannouris
Ning Wu
Jianhui Chang
Jia Li
268
6
0
20 Sep 2024
Enhancing SLM via ChatGPT and Dataset Augmentation
Tom Pieper
Mohamad Ballout
U. Krumnack
Gunther Heidemann
Kai-Uwe Kühnberger
247
0
0
19 Sep 2024
Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels
International Conference on Computational Linguistics (COLING), 2024
Chaoqun Liu
Qin Chao
Wenxuan Zhang
Xiaobao Wu
Boyang Albert Li
Anh Tuan Luu
Lidong Bing
190
3
0
19 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
776
55
0
10 Sep 2024
KModels: Unlocking AI for Business Applications
Roy Abitbol
Eyal Cohen
Muhammad Kanaan
Bhavna Agrawal
Yingjie Li
Anuradha Bhamidipaty
Erez Bilgory
126
0
0
08 Sep 2024
On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey
Jingcai Guo
Zhijie Rao
Zhi Chen
Song Guo
Jingren Zhou
Dacheng Tao
268
7
0
09 Aug 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Maria Sandsten
B. Schuller
400
7
0
22 Jul 2024
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
SyDa
300
7
0
22 Jul 2024
A Survey on Natural Language Counterfactual Generation
Yongjie Wang
Xiaoqi Qiu
Yu Yue
Xu Guo
Zhiwei Zeng
Yuhong Feng
Zhiqi Shen
253
21
0
04 Jul 2024