arXiv:2202.04538
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
9 February 2022
Yu Meng
Jiaxin Huang
Yu Zhang
Jiawei Han
SyDa
Papers citing
"Generating Training Data with Language Models: Towards Zero-Shot Language Understanding"
41 / 41 papers shown
Title
AndroidGen: Building an Android Language Agent under Data Scarcity
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Y. Xu
S. Zhang
Yuxiao Dong
Jie Tang
LLMAG
72
0
0
27 Apr 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
69
0
0
03 Mar 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
52
1
0
24 Feb 2025
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
Shenglai Zeng
Jiankun Zhang
Pengfei He
J. Ren
Tianqi Zheng
Hanqing Lu
Han Xu
Hui Liu
Yue Xing
Jiliang Tang
135
9
0
21 Feb 2025
Synthetic vs. Gold: The Role of LLM-Generated Labels and Data in Cyberbullying Detection
Arefeh Kazemi
Sri Balaaji Natarajan Kalaivendan
Joachim Wagner
Hamza Qadeer
Brian Davis
58
1
0
21 Feb 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
64
13
0
28 Jan 2025
Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration
Thomas Walshe
S. Moon
Chunyang Xiao
Yawwani Gunawardana
Fran Silavong
37
0
0
21 Jan 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
120
0
0
01 Dec 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
Hsun-Yu Kuo
Yin-Hsiang Liao
Yu-Chieh Chao
Wei-Yun Ma
Pu-Jen Cheng
SyDa
45
2
0
28 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
90
0
0
22 Oct 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Zheng Hui
Zhaoxiao Guo
Hang Zhao
Juanyong Duan
Congrui Huang
25
6
0
23 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
58
23
0
10 Sep 2024
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
SyDa
55
4
0
22 Jul 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee
Seungwon Lim
Seungju Han
Giyeong Oh
Hyungjoo Chae
...
Beong-woo Kwak
Yeonsoo Lee
Dongha Lee
Jinyoung Yeo
Youngjae Yu
33
8
0
20 Jun 2024
Edisum: Summarizing and Explaining Wikipedia Edits at Scale
Marija Sakota
Isaac Johnson
Guosheng Feng
Robert West
SyDa
KELM
25
2
0
04 Apr 2024
TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Yunyi Zhang
Ruozhen Yang
Xueqiang Xu
Rui Li
Jinfeng Xiao
Jiaming Shen
Jiawei Han
40
10
0
29 Feb 2024
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
Wenlong Deng
Blair Chen
Beidi Zhao
Chiyu Zhang
Xiaoxiao Li
Christos Thrampoulidis
31
0
0
22 Feb 2024
Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel
Chi-Heng Yang
J. Hu
Hsinchun Chen
21
7
0
27 Dec 2023
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du
Haoxin Li
Xu Guo
Boyang Li
25
1
0
05 Dec 2023
Large Language Models in Education: Vision and Opportunities
Wensheng Gan
Zhenlian Qi
Jiayang Wu
Chun-Wei Lin
AI4Ed
36
69
0
22 Nov 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
19
32
0
20 Oct 2023
Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges
Vinay Samuel
Houda Aynaou
Arijit Ghosh Chowdhury
Karthik Venkat Ramanan
Aman Chadha
SyDa
16
7
0
21 Sep 2023
Zero-Shot Text Classification via Self-Supervised Tuning
Chaoqun Liu
Wenxuan Zhang
Guizhen Chen
Xiaobao Wu
A. Luu
Chip Hong Chang
Lidong Bing
VLM
32
11
0
19 May 2023
A Universal Discriminator for Zero-Shot Generalization
Haike Xu
Zongyu Lin
Jing Zhou
Yanan Zheng
Zhilin Yang
AI4CE
13
14
0
15 Nov 2022
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
Yu Meng
Martin Michalski
Jiaxin Huang
Yu Zhang
Tarek F. Abdelzaher
Jiawei Han
VLM
39
46
0
06 Nov 2022
The COVID That Wasn't: Counterfactual Journalism Using GPT
S. Hamilton
Andrew Piper
20
4
0
13 Oct 2022
Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP
Johann Frei
Frank Kramer
16
1
0
30 Aug 2022
Prototypical Calibration for Few-shot Learning of Language Models
Zhixiong Han
Y. Hao
Li Dong
Yutao Sun
Furu Wei
168
52
0
20 May 2022
Data Augmentation for Intent Classification with Off-the-shelf Large Language Models
Gaurav Sahu
Pau Rodríguez López
I. Laradji
Parmida Atighehchian
David Vazquez
Dzmitry Bahdanau
11
60
0
05 Apr 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
43
211
0
16 Feb 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
211
1,656
0
15 Oct 2021
Towards Zero-Label Language Learning
Zirui Wang
Adams Wei Yu
Orhan Firat
Yuan Cao
SyDa
180
102
0
19 Sep 2021
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim
Hyoungseok Kim
Sang-Woo Lee
Gichang Lee
Donghyun Kwak
...
Jaewook Kang
Inho Kang
Jung-Woo Ha
W. Park
Nako Sung
VLM
241
121
0
10 Sep 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
215
305
0
27 Apr 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
209
179
0
18 Apr 2021
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Yu Meng
Chenyan Xiong
Payal Bajaj
Saurabh Tiwary
Paul N. Bennett
Jiawei Han
Xia Song
119
202
0
16 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,916
0
31 Dec 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,586
0
21 Jan 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,583
0
18 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018