arXiv:2202.07922
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
16 February 2022
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
Papers citing "ZeroGen: Efficient Zero-shot Learning via Dataset Generation"
50 / 166 papers shown
Title
Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models
Abdelkarim El-Hajjami
Camille Salinesi
SyDa
34
0
0
06 May 2025
Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks
Yang Janet Liu
Bingjie Yan
Tianyuan Zou
Jianqing Zhang
Zixuan Gu
...
J. Li
Xiaozhou Ye
Ye Ouyang
Qiang Yang
Y. Zhang
ALM
89
1
0
24 Apr 2025
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
Zhuohan Ge
Nicole Hu
Darian Li
Yubo Wang
Shihao Qi
Yuming Xu
Han Shi
J. Zhang
AI4MH
56
0
0
03 Apr 2025
HILGEN: Hierarchically-Informed Data Generation for Biomedical NER Using Knowledgebases and Large Language Models
Yao Ge
Yuting Guo
Sudeshna Das
Swati Rajwal
Selen Bozkurt
A. Sarker
MedIm
LM&MA
53
0
0
06 Mar 2025
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference
Grace Proebsting
Adam Poliak
50
0
0
06 Mar 2025
Targeted Distillation for Sentiment Analysis
Yice Zhang
Guangyu Xie
Jingjie Lin
Jianzhu Bao
Qianlong Wang
Xi Zeng
Ruifeng Xu
53
0
0
05 Mar 2025
FIG: Forward-Inverse Generation for Low-Resource Domain-specific Event Detection
Tanmay Parekh
Yuxuan Dong
Lucas Bandarkar
Artin Kim
I-Hung Hsu
Kai-Wei Chang
Nanyun Peng
41
0
0
24 Feb 2025
Synthetic Text Generation for Training Large Language Models via Gradient Matching
Dang Nguyen
Zeman Li
M. Bateni
Vahab Mirrokni
Meisam Razaviyayn
Baharan Mirzasoleiman
42
0
0
24 Feb 2025
Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
Yen-Ju Lu
Ting-Yao Hu
H. Koppula
Hadi Pouransari
Jen-Hao Rick Chang
...
Xiang Kong
Qi Zhu
Simon Wang
Oncel Tuzel
Raviteja Vemulapalli
45
0
0
24 Feb 2025
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
Shenglai Zeng
Jiankun Zhang
Pengfei He
J. Ren
Tianqi Zheng
Hanqing Lu
Han Xu
Hui Liu
Yue Xing
Jiliang Tang
132
9
0
21 Feb 2025
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Vardaan Pahuja
Yadong Lu
Corby Rosset
Boyu Gou
Arindam Mitra
Spencer Whitehead
Yu Su
Ahmed Awadallah
LLMAG
LM&Ro
Presented at ResearchTrend Connect | LLMAG on 14 Mar 2025
149
3
1
20 Feb 2025
A Survey of Text Classification Under Class Distribution Shift
Adriana Valentina Costache
Silviu Florin Gheorghe
Eduard Poesina
Paul Irofti
Radu Tudor Ionescu
OOD
VLM
60
0
0
18 Feb 2025
Measuring Diversity in Synthetic Datasets
Yuchang Zhu
Huizhe Zhang
Bingzhe Wu
Jintang Li
Zibin Zheng
Peilin Zhao
Liang Chen
Yatao Bian
95
0
0
12 Feb 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
64
12
0
28 Jan 2025
"My life is miserable, have to sign 500 autographs everyday": Exposing Humblebragging, the Brags in Disguise
Sharath Naganna
Saprativa Bhattacharjee
Pushpak Bhattacharyya
Biplab Banerjee
26
0
0
31 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
105
0
0
01 Dec 2024
ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data?
Zheng Hui
Zhaoxiao Guo
Hang Zhao
Juanyong Duan
Lin Ai
Yinheng Li
Julia Hirschberg
Congrui Huang
78
1
0
18 Nov 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale
Flavio Di Palo
Prateek Singhi
Bilal Fadlallah
23
3
0
07 Nov 2024
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
Hsun-Yu Kuo
Yin-Hsiang Liao
Yu-Chieh Chao
Wei-Yun Ma
Pu-Jen Cheng
SyDa
45
2
0
28 Oct 2024
LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples
Huiyu Wu
Diego Klabjan
FedML
33
0
0
24 Oct 2024
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration
Qintong Li
Jiahui Gao
Sheng Wang
Renjie Pi
Xueliang Zhao
Chuan Wu
Xin Jiang
Z. Li
Lingpeng Kong
SyDa
28
2
0
22 Oct 2024
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
23
0
0
20 Oct 2024
A Comprehensive Evaluation of Cognitive Biases in LLMs
Simon Malberg
Roman Poletukhin
Carolin M. Schuster
Georg Groh
ELM
32
5
0
20 Oct 2024
Hybrid Training Approaches for LLMs: Leveraging Real and Synthetic Data to Enhance Model Performance in Domain-Specific Applications
Alexey Zhezherau
Alexei Yanockin
SyDa
26
3
0
11 Oct 2024
The Effects of Hallucinations in Synthetic Training Data for Relation Extraction
Steven Rogulsky
Nicholas Popovic
Michael Färber
HILM
30
1
0
10 Oct 2024
Personalized Visual Instruction Tuning
Renjie Pi
Jianshu Zhang
Tianyang Han
Jipeng Zhang
Rui Pan
Tong Zhang
MLLM
29
6
0
09 Oct 2024
Generating Synthetic Datasets for Few-shot Prompt Tuning
Xu Guo
Zilin Du
Boyang Li
Chunyan Miao
21
1
0
08 Oct 2024
Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection
Ksheeraja Raghavan
Samiran Gode
Ankit Parag Shah
Surabhi Raghavan
Wolfram Burgard
Bhiksha Raj
Rita Singh
25
0
0
04 Oct 2024
Correlation and Navigation in the Vocabulary Key Representation Space of Language Models
Letian Peng
Chenyang An
Jingbo Shang
KELM
28
0
0
03 Oct 2024
Generate then Refine: Data Augmentation for Zero-shot Intent Detection
I-Fan Lin
Faegheh Hasibi
Suzan Verberne
VLM
20
2
0
02 Oct 2024
Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting
Siyi Liu
Yang Li
Jiang Li
Shan Yang
Yunshi Lan
LRM
19
1
0
02 Oct 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Zheng Hui
Zhaoxiao Guo
Hang Zhao
Juanyong Duan
Congrui Huang
25
6
0
23 Sep 2024
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen
K. Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
31
1
0
22 Sep 2024
Enhancing SLM via ChatGPT and Dataset Augmentation
Tom Pieper
Mohamad Ballout
U. Krumnack
Gunther Heidemann
Kai-Uwe Kühnberger
26
0
0
19 Sep 2024
Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels
Chaoqun Liu
Qin Chao
Wenxuan Zhang
Xiaobao Wu
Boyang Albert Li
Anh Tuan Luu
Lidong Bing
17
1
0
19 Sep 2024
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao
Renjie Pi
Tianyang Han
Han Wu
Lanqing Hong
Lingpeng Kong
Xin Jiang
Zhenguo Li
39
5
0
17 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
58
23
0
10 Sep 2024
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
Raoyuan Zhao
Abdullatif Köksal
Yihong Liu
Leonie Weissweiler
Anna Korhonen
Hinrich Schütze
SyDa
33
1
0
30 Aug 2024
Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs
John Mendonça
Isabel Trancoso
A. Lavie
ALM
29
1
0
20 Aug 2024
RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science
David Farr
Nico Manzonelli
Iain Cruickshank
Jevin West
28
1
0
15 Aug 2024
XMainframe: A Large Language Model for Mainframe Modernization
Anh T. V. Dau
Hieu Trung Dao
Anh Tuan Nguyen
Hieu Trung Tran
Phong X. Nguyen
Nghi D. Q. Bui
27
1
0
05 Aug 2024
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
SyDa
55
4
0
22 Jul 2024
Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls
Aras Selvi
Eleonora Kreacic
Mohsen Ghassemi
Vamsi K. Potluru
T. Balch
Manuela Veloso
24
0
0
18 Jul 2024
Training Task Experts through Retrieval Based Distillation
Jiaxin Ge
Xueying Jia
Vijay Viswanathan
Hongyin Luo
Graham Neubig
34
3
0
07 Jul 2024
A Survey on Natural Language Counterfactual Generation
Yongjie Wang
Xiaoqi Qiu
Yu Yue
Xu Guo
Zhiwei Zeng
Yuhong Feng
Zhiqi Shen
31
5
0
04 Jul 2024
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
Ruida Wang
Jipeng Zhang
Yizhen Jia
Rui Pan
Shizhe Diao
Renjie Pi
Tong Zhang
LRM
33
15
0
03 Jul 2024
The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators
Tzu-Heng Huang
Catherine Cao
Vaishnavi Bhargava
Frederic Sala
31
3
0
25 Jun 2024
USDC: A Dataset of User Stance and Dogmatism in Long Conversations
Mounika Marreddy
S. Oota
Venkata Charan Chinni
Manish Gupta
Lucie Flek
46
0
0
24 Jun 2024
Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
Sungbin Shin
Wonpyo Park
Jaeho Lee
Namhoon Lee
31
1
0
21 Jun 2024