Generating Datasets with Pretrained Language Models

15 April 2021
Timo Schick
Hinrich Schütze

Papers citing "Generating Datasets with Pretrained Language Models"

46 / 46 papers shown
Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model
Mingruo Yuan
Ben Kao
Tien-Hsuan Wu
Michael M. K. Cheung
Henry W. H. Chan
Anne S. Y. Cheung
Felix W. H. Chan
Yongxi Chen
AILaw
ELM
118
3
0
07 May 2025
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
117
0
0
22 Oct 2024
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar
Sonal Kumar
Hemant Kumar Giri
Nishit Anand
Ashish Seth
Sreyan Ghosh
Dinesh Manocha
AuLLM
VLM
47
1
0
21 Oct 2024
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
Artem Snegirev
Maria Tikhonova
Anna Maksimova
Alena Fenogenova
Alexander Abramov
26
4
0
22 Aug 2024
GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory
Wei Fan
Haoran Li
Zheye Deng
Weiqi Wang
Yangqiu Song
AILaw
33
8
0
17 Jun 2024
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Patrick Emami
Zhaonan Li
Saumya Sinha
Truc Nguyen
48
1
0
30 May 2024
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Boshi Wang
Hao Fang
Jason Eisner
Benjamin Van Durme
Yu-Chuan Su
CLL
27
7
0
07 Mar 2024
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
Wenlong Deng
Blair Chen
Beidi Zhao
Chiyu Zhang
Xiaoxiao Li
Christos Thrampoulidis
33
0
0
22 Feb 2024
GIRT-Model: Automated Generation of Issue Report Templates
Nafiseh Nikeghbal
Amir Hossein Kargaran
Abbas Heydarnoori
20
2
0
04 Feb 2024
Faithful Persona-based Conversational Dataset Generation with Large Language Models
Pegah Jandaghi
XiangHai Sheng
Xinyi Bai
Jay Pujara
Hakim Sidahmed
29
21
0
15 Dec 2023
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Giovanni Monea
Maxime Peyrard
Martin Josifoski
Vishrav Chaudhary
Jason Eisner
Emre Kiciman
Hamid Palangi
Barun Patra
Robert West
KELM
51
12
0
04 Dec 2023
Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges
Vinay Samuel
Houda Aynaou
Arijit Ghosh Chowdhury
Karthik Venkat Ramanan
Aman Chadha
SyDa
25
7
0
21 Sep 2023
Collective Human Opinions in Semantic Textual Similarity
Yuxia Wang
Shimin Tao
Ning Xie
Hao-Yu Yang
Timothy Baldwin
Karin Verspoor
21
4
0
08 Aug 2023
I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection
Yongzhu Chang
Rongsheng Zhang
Jiashu Pu
30
1
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
34
3
0
08 Aug 2023
PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records
Viktor Schlegel
Hao Li
Yuping Wu
Anand Subramanian
Thanh-Tung Nguyen
...
Daniel Beck
Xiaojun Zeng
R. Batista-Navarro
Stefan Winkler
Goran Nenadic
LM&MA
MedIm
21
9
0
05 Jul 2023
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Qiang Zhang
Jason Naradowsky
Yusuke Miyao
ELM
24
32
0
29 May 2023
A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond
Abhinav Ramesh Kashyap
Thanh-Tung Nguyen
Viktor Schlegel
Stefan Winkler
See-Kiong Ng
Soujanya Poria
AI4TS
3DV
SSL
34
6
0
22 May 2023
What happens before and after: Multi-Event Commonsense in Event Coreference Resolution
Sahithya Ravi
Christy Tanner
R. Ng
Vered Shwartz
34
16
0
20 Feb 2023
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Shuaichen Chang
J. Wang
Mingwen Dong
Lin Pan
Henghui Zhu
...
William Yang Wang
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Bing Xiang
OOD
29
34
0
21 Jan 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
Leonid Boytsov
Preksha Patel
Vivek Sourabh
Riddhi Nisar
Sayan Kundu
R. Ramanathan
Eric Nyberg
21
19
0
08 Jan 2023
Geographic and Geopolitical Biases of Language Models
Fahim Faisal
Antonios Anastasopoulos
18
19
0
20 Dec 2022
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
ALM
31
359
0
19 Dec 2022
SumREN: Summarizing Reported Speech about Events in News
R. Reddy
Heba Elfardy
Hou Pong Chan
Kevin Small
Heng Ji
24
5
0
02 Dec 2022
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
Yu Meng
Martin Michalski
Jiaxin Huang
Yu Zhang
Tarek F. Abdelzaher
Jiawei Han
VLM
41
46
0
06 Nov 2022
GPS: Genetic Prompt Search for Efficient Few-shot Learning
Hanwei Xu
Yujun Chen
Yulun Du
Nan Shao
Yanggang Wang
Haiyu Li
Zhilin Yang
VLM
14
28
0
31 Oct 2022
Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues
Jiao Ou
Jinchao Zhang
Yang Feng
Jie Zhou
33
13
0
30 Oct 2022
Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
Melanie Sclar
Peter West
Sachin Kumar
Yulia Tsvetkov
Yejin Choi
18
19
0
25 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
26
0
0
14 Oct 2022
Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP
Johann Frei
Frank Kramer
21
1
0
30 Aug 2022
ShortcutLens: A Visual Analytics Approach for Exploring Shortcuts in Natural Language Understanding Dataset
Zhihua Jin
Xingbo Wang
Furui Cheng
Chunhui Sun
Qun Liu
Huamin Qu
32
9
0
17 Aug 2022
Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation
Kevin Yang
Olivia Deng
Charles C. Chen
Richard Shin
Subhro Roy
Benjamin Van Durme
35
10
0
18 May 2022
Few-shot Mining of Naturally Occurring Inputs and Outputs
Mandar Joshi
Terra Blevins
M. Lewis
Daniel S. Weld
Luke Zettlemoyer
25
1
0
09 May 2022
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
24
67
0
24 Mar 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
43
211
0
16 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
34
212
0
16 Jan 2022
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min
Hayley L Ross
Elior Sulem
Amir Pouran Ben Veyseh
Thien Huu Nguyen
Oscar Sainz
Eneko Agirre
Ilana Heinz
Dan Roth
LM&MA
VLM
AI4CE
69
1,029
0
01 Nov 2021
Unsupervised Neural Machine Translation with Generative Language Models Only
Jesse Michael Han
Igor Babuschkin
Harrison Edwards
Arvind Neelakantan
Tao Xu
...
Alex Ray
Pranav Shyam
Aditya A. Ramesh
Alec Radford
Ilya Sutskever
42
36
0
11 Oct 2021
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim
Hyoungseok Kim
Sang-Woo Lee
Gichang Lee
Donghyun Kwak
...
Jaewook Kang
Inho Kang
Jung-Woo Ha
W. Park
Nako Sung
VLM
241
121
0
10 Sep 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
23
3,828
0
28 Jul 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
259
374
0
28 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,918
0
31 Dec 2020
Data Augmentation using Pre-trained Transformer Models
Varun Kumar
Ashutosh Choudhary
Eunah Cho
VLM
209
347
0
04 Mar 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,587
0
21 Jan 2020
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
228
31,253
0
16 Jan 2013