NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
Saad Mahamood
Abinaya Mahendiran
Simon Mille
Ashish Srivastava
Samson Tan
Tongshuang Wu
Jascha Narain Sohl-Dickstein
Jinho D. Choi
Eduard H. Hovy
Ondrej Dusek
Sebastian Ruder
Sajant Anand
Nagender Aneja
Rabin Banjade
Lisa Barthe
Hanna Behnke
Ian Berlot-Attwell
Connor Boyle
Caroline Brun
Marco Antonio Sobrevilla Cabezudo
Samuel Cahyawijaya
E. Chapuis
Wanxiang Che
Mukund Choudhary
C. Clauss
Pierre Colombo
Filip Cornell
Gautier Dagan
M. Das
Tanay Dixit
Thomas Dopierre
Paul-Alexis Dray
Suchitra Dubey
Tatiana Ekeinhor
Marco Di Giovanni
Tanya Goyal
Rishabh Gupta
Rishabh Gupta
Louanes Hamla
Sanghyun Han
Fabrice Harel-Canada
A. Honoré
Ishan Jindal
Przemyslaw K. Joniak
Denis Kleyko
Venelin Kovatchev
Kalpesh Krishna
Ashutosh Kumar
Stefan Langer
S. Lee
Corey J. Levinson
Hualou Liang
Kaizhao Liang
Zhexiong Liu
Andrey Lukyanenko
Vukosi Marivate
Gerard de Melo
Simon Meoni
Maxime Meyer
Afnan Mir
N. Moosavi
Niklas Muennighoff
Timothy Sum Hon Mun
Kenton W. Murray
Marcin Namysl
Maria Obedkova
Priti Oli
Nivranshu Pasricha
Jan Pfister
Richard Plant
Vinay Uday Prabhu
V. Pais
Libo Qin
Shahab Raji
P. Rajpoot
Vikas Raunak
Roy Rinberg
N. Roberts
Juan Diego Rodriguez
Claude Roux
Vasconcellos P. H. S.
Ananya B. Sai
Robin M. Schmidt
Thomas Scialom
T. Sefara
Saqib Nizam Shamsi
Xudong Shen
Haoyue Shi
Y. Shi
Anna Shvets
Nick Siegel
Damien Sileo
Jamie Simon
Chandan Singh
Roman Sitelew
P. Soni
Taylor Sorensen
William Soto
Aman Srivastava
KV Aditya Srivatsa
Tony Sun
Mukund Varma T.
A. Tabassum
Fiona Anting Tan
Ryan Teehan
Monalisa Tiwari
M. Tolkiehn
Athena Wang
Zijian Wang
Gloria Xinyue Wang
Zijie J. Wang
Fuxuan Wei
Bryan Wilie
Genta Indra Winata
Xinyi Wu
Witold Wydmański
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
Abstract

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
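To make the distinction between a transformation and a filter concrete, the following is a minimal, self-contained Python sketch. It does not use the NL-Augmenter codebase: the class names `SwapAdjacentCharacters` and `MinLengthFilter`, their parameters, and the method names `generate` and `filter` are hypothetical choices made for illustration only. The transformation modifies the text of an example, while the filter only decides whether an example belongs to a given data split.

```python
import random
from typing import List


class SwapAdjacentCharacters:
    """Hypothetical transformation: adds typo-like noise to a sentence."""

    def __init__(self, prob: float = 0.5, seed: int = 0) -> None:
        self.prob = prob                # chance of perturbing each eligible word
        self.rng = random.Random(seed)  # seeded for reproducible perturbations

    def generate(self, sentence: str) -> List[str]:
        words = []
        for word in sentence.split():
            # Only perturb words of 4+ characters; keep first and last characters fixed.
            if len(word) > 3 and self.rng.random() < self.prob:
                i = self.rng.randrange(1, len(word) - 2)
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            words.append(word)
        return [" ".join(words)]


class MinLengthFilter:
    """Hypothetical filter: selects sentences with at least `min_tokens` whitespace tokens."""

    def __init__(self, min_tokens: int = 5) -> None:
        self.min_tokens = min_tokens

    def filter(self, sentence: str) -> bool:
        return len(sentence.split()) >= self.min_tokens


if __name__ == "__main__":
    data = ["Short text.", "Robust models should tolerate small spelling changes."]
    transform = SwapAdjacentCharacters(prob=1.0)
    keep = MinLengthFilter(min_tokens=5)
    subset = [s for s in data if keep.filter(s)]        # filter: choose a data split
    perturbed = [transform.generate(s)[0] for s in subset]  # transformation: modify it
    print(subset)
    print(perturbed)
```

Under these assumptions, a robustness analysis would compare a model's score on `subset` with its score on `perturbed`; a large drop suggests sensitivity to the perturbation.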