Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

22 September 2020

Yejin Choi

Papers citing "Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics"

36 / 86 papers shown

TiDAL: Learning Training Dynamics for Active Learning. Seong Min Kye, Kwanghee Choi, Hyeongmin Byun, Buru Chang. 13 Oct 2022.
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation. Tanay Dixit, Bhargavi Paranjape, Hannaneh Hajishirzi, Luke Zettlemoyer. 10 Oct 2022.
PROD: Progressive Distillation for Dense Retrieval. Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, ..., Jian Jiao, Jing Lu, Daxin Jiang, Rangan Majumder, Nan Duan. 27 Sep 2022.
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics. Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David M. Krueger, Sara Hooker. 20 Sep 2022.
Efficient Methods for Natural Language Processing: A Survey. Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz. 31 Aug 2022.
The Value of Out-of-Distribution Data. Ashwin De Silva, Rahul Ramesh, Carey E. Priebe, Pratik Chaudhari, Joshua T. Vogelstein. 23 Aug 2022.
Evaluating and Crafting Datasets Effective for Deep Learning With Data Maps. Jay Bishnu, Andrew Gondoputro. 22 Aug 2022.
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs. Jiarui Zhang, Filip Ilievski, Kaixin Ma, Jonathan M. Francis, A. Oltramari. 21 May 2022.
ALLSH: Active Learning Guided by Local Sensitivity and Hardness. Shujian Zhang, Chengyue Gong, Xingchao Liu, Pengcheng He, Weizhu Chen, Mingyuan Zhou. 10 May 2022.
A Data Cartography based MixUp for Pre-trained Language Models. Seohong Park, Cornelia Caragea. 06 May 2022.
Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees. Jonathan Brophy, Zayd Hammoudeh, Daniel Lowd. 30 Apr 2022.
On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations. Roy Schwartz, Gabriel Stanovsky. 27 Apr 2022.
Adaptor: Objective-Centric Adaptation Framework for Language Models. Michal Štefánik, Vít Novotný, Nikola Groverová, Petr Sojka. 08 Mar 2022.
MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts. Weixin Liang, James Y. Zou. 14 Feb 2022.
FORML: Learning to Reweight Data for Fairness. Bobby Yan, Skyler Seto, N. Apostoloff. 03 Feb 2022.
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation. Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi. 16 Jan 2022.
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification. Alon Talmor, Ori Yoran, Ronan Le Bras, Chandrasekhar Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant. 14 Jan 2022.
On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training. Chen Liu, Zhichao Huang, Mathieu Salzmann, Tong Zhang, Sabine Süsstrunk. 14 Dec 2021.
Dataset Geography: Mapping Language Data to Language Users. Fahim Faisal, Yinkai Wang, Antonios Anastasopoulos. 07 Dec 2021.
Multi-View Active Learning for Short Text Classification in User-Generated Data. Payam Karisani, Negin Karisani, Li Xiong. 05 Dec 2021.
Understanding Out-of-distribution: A Perspective of Data Dynamics. Dyah Adila, Dongyeop Kang. 29 Nov 2021.
Clean or Annotate: How to Spend a Limited Data Collection Budget. Derek Chen, Zhou Yu, Samuel R. Bowman. 15 Oct 2021.
Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks. Chin-Chia Michael Yeh, Zhongfang Zhuang, Junpeng Wang, Yan Zheng, J. Ebrahimi, Ryan Mercer, Liang Wang, Wei Zhang. 21 Sep 2021.
Training Dynamic based data filtering may not work for NLP datasets. Arka Talukdar, Monika Dagar, Prachi Gupta, Varun G. Menon. 19 Sep 2021.
The Grammar-Learning Trajectories of Neural Language Models. Leshem Choshen, Guy Hacohen, D. Weinshall, Omri Abend. 13 Sep 2021.
Assessing the Quality of the Datasets by Identifying Mislabeled Samples. Vaibhav Pulastya, Gaurav Nuti, Yash Kumar Atri, Tanmoy Chakraborty. 10 Sep 2021.
Cartography Active Learning. Mike Zhang, Barbara Plank. 09 Sep 2021.
CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge. Yasumasa Onoe, Michael J.Q. Zhang, Eunsol Choi, Greg Durrett. 03 Sep 2021.
Contrastive Explanations for Model Interpretability. Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg. 02 Mar 2021.
Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks. L. N. Darlow, Stanisław Jastrzębski, Amos Storkey. 19 Nov 2020.
ANLIzing the Adversarial Natural Language Inference Dataset. Adina Williams, Tristan Thrush, Douwe Kiela. 24 Oct 2020.
How Can We Accelerate Progress Towards Human-like Linguistic Generalization? Tal Linzen. 03 May 2020.
Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell. 05 Dec 2016.
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Y. Gal, Zoubin Ghahramani. 06 Jun 2015.