arXiv:2009.10795
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
22 September 2020
Swabha Swayamdipta
Roy Schwartz
Nicholas Lourie
Yizhong Wang
Hannaneh Hajishirzi
Noah A. Smith
Yejin Choi
Papers citing
"Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics"
50 / 86 papers shown
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
57
0
0
02 May 2025
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
Xiaoxuan Zhu
Zhouhong Gu
Baiqian Wu
Suhang Zheng
Tao Wang
Tianyu Li
Hongwei Feng
Yanghua Xiao
40
0
0
01 Apr 2025
Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation
Pritam Kadasi
Sriman Reddy
Srivathsa Vamsi Chaturvedula
Rudranshu Sen
Agnish Saha
Soumavo Sikdar
Sayani Sarkar
Suhani Mittal
Rohit Jindal
Mayank Singh
48
0
0
19 Mar 2025
Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
44
0
0
18 Feb 2025
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Wenbo Zhang
Hengrui Cai
Wenyu Chen
77
0
0
17 Feb 2025
Diversity-Oriented Data Augmentation with Large Language Models
Zaitian Wang
Jinghan Zhang
Xinhao Zhang
Kunpeng Liu
Pengfei Wang
Yuanchun Zhou
78
1
0
17 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
Jing Yang
Max Glockner
Anderson de Rezende Rocha
Iryna Gurevych
LRM
62
1
0
07 Feb 2025
Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets
Vatsal Gupta
Pranshu Pandya
Tushar Kataria
Vivek Gupta
Dan Roth
AAML
53
1
0
03 Jan 2025
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
Michael Yeung
Toya Teramoto
Songtao Wu
Tatsuo Fujiwara
Kenji Suzuki
Tamaki Kojima
71
0
0
09 Dec 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
52
1
0
26 Oct 2024
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
Georgios Chochlakis
Alexandros Potamianos
Kristina Lerman
Shrikanth Narayanan
25
0
0
17 Oct 2024
Data Quality Control in Federated Instruction-tuning of Large Language Models
Yaxin Du
Rui Ye
Fengting Yuchi
W. Zhao
Jingjing Qu
Y. Wang
Siheng Chen
ALM
FedML
45
0
0
15 Oct 2024
Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay
Hossein Rezaei
Mohammad Sabokrou
CLL
21
0
0
09 Oct 2024
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
Yuxin Xiao
Shujian Zhang
Wenxuan Zhou
Marzyeh Ghassemi
Sanqiang Zhao
58
0
0
07 Oct 2024
Targeted synthetic data generation for tabular data via hardness characterization
Tommaso Ferracci
Leonie Goldmann
Anton Hinel
Francesco Sanna Passino
87
0
0
01 Oct 2024
Training Gradient Boosted Decision Trees on Tabular Data Containing Label Noise for Classification Tasks
Anita Eisenburger
Daniel Otten
Anselm Hudde
F. Hopfgartner
NoLa
39
1
0
13 Sep 2024
Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning
Yang Wu
Chenghao Wang
Ece Gumusel
Xiaozhong Liu
ELM
AILaw
29
4
0
05 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
30
23
0
30 May 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
34
4
0
29 May 2024
Exploring the Evolution of Hidden Activations with Live-Update Visualization
Xianglin Yang
Jin Song Dong
23
0
0
24 May 2024
TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
Junrui Zhang
Mozhgan Pourkeshavarz
Amir Rasouli
42
3
0
18 Apr 2024
Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation
Juhwan Choi
Jungmin Yun
Kyohoon Jin
Youngbin Kim
30
4
0
15 Apr 2024
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
Mobashir Sadat
Cornelia Caragea
32
4
0
11 Apr 2024
Towards Principled Task Grouping for Multi-Task Learning
Chenguang Wang
Xuanhao Pan
Tianshu Yu
29
0
0
23 Feb 2024
Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
Ming-Ru Wu
Yufei Wang
George F. Foster
Lizhen Qu
Gholamreza Haffari
13
6
0
27 Jan 2024
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Christoph Kern
Stephanie Eckman
Jacob Beck
Rob Chew
Bolei Ma
Frauke Kreuter
8
9
0
23 Nov 2023
NameGuess: Column Name Expansion for Tabular Data
Jiani Zhang
Zhengyuan Shen
Balasubramaniam Srinivasan
Shen Wang
Huzefa Rangwala
George Karypis
13
4
0
19 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
19
1
0
10 Oct 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
23
5
0
02 Aug 2023
Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals
Robin Shing Moon Chan
Afra Amini
Mennatallah El-Assady
LRM
AAML
21
2
0
21 Jun 2023
Measuring and Mitigating Local Instability in Deep Neural Networks
Arghya Datta
Subhrangshu Nandi
Jingcheng Xu
Greg Ver Steeg
He Xie
Anoop Kumar
Aram Galstyan
8
3
0
18 May 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM
LM&MA
VLM
ReLM
LRM
24
26
0
15 May 2023
Does Informativeness Matter? Active Learning for Educational Dialogue Act Classification
Wei Tan
Jionghao Lin
David Lang
Guanliang Chen
D. Gašević
Lan Du
Wray L. Buntine
11
6
0
12 Apr 2023
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
Antonis Maronikolakis
Abdullatif Köksal
Hinrich Schütze
28
0
0
04 Apr 2023
A Bag-of-Prototypes Representation for Dataset-Level Applications
Wei-Chih Tu
Weijian Deng
Tom Gedeon
Liang Zheng
19
9
0
23 Mar 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
10
117
0
07 Mar 2023
Balanced Audiovisual Dataset for Imbalance Analysis
Wenke Xia
Xu Zhao
Xincheng Pang
Changqing Zhang
Di Hu
24
1
0
14 Feb 2023
Investigating Multi-source Active Learning for Natural Language Inference
Ard Snijders
Douwe Kiela
Katerina Margatina
20
7
0
14 Feb 2023
Identifying Semantically Difficult Samples to Improve Text Classification
Shashank Mujumdar
S. Mehta
Hima Patel
Suman Mitra
11
0
0
13 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
17
1
0
07 Feb 2023
FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing
Chen Cecilia Liu
Jonas Pfeiffer
Ivan Vulić
Iryna Gurevych
CLL
17
9
0
13 Jan 2023
Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure
Xiaoling Zhou
Ou Wu
Weiyao Zhu
Ziyang Liang
20
2
0
12 Jan 2023
BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning
Mohsen Fayyaz
Ehsan Aghazadeh
Ali Modarressi
Mohammad Taher Pilehvar
Yadollah Yaghoobzadeh
Samira Ebrahimi Kahou
19
18
0
10 Nov 2022
DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems
Nabeel Seedat
F. Imrie
M. Schaar
25
12
0
09 Nov 2022
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
Barbara Plank
22
96
0
04 Nov 2022
Exploring Mode Connectivity for Pre-trained Language Models
Yujia Qin
Cheng Qian
Jing Yi
Weize Chen
Yankai Lin
Xu Han
Zhiyuan Liu
Maosong Sun
Jie Zhou
8
20
0
25 Oct 2022
Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU
Fenia Christopoulou
Gerasimos Lampouras
Ignacio Iacobacci
24
3
0
22 Oct 2022
SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval
Kun Zhou
Yeyun Gong
Xiao Liu
Wayne Xin Zhao
Yelong Shen
...
Jing Lu
Rangan Majumder
Ji-Rong Wen
Nan Duan
Weizhu Chen
21
33
0
21 Oct 2022
Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees
M. Ponti
L. Oliveira
Mathias Esteban
Valentina Garcia
J. Román
Luis Argerich
TDI
16
4
0
20 Oct 2022