1902.00423
v1
v2 (latest)
Do We Train on Test Data? Purging CIFAR of Near-Duplicates
1 February 2019
Björn Barz
Joachim Denzler
Papers citing "Do We Train on Test Data? Purging CIFAR of Near-Duplicates"
42 / 42 papers shown
Title
Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models
Alireza Aghabagherloo
Aydin Abadi
Sumanta Sarkar
Vishnu Asutosh Dasu
Bart Preneel
AAML
125
1
0
01 Apr 2025
The Vendiscope: An Algorithmic Microscope For Data Collections
Amey P. Pasarkar
Adji Bousso Dieng
90
2
0
15 Feb 2025
MBInception: A new Multi-Block Inception Model for Enhancing Image Processing Efficiency
Fatemeh Froughirad
Reza Bakhoda Eshtivani
Hamed Khajavi
Amir Rastgoo
79
0
0
18 Dec 2024
Label Errors in the Tobacco3482 Dataset
Gordon Lim
Stefan Larson
Kevin Leach
118
0
0
17 Dec 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
101
6
0
17 Jul 2024
Scaling Up Deep Clustering Methods Beyond ImageNet-1K
Nikolas Adaloglou
Félix D. P. Michels
Kaspar Senft
Diana Petrusheva
M. Kollmann
102
1
0
03 Jun 2024
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
Luca Arnaboldi
Yatin Dandi
Florent Krzakala
Luca Pesce
Ludovic Stephan
128
18
0
24 May 2024
Automated Program Repair: Emerging trends pose and expose problems for benchmarks
J. Renzullo
Pemma Reiter
Westley Weimer
Stephanie Forrest
84
3
0
08 May 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
64
0
0
08 Mar 2024
Rethinking cluster-conditioned diffusion models
Nikolas Adaloglou
Tim Kaiser
Félix D. P. Michels
M. Kollmann
VLM
70
3
0
01 Mar 2024
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko
Feiyang Kang
Weiyan Shi
Ming Jin
Zhou Yu
Ruoxi Jia
TDI
72
4
0
14 Feb 2024
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images
Tuan Truong
Farnaz Khun Jush
Matthias Lenga
69
3
0
12 Dec 2023
Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests
Edward Raff
James Holt
54
3
0
27 Oct 2023
No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato
Stavroula Mougiakakou
72
5
0
04 Sep 2023
Data-Efficient Energy-Aware Participant Selection for UAV-Enabled Federated Learning
Youssra Cheriguene
Wael Jaafar
Kerrache Chaker Abdelaziz
H. Yanikomeroglu
Fatima Zohra Bousbaa
N. Lagraa
FedML
66
2
0
14 Aug 2023
Memorization Through the Lens of Curvature of Loss Function Around Samples
Isha Garg
Deepak Ravikumar
Kaushik Roy
TDI
65
13
0
11 Jul 2023
Integrating Curricula with Replays: Its Effects on Continual Learning
Ren Jie Tee
Mengmi Zhang
KELM
CLL
83
1
0
08 Jul 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
96
3
0
21 Jun 2023
Image Classification with Small Datasets: Overview and Benchmark
Lorenzo Brigato
Björn Barz
Luca Iocchi
Joachim Denzler
VLM
64
20
0
23 Dec 2022
Reducing Training Sample Memorization in GANs by Training with Memorization Rejection
Andrew Bai
Cho-Jui Hsieh
Wendy Kan
Hsuan-Tien Lin
GAN
85
5
0
21 Oct 2022
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Nataša Tagasovska
Nathan C. Frey
Andreas Loukas
I. Hotzel
J. Lafrance-Vanasse
...
A. Rajpal
Richard Bonneau
Kyunghyun Cho
Stephen Ra
Vladimir Gligorijević
88
11
0
19 Oct 2022
Bugs in the Data: How ImageNet Misrepresents Biodiversity
A. Luccioni
David Rolnick
78
46
0
24 Aug 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
Vijay Vasudevan
Benjamin Caine
Raphael Gontijo-Lopes
Sara Fridovich-Keil
Rebecca Roelofs
VLM
UQCV
90
59
0
09 May 2022
A Siren Song of Open Source Reproducibility
Edward Raff
Andrew L. Farris
82
9
0
09 Apr 2022
Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning
Georg Pichler
Marco Romanelli
L. Rey Vega
Pablo Piantanida
FedML
56
11
0
30 Mar 2022
Datamodels: Predicting Predictions from Training Data
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
Aleksander Madry
TDI
135
143
0
01 Feb 2022
Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification
Takashi Ishida
Ikko Yamane
Nontawat Charoenphakdee
Gang Niu
Masashi Sugiyama
BDL
UQCV
62
18
0
01 Feb 2022
A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels
R. Joyce
Edward Raff
Charles K. Nicholas
80
16
0
23 Sep 2021
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification
Lorenzo Brigato
Björn Barz
Luca Iocchi
Joachim Denzler
67
18
0
30 Aug 2021
A data-based comparative review and AI-driven symbolic model for longitudinal dispersion coefficient in natural streams
Yifeng Zhao
Zicheng Liu
Peiren Zhang
S. Galindo‐Torres
Stan Z. Li
13
0
0
17 Jun 2021
On Memorization in Probabilistic Deep Generative Models
G. V. D. Burg
Christopher K. I. Williams
TDI
95
63
0
06 Jun 2021
Rethinking Noisy Label Models: Labeler-Dependent Noise with Adversarial Awareness
Glenn Dawson
R. Polikar
NoLa
73
3
0
28 May 2021
PEng4NN: An Accurate Performance Estimation Engine for Efficient Automated Neural Network Architecture Search
A. Rorabaugh
Silvina Caíno-Lores
Michael R. Wyatt
Travis Johnston
M. Taufer
39
2
0
11 Jan 2021
An Analysis of Dataset Overlap on Winograd-Style Tasks
Ali Emami
Adam Trischler
Kaheer Suleman
Jackie C.K. Cheung
76
22
0
09 Nov 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
121
467
0
01 Oct 2020
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
Vitaly Feldman
Chiyuan Zhang
TDI
245
472
0
09 Aug 2020
Are we done with ImageNet?
Lucas Beyer
Olivier J. Hénaff
Alexander Kolesnikov
Xiaohua Zhai
Aaron van den Oord
VLM
134
406
0
12 Jun 2020
The Curious Case of Convex Neural Networks
S. Sivaprasad
Ankur Singh
Naresh Manwani
Vineet Gandhi
109
27
0
09 Jun 2020
Self-Distillation as Instance-Specific Label Smoothing
Zhilu Zhang
M. Sabuncu
76
119
0
09 Jun 2020
Identifying Mislabeled Data using the Area Under the Margin Ranking
Geoff Pleiss
Tianyi Zhang
Ethan R. Elenberg
Kilian Q. Weinberger
NoLa
119
274
0
28 Jan 2020
Big Transfer (BiT): General Visual Representation Learning
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
J. Puigcerver
Jessica Yung
Sylvain Gelly
N. Houlsby
MQ
301
1,212
0
24 Dec 2019
Understanding Isomorphism Bias in Graph Data Sets
Sergei Ivanov
Sergei Sviridov
Evgeny Burnaev
FaML
AI4CE
118
38
0
26 Oct 2019