Do We Train on Test Data? Purging CIFAR of Near-Duplicates

1 February 2019

Papers citing "Do We Train on Test Data? Purging CIFAR of Near-Duplicates"

Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models Alireza Aghabagherloo Aydin Abadi Sumanta Sarkar Vishnu Asutosh Dasu Bart Preneel AAML 125 1 0 01 Apr 2025
The Vendiscope: An Algorithmic Microscope For Data Collections Amey P. Pasarkar Adji Bousso Dieng 90 2 0 15 Feb 2025
MBInception: A new Multi-Block Inception Model for Enhancing Image Processing Efficiency Fatemeh Froughirad Reza Bakhoda Eshtivani Hamed Khajavi Amir Rastgoo 79 0 0 18 Dec 2024
Label Errors in the Tobacco3482 Dataset Gordon Lim Stefan Larson Kevin Leach 118 0 0 17 Dec 2024
Questionable practices in machine learning Gavin Leech Juan J. Vazquez Misha Yagudin Niclas Kupper Laurence Aitchison 101 6 0 17 Jul 2024
Scaling Up Deep Clustering Methods Beyond ImageNet-1K Nikolas Adaloglou Félix D. P. Michels Kaspar Senft Diana Petrusheva M. Kollmann 102 1 0 03 Jun 2024
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions Luca Arnaboldi Yatin Dandi Florent Krzakala Luca Pesce Ludovic Stephan 128 18 0 24 May 2024
Automated Program Repair: Emerging trends pose and expose problems for benchmarks J. Renzullo Pemma Reiter Westley Weimer Stephanie Forrest 84 3 0 08 May 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets Lorenzo Brigato Stavroula Mougiakakou 64 0 0 08 Mar 2024
Rethinking cluster-conditioned diffusion models Nikolas Adaloglou Tim Kaiser Félix D. P. Michels M. Kollmann VLM 70 3 0 01 Mar 2024
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes Myeongseob Ko Feiyang Kang Weiyan Shi Ming Jin Zhou Yu Ruoxi Jia TDI 72 4 0 14 Feb 2024
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images Tuan Truong Farnaz Khun Jush Matthias Lenga 69 3 0 12 Dec 2023
Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests Edward Raff James Holt 54 3 0 27 Oct 2023
No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets Lorenzo Brigato Stavroula Mougiakakou 72 5 0 04 Sep 2023
Data-Efficient Energy-Aware Participant Selection for UAV-Enabled Federated Learning Youssra Cheriguene Wael Jaafar Kerrache Chaker Abdelaziz H. Yanikomeroglu Fatima Zohra Bousbaa N. Lagraa FedML 66 2 0 14 Aug 2023
Memorization Through the Lens of Curvature of Loss Function Around Samples Isha Garg Deepak Ravikumar Kaushik Roy TDI 65 13 0 11 Jul 2023
Integrating Curricula with Replays: Its Effects on Continual Learning Ren Jie Tee Mengmi Zhang KELM CLL 83 1 0 08 Jul 2023
On Evaluation of Document Classification using RVL-CDIP Stefan Larson Gordon Lim Kevin Leach 96 3 0 21 Jun 2023
Image Classification with Small Datasets: Overview and Benchmark Lorenzo Brigato Björn Barz Luca Iocchi Joachim Denzler VLM 64 20 0 23 Dec 2022
Reducing Training Sample Memorization in GANs by Training with Memorization Rejection Andrew Bai Cho-Jui Hsieh Wendy Kan Hsuan-Tien Lin GAN 85 5 0 21 Oct 2022
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences Nataša Tagasovska Nathan C. Frey Andreas Loukas I. Hotzel J. Lafrance-Vanasse ... A. Rajpal Richard Bonneau Kyunghyun Cho Stephen Ra Vladimir Gligorijević 88 11 0 19 Oct 2022
Bugs in the Data: How ImageNet Misrepresents Biodiversity A. Luccioni David Rolnick 78 46 0 24 Aug 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet Vijay Vasudevan Benjamin Caine Raphael Gontijo-Lopes Sara Fridovich-Keil Rebecca Roelofs VLM UQCV 90 59 0 09 May 2022
A Siren Song of Open Source Reproducibility Edward Raff Andrew L. Farris 82 9 0 09 Apr 2022
Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning Georg Pichler Marco Romanelli L. Rey Vega Pablo Piantanida FedML 56 11 0 30 Mar 2022
Datamodels: Predicting Predictions from Training Data Andrew Ilyas Sung Min Park Logan Engstrom Guillaume Leclerc Aleksander Madry TDI 135 143 0 01 Feb 2022
Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification Takashi Ishida Ikko Yamane Nontawat Charoenphakdee Gang Niu Masashi Sugiyama BDL UQCV 62 18 0 01 Feb 2022
A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels R. Joyce Edward Raff Charles K. Nicholas 80 16 0 23 Sep 2021
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification Lorenzo Brigato Björn Barz Luca Iocchi Joachim Denzler 67 18 0 30 Aug 2021
A data-based comparative review and AI-driven symbolic model for longitudinal dispersion coefficient in natural streams Yifeng Zhao Zicheng Liu Peiren Zhang S. Galindo‐Torres Stan Z. Li 13 0 0 17 Jun 2021
On Memorization in Probabilistic Deep Generative Models G. V. D. Burg Christopher K. I. Williams TDI 95 63 0 06 Jun 2021
Rethinking Noisy Label Models: Labeler-Dependent Noise with Adversarial Awareness Glenn Dawson R. Polikar NoLa 73 3 0 28 May 2021
PEng4NN: An Accurate Performance Estimation Engine for Efficient Automated Neural Network Architecture Search A. Rorabaugh Silvina Caíno-Lores Michael R. Wyatt Travis Johnston M. Taufer 39 2 0 11 Jan 2021
An Analysis of Dataset Overlap on Winograd-Style Tasks Ali Emami Adam Trischler Kaheer Suleman Jackie C.K. Cheung 76 22 0 09 Nov 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events Eduardo Fonseca Xavier Favory Jordi Pons F. Font Xavier Serra 121 467 0 01 Oct 2020
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation Vitaly Feldman Chiyuan Zhang TDI 245 472 0 09 Aug 2020
Are we done with ImageNet? Lucas Beyer Olivier J. Hénaff Alexander Kolesnikov Xiaohua Zhai Aaron van den Oord VLM 134 406 0 12 Jun 2020
The Curious Case of Convex Neural Networks S. Sivaprasad Ankur Singh Naresh Manwani Vineet Gandhi 109 27 0 09 Jun 2020
Self-Distillation as Instance-Specific Label Smoothing Zhilu Zhang M. Sabuncu 76 119 0 09 Jun 2020
Identifying Mislabeled Data using the Area Under the Margin Ranking Geoff Pleiss Tianyi Zhang Ethan R. Elenberg Kilian Q. Weinberger NoLa 119 274 0 28 Jan 2020
Big Transfer (BiT): General Visual Representation Learning Alexander Kolesnikov Lucas Beyer Xiaohua Zhai J. Puigcerver Jessica Yung Sylvain Gelly N. Houlsby MQ 301 1,212 0 24 Dec 2019
Understanding Isomorphism Bias in Graph Data Sets Sergei Ivanov Sergei Sviridov Evgeny Burnaev FaML AI4CE 118 38 0 26 Oct 2019