Deep Double Descent: Where Bigger Models and More Data Hurt

4 December 2019

Papers citing "Deep Double Descent: Where Bigger Models and More Data Hurt"

50 / 182 papers shown

A dynamic view of the double descent Vivek Shripad Borkar 63 0 0 03 May 2025
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Roman Abramov Felix Steinbauer Gjergji Kasneci 144 0 0 29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Yiping Wang Qing Yang Zhiyuan Zeng Liliang Ren L. Liu ... Jianfeng Gao Weizhu Chen S. Wang Simon S. Du Yelong Shen OffRL ReLM LRM 118 4 0 29 Apr 2025
The Double Descent Behavior in Two Layer Neural Network for Binary Classification Chathurika S Abeykoon A. Beknazaryan Hailin Sang 51 1 0 27 Apr 2025
A Model Zoo on Phase Transitions in Neural Networks Konstantin Schürholt Léo Meynent Yefan Zhou Haiquan Lu Yaoqing Yang Damian Borth 68 0 0 25 Apr 2025
PETNet -- Coincident Particle Event Detection using Spiking Neural Networks Jan Debus Charlotte Debus Günther Dissertori Markus Götz 31 0 0 09 Apr 2025
The Challenge of Achieving Attributability in Multilingual Table-to-Text Generation with Question-Answer Blueprints Aden Haussmann LMTD 57 0 0 29 Mar 2025
On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process Shun Iwase Shuya Takahashi Nakamasa Inoue Rio Yokota Ryo Nakamura Hirokatsu Kataoka 74 0 0 04 Mar 2025
From Small to Large Language Models: Revisiting the Federalist Papers So Won Jeong Veronika Rockova 37 0 0 25 Feb 2025
On Memorization in Diffusion Models Xiangming Gu Chao Du Tianyu Pang Chongxuan Li Min-Bin Lin Ye Wang DiffM TDI 166 43 0 21 Feb 2025
Early Stopping Against Label Noise Without Validation Data Suqin Yuan Lei Feng Tongliang Liu NoLa 98 15 0 11 Feb 2025
Analysis of Overparameterization in Continual Learning under a Linear Model Daniel Goldfarb Paul Hand CLL 39 0 0 11 Feb 2025
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation Martin Mundt Anaelia Ovalle Felix Friedrich A Pranav Subarnaduti Paul Manuel Brack Kristian Kersting William Agnew 281 0 0 05 Feb 2025
How more data can hurt: Instability and regularization in next-generation reservoir computing Yuanzhao Zhang Edmilson Roque dos Santos Sean P. Cornelius 77 2 0 28 Jan 2025
Functional Risk Minimization Ferran Alet Clement Gehring Tomás Lozano-Pérez Kenji Kawaguchi Joshua B. Tenenbaum Leslie Pack Kaelbling OffRL 60 0 0 31 Dec 2024
Understanding Model Ensemble in Transferable Adversarial Attack Wei Yao Zeliang Zhang Huayi Tang Yong Liu 33 2 0 09 Oct 2024
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models Tung-Yu Wu Pei-Yu Lo ReLM LRM 46 2 0 02 Oct 2024
Investigating the Impact of Model Complexity in Large Language Models Jing Luo Huiyuan Wang Weiran Huang 34 0 0 01 Oct 2024
Zero-shot forecasting of chaotic systems Yuanzhao Zhang William Gilpin AI4TS 37 4 0 24 Sep 2024
Improved Diversity-Promoting Collaborative Metric Learning for Recommendation Shilong Bao Qianqian Xu Zhiyong Yang Yuan He Xiaochun Cao Qingming Huang 45 5 0 02 Sep 2024
Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning Mohammadamin Banayeeanzade Mahdi Soltanolkotabi Mohammad Rostami CLL LRM 103 1 0 29 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 82 19 0 02 Jul 2024
Establishing Deep InfoMax as an effective self-supervised learning methodology in materials informatics Michael Moran Vladimir V. Gusev M. Gaultois Dmytro Antypov M. Rosseinsky AI4CE 25 0 0 30 Jun 2024
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models Ziche Liu Rui Ke Feng Jiang Haizhou Li 69 1 0 20 Jun 2024
Just How Flexible are Neural Networks in Practice? Ravid Shwartz-Ziv Micah Goldblum Arpit Bansal C. B. Bruss Yann LeCun Andrew Gordon Wilson 40 4 0 17 Jun 2024
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement Wangyou Zhang Kohei Saijo Jee-weon Jung Chenda Li Shinji Watanabe Yanmin Qian 32 4 0 06 Jun 2024
Representations as Language: An Information-Theoretic Framework for Interpretability Henry Conklin Kenny Smith MILM 39 1 0 04 Jun 2024
A Margin-based Multiclass Generalization Bound via Geometric Complexity Michael Munn Benoit Dherin Javier Gonzalvo UQCV 40 2 0 28 May 2024
Survival of the Fittest Representation: A Case Study with Modular Addition Xiaoman Delores Ding Zifan Carl Guo Eric J. Michaud Ziming Liu Max Tegmark 48 3 0 27 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Xueyan Niu Bo Bai Lei Deng Wei Han 31 6 0 14 May 2024
pFedLVM: A Large Vision Model (LVM)-Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving Wei-Bin Kou Qingfeng Lin Ming Tang Sheng Xu Rongguang Ye ... Shuai Wang Guofa Li Zhenyu Chen Guangxu Zhu Yik-Chung Wu FedML 52 11 0 07 May 2024
Why is SAM Robust to Label Noise? Christina Baek Zico Kolter Aditi Raghunathan NoLa AAML 41 9 0 06 May 2024
LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing Zeyang Ma A. Chen Dong Jae Kim Tse-Hsun Chen Shaowei Wang 27 44 0 27 Apr 2024
Predictive Churn with the Set of Good Models J. Watson-Daniels Flavio du Pin Calmon Alexander D'Amour Carol Xuan Long David C. Parkes Berk Ustun 83 7 0 12 Feb 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead Marlon Becker Frederick Altrock Benjamin Risse 76 5 0 22 Jan 2024
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems Ori Shem-Ur Yaron Oz 14 0 0 08 Jan 2024
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck Benjamin L. Edelman Surbhi Goel Sham Kakade Eran Malach Cyril Zhang 48 8 0 07 Sep 2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? T. Kajitsuka Issei Sato 31 16 0 26 Jul 2023
What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems Saddek Bensalem Chih-Hong Cheng Wei Huang Xiaowei Huang Changshun Wu Xingyu Zhao AAML 24 6 0 20 Jul 2023
Quantifying lottery tickets under label noise: accuracy, calibration, and complexity V. Arora Daniele Irto Sebastian Goldt G. Sanguinetti 36 2 0 21 Jun 2023
Gibbs-Based Information Criteria and the Over-Parameterized Regime Haobo Chen Yuheng Bu Greg Wornell 27 1 0 08 Jun 2023
Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon Yi-Xiao Luo Bin Dong 26 0 0 25 May 2023
How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features Simone Bombari Marco Mondelli AAML 19 4 0 20 May 2023
On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains Yicheng Li Zixiong Yu Y. Cotronis Qian Lin 55 13 0 04 May 2023
Diversifying the High-level Features for better Adversarial Transferability Zhiyuan Wang Zeliang Zhang Siyuan Liang Xiaosen Wang AAML 42 18 0 20 Apr 2023
Mathematical Challenges in Deep Learning V. Nia Guojun Zhang I. Kobyzev Michael R. Metel Xinlin Li ... S. Hemati M. Asgharian Linglong Kong Wulong Liu Boxing Chen AI4CE VLM 37 1 0 24 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset Thanh-Dung Le P. Jouvet R. Noumeir MoE MedIm 72 5 0 22 Mar 2023
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency Vithursan Thangarasa Shreyas Saxena Abhay Gupta Sean Lie 28 3 0 21 Mar 2023
Memorization Capacity of Neural Networks with Conditional Computation Erdem Koyuncu 30 4 0 20 Mar 2023
Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting Yitzchak Shmalo Jonathan Jenkins Oleksii Krupchytskyi 22 3 0 15 Mar 2023