Mandoline: Model Evaluation under Distribution Shift

1 July 2021

Papers citing "Mandoline: Model Evaluation under Distribution Shift"

47 / 47 papers shown

Title
Performance Estimation in Binary Classification Using Calibrated Confidence Juhani Kivimäki Jakub Białek W. Kuberski J. Nurminen 48 0 0 08 May 2025
Revisiting the attacker's knowledge in inference attacks against Searchable Symmetric Encryption Marc Damie Jean-Benoist Leger Florian Hahn Andreas Peter AAML 43 1 0 14 Apr 2025
Evaluating Membership Inference Attacks in heterogeneous-data setups Bram van Dartel Marc Damie Florian Hahn MIACV MIALM 181 0 0 26 Feb 2025
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach Firas Bayram Bestoun S. Ahmed OOD 34 0 0 28 Oct 2024
Detecting Interpretable Subgroup Drifts F. Giobergia Eliana Pastor Luca de Alfaro Elena Baralis 16 0 0 26 Aug 2024
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance Thomas Decker Alexander Koebler Michael Lebacher Ingo Thon Volker Tresp Florian Buettner 24 1 0 24 Aug 2024
LADDER: Language Driven Slice Discovery and Error Rectification Shantanu Ghosh Rayan Syed Chenyu Wang Clare B. Poynton Kayhan Batmanghelich 34 0 0 31 Jul 2024
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies Jia Shi Gautam Gare Jinjin Tian Siqi Chai Zhiqiu Lin Arun Vasudevan Di Feng Francesco Ferroni Shu Kong VLM OODD OOD 52 3 0 22 Jul 2024
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation Riccardo Fogliato Pratik Patil Mathew Monfort Pietro Perona 22 1 0 11 Jun 2024
Clarify: Improving Model Robustness With Natural Language Corrections Yoonho Lee Michelle S. Lam Helena Vasconcelos Michael S. Bernstein Chelsea Finn 27 6 0 06 Feb 2024
Expert-Driven Monitoring of Operational ML Models J. Leest C. Raibulet Ilias Gerostathopoulos Patricia Lago 26 0 0 22 Jan 2024
Estimating Model Performance Under Covariate Shift Without Labels Jakub Białek W. Kuberski Nikolaos Perrakis Albert Bifet 31 2 0 16 Jan 2024
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data B. V. Breugel Nabeel Seedat F. Imrie M. Schaar SyDa 24 19 0 25 Oct 2023
GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels Xin-Yang Zheng Miao Zhang C. Chen Soheila Molaei Chuan Zhou Shirui Pan GNN 34 14 0 23 Oct 2023
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks Hao Chen Jindong Wang Ankit Shah Ran Tao Hongxin Wei Berfin Şimşek Masashi Sugiyama Bhiksha Raj 29 26 0 29 Sep 2023
Large Language Model Routing with Benchmark Datasets Tal Shnitzer Anthony Ou Mírian Silva Kate Soule Yuekai Sun Justin Solomon Neil Thompson Mikhail Yurochkin RALM 11 56 0 27 Sep 2023
PAGER: A Framework for Failure Analysis of Deep Regression Models Jayaraman J. Thiagarajan V. Narayanaswamy Puja Trivedi Rushil Anirudh 33 0 0 20 Sep 2023
CAME: Contrastive Automated Model Evaluation Ru Peng Qiuyang Duan Haobo Wang Jiachen Ma Yanbo Jiang Yongjun Tu Xiu Jiang J. Zhao ELM 23 4 0 22 Aug 2023
Distance Matters For Improving Performance Estimation Under Covariate Shift Mélanie Roschewitz Ben Glocker 23 1 0 14 Aug 2023
Evaluating the Robustness of Test Selection Methods for Deep Neural Networks Qiang Hu Yuejun Guo Xiaofei Xie Maxime Cordy Wei Ma Mike Papadakis Yves Le Traon NoLa OOD 22 3 0 29 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models Uddeshya Upadhyay Shyamgopal Karthik Massimiliano Mancini Zeynep Akata MLLM VLM 16 3 0 01 Jul 2023
On Orderings of Probability Vectors and Unsupervised Performance Estimation Muhammad Maaz Rui Qiao Yiheng Zhou Renxian Zhang 16 0 0 16 Jun 2023
(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy Elan Rosenfeld Saurabh Garg UQCV 32 4 0 01 Jun 2023
Characterizing Out-of-Distribution Error via Optimal Transport Yuzhe Lu Yilong Qin Runtian Zhai Andrew Shen Ketong Chen Zhenlin Wang Soheil Kolouri Simon Stepputtis Joseph Campbell Katia P. Sycara OODD 32 10 0 25 May 2023
A Domain-Region Based Evaluation of ML Performance Robustness to Covariate Shift Firas Bayram Bestoun S. Ahmed OOD 11 4 0 18 Apr 2023
K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation Shuyu Miao Lin Zheng J. Liu Hong Jin 31 5 0 17 Apr 2023
On the Efficacy of Generalization Error Prediction Scoring Functions Puja Trivedi Danai Koutra Jayaraman J. Thiagarajan 21 0 0 23 Mar 2023
Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation Weijian Deng Yumin Suh Stephen Gould Liang Zheng UQCV 26 12 0 02 Feb 2023
Data Models for Dataset Drift Controls in Machine Learning With Optical Images Luis Oala Marco Aversa Gabriel Nobis Kurt Willis Yoan Neuenschwander ... E. Pomarico Wojciech Samek Roderick Murray-Smith Christoph Clausen B. Sanguinetti 23 5 0 04 Nov 2022
Test-time Recalibration of Conformal Predictors Under Distribution Shift Based on Unlabeled Examples Fatih Yilmaz Reinhard Heckel 23 0 0 09 Oct 2022
From plane crashes to algorithmic harm: applicability of safety engineering frameworks for responsible ML Shalaleh Rismani Renee Shelby A. Smart Edgar W. Jatho Joshua A. Kroll AJung Moon Negar Rostamzadeh 34 36 0 06 Oct 2022
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions Lingjiao Chen Zhihua Jin Sabri Eyuboglu Christopher Ré Matei A. Zaharia James Y. Zou 45 9 0 18 Sep 2022
Estimating and Explaining Model Performance When Both Covariates and Labels Shift Lingjiao Chen Matei A. Zaharia James Y. Zou 20 15 0 18 Sep 2022
Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores Zeju Li Konstantinos Kamnitsas Mobarakol Islam Chen Chen Ben Glocker 24 9 0 20 Jul 2022
Predicting Out-of-Domain Generalization with Neighborhood Invariance Nathan Ng Neha Hulkund Kyunghyun Cho Marzyeh Ghassemi OOD 18 4 0 05 Jul 2022
Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift Christina Baek Yiding Jiang Aditi Raghunathan Zico Kolter 24 79 0 27 Jun 2022
Evaluating Robustness to Dataset Shift via Parametric Robustness Sets Nikolaj Thams Michael Oberst David Sontag OOD 38 10 0 31 May 2022
Understanding new tasks through the lens of training data via exponential tilting Subha Maity Mikhail Yurochkin Moulinath Banerjee Yuekai Sun 29 10 0 26 May 2022
Evaluation Gaps in Machine Learning Practice Ben Hutchinson Negar Rostamzadeh Christina Greer Katherine A. Heller Vinodkumar Prabhakaran ELM 28 56 0 11 May 2022
Domino: Discovering Systematic Errors with Cross-Modal Embeddings Sabri Eyuboglu M. Varma Khaled Kamal Saab Jean-Benoit Delbrouck Christopher Lee-Messer Jared A. Dunnmon James Y. Zou Christopher Ré 22 141 0 24 Mar 2022
Predicting Out-of-Distribution Error with the Projection Norm Yaodong Yu Zitong Yang Alexander Wei Yi-An Ma Jacob Steinhardt OODD 12 43 0 11 Feb 2022
Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series Sercan Ö. Arik Nathanael Yoder Tomas Pfister TTA AI4TS 6 20 0 04 Feb 2022
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance Saurabh Garg Sivaraman Balakrishnan Zachary Chase Lipton Behnam Neyshabur Hanie Sedghi OODD OOD 37 124 0 11 Jan 2022
Predicting with Confidence on Unseen Distributions Devin Guillory Vaishaal Shankar Sayna Ebrahimi Trevor Darrell Ludwig Schmidt UQCV OOD 20 115 0 07 Jul 2021
Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles Jiefeng Chen Frederick Liu Besim Avci Xi Wu Yingyu Liang S. Jha 24 60 0 29 Jun 2021
Robustness Gym: Unifying the NLP Evaluation Landscape Karan Goel Nazneen Rajani Jesse Vig Samson Tan Jason M. Wu Stephan Zheng Caiming Xiong Mohit Bansal Christopher Ré AAML OffRL OOD 146 136 0 13 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,956 0 20 Apr 2018