2107.00643
Cited By
Mandoline: Model Evaluation under Distribution Shift
1 July 2021
Mayee F. Chen
Karan Goel
N. Sohoni
Fait Poms
Kayvon Fatahalian
Christopher Ré
Papers citing
"Mandoline: Model Evaluation under Distribution Shift"
47 / 47 papers shown
Title
Performance Estimation in Binary Classification Using Calibrated Confidence
Juhani Kivimäki
Jakub Białek
W. Kuberski
J. Nurminen
48
0
0
08 May 2025
Revisiting the attacker's knowledge in inference attacks against Searchable Symmetric Encryption
Marc Damie
Jean-Benoist Leger
Florian Hahn
Andreas Peter
AAML
43
1
0
14 Apr 2025
Evaluating Membership Inference Attacks in heterogeneous-data setups
Bram van Dartel
Marc Damie
Florian Hahn
MIACV
MIALM
181
0
0
26 Feb 2025
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach
Firas Bayram
Bestoun S. Ahmed
OOD
34
0
0
28 Oct 2024
Detecting Interpretable Subgroup Drifts
F. Giobergia
Eliana Pastor
Luca de Alfaro
Elena Baralis
16
0
0
26 Aug 2024
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance
Thomas Decker
Alexander Koebler
Michael Lebacher
Ingo Thon
Volker Tresp
Florian Buettner
24
1
0
24 Aug 2024
LADDER: Language Driven Slice Discovery and Error Rectification
Shantanu Ghosh
Rayan Syed
Chenyu Wang
Clare B. Poynton
Kayhan Batmanghelich
34
0
0
31 Jul 2024
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies
Jia Shi
Gautam Gare
Jinjin Tian
Siqi Chai
Zhiqiu Lin
Arun Vasudevan
Di Feng
Francesco Ferroni
Shu Kong
VLM
OODD
OOD
52
3
0
22 Jul 2024
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Riccardo Fogliato
Pratik Patil
Mathew Monfort
Pietro Perona
22
1
0
11 Jun 2024
Clarify: Improving Model Robustness With Natural Language Corrections
Yoonho Lee
Michelle S. Lam
Helena Vasconcelos
Michael S. Bernstein
Chelsea Finn
27
6
0
06 Feb 2024
Expert-Driven Monitoring of Operational ML Models
J. Leest
C. Raibulet
Ilias Gerostathopoulos
Patricia Lago
26
0
0
22 Jan 2024
Estimating Model Performance Under Covariate Shift Without Labels
Jakub Białek
W. Kuberski
Nikolaos Perrakis
Albert Bifet
31
2
0
16 Jan 2024
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data
B. V. Breugel
Nabeel Seedat
F. Imrie
M. Schaar
SyDa
24
19
0
25 Oct 2023
GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
Xin-Yang Zheng
Miao Zhang
C. Chen
Soheila Molaei
Chuan Zhou
Shirui Pan
GNN
34
14
0
23 Oct 2023
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
Hao Chen
Jindong Wang
Ankit Shah
Ran Tao
Hongxin Wei
Berfin Şimşek
Masashi Sugiyama
Bhiksha Raj
29
26
0
29 Sep 2023
Large Language Model Routing with Benchmark Datasets
Tal Shnitzer
Anthony Ou
Mírian Silva
Kate Soule
Yuekai Sun
Justin Solomon
Neil Thompson
Mikhail Yurochkin
RALM
11
56
0
27 Sep 2023
PAGER: A Framework for Failure Analysis of Deep Regression Models
Jayaraman J. Thiagarajan
V. Narayanaswamy
Puja Trivedi
Rushil Anirudh
33
0
0
20 Sep 2023
CAME: Contrastive Automated Model Evaluation
Ru Peng
Qiuyang Duan
Haobo Wang
Jiachen Ma
Yanbo Jiang
Yongjun Tu
Xiu Jiang
J. Zhao
ELM
23
4
0
22 Aug 2023
Distance Matters For Improving Performance Estimation Under Covariate Shift
Mélanie Roschewitz
Ben Glocker
23
1
0
14 Aug 2023
Evaluating the Robustness of Test Selection Methods for Deep Neural Networks
Qiang Hu
Yuejun Guo
Xiaofei Xie
Maxime Cordy
Wei Ma
Mike Papadakis
Yves Le Traon
NoLa
OOD
22
3
0
29 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Massimiliano Mancini
Zeynep Akata
MLLM
VLM
16
3
0
01 Jul 2023
On Orderings of Probability Vectors and Unsupervised Performance Estimation
Muhammad Maaz
Rui Qiao
Yiheng Zhou
Renxian Zhang
16
0
0
16 Jun 2023
(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy
Elan Rosenfeld
Saurabh Garg
UQCV
32
4
0
01 Jun 2023
Characterizing Out-of-Distribution Error via Optimal Transport
Yuzhe Lu
Yilong Qin
Runtian Zhai
Andrew Shen
Ketong Chen
Zhenlin Wang
Soheil Kolouri
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
OODD
32
10
0
25 May 2023
A Domain-Region Based Evaluation of ML Performance Robustness to Covariate Shift
Firas Bayram
Bestoun S. Ahmed
OOD
11
4
0
18 Apr 2023
K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation
Shuyu Miao
Lin Zheng
J. Liu
Hong Jin
31
5
0
17 Apr 2023
On the Efficacy of Generalization Error Prediction Scoring Functions
Puja Trivedi
Danai Koutra
Jayaraman J. Thiagarajan
21
0
0
23 Mar 2023
Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation
Weijian Deng
Yumin Suh
Stephen Gould
Liang Zheng
UQCV
26
12
0
02 Feb 2023
Data Models for Dataset Drift Controls in Machine Learning With Optical Images
Luis Oala
Marco Aversa
Gabriel Nobis
Kurt Willis
Yoan Neuenschwander
...
E. Pomarico
Wojciech Samek
Roderick Murray-Smith
Christoph Clausen
B. Sanguinetti
23
5
0
04 Nov 2022
Test-time Recalibration of Conformal Predictors Under Distribution Shift Based on Unlabeled Examples
Fatih Yilmaz
Reinhard Heckel
23
0
0
09 Oct 2022
From plane crashes to algorithmic harm: applicability of safety engineering frameworks for responsible ML
Shalaleh Rismani
Renee Shelby
A. Smart
Edgar W. Jatho
Joshua A. Kroll
AJung Moon
Negar Rostamzadeh
34
36
0
06 Oct 2022
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions
Lingjiao Chen
Zhihua Jin
Sabri Eyuboglu
Christopher Ré
Matei A. Zaharia
James Y. Zou
45
9
0
18 Sep 2022
Estimating and Explaining Model Performance When Both Covariates and Labels Shift
Lingjiao Chen
Matei A. Zaharia
James Y. Zou
20
15
0
18 Sep 2022
Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores
Zeju Li
Konstantinos Kamnitsas
Mobarakol Islam
Chen Chen
Ben Glocker
24
9
0
20 Jul 2022
Predicting Out-of-Domain Generalization with Neighborhood Invariance
Nathan Ng
Neha Hulkund
Kyunghyun Cho
Marzyeh Ghassemi
OOD
18
4
0
05 Jul 2022
Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
Christina Baek
Yiding Jiang
Aditi Raghunathan
Zico Kolter
24
79
0
27 Jun 2022
Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
Nikolaj Thams
Michael Oberst
David Sontag
OOD
38
10
0
31 May 2022
Understanding new tasks through the lens of training data via exponential tilting
Subha Maity
Mikhail Yurochkin
Moulinath Banerjee
Yuekai Sun
29
10
0
26 May 2022
Evaluation Gaps in Machine Learning Practice
Ben Hutchinson
Negar Rostamzadeh
Christina Greer
Katherine A. Heller
Vinodkumar Prabhakaran
ELM
28
56
0
11 May 2022
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
Sabri Eyuboglu
M. Varma
Khaled Kamal Saab
Jean-Benoit Delbrouck
Christopher Lee-Messer
Jared A. Dunnmon
James Y. Zou
Christopher Ré
22
141
0
24 Mar 2022
Predicting Out-of-Distribution Error with the Projection Norm
Yaodong Yu
Zitong Yang
Alexander Wei
Yi-An Ma
Jacob Steinhardt
OODD
12
43
0
11 Feb 2022
Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series
Sercan Ö. Arik
Nathanael Yoder
Tomas Pfister
TTA
AI4TS
6
20
0
04 Feb 2022
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
Saurabh Garg
Sivaraman Balakrishnan
Zachary Chase Lipton
Behnam Neyshabur
Hanie Sedghi
OODD
OOD
37
124
0
11 Jan 2022
Predicting with Confidence on Unseen Distributions
Devin Guillory
Vaishaal Shankar
Sayna Ebrahimi
Trevor Darrell
Ludwig Schmidt
UQCV
OOD
20
115
0
07 Jul 2021
Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
Jiefeng Chen
Frederick Liu
Besim Avci
Xi Wu
Yingyu Liang
S. Jha
24
60
0
29 Jun 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Mohit Bansal
Christopher Ré
AAML
OffRL
OOD
146
136
0
13 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018