Mandoline: Model Evaluation under Distribution Shift

1 July 2021
Mayee F. Chen
Karan Goel
N. Sohoni
Fait Poms
Kayvon Fatahalian
Christopher Ré

Papers citing "Mandoline: Model Evaluation under Distribution Shift"

47 / 47 papers shown
Title
Performance Estimation in Binary Classification Using Calibrated Confidence
Juhani Kivimäki
Jakub Białek
W. Kuberski
J. Nurminen
48
0
0
08 May 2025
Revisiting the attacker's knowledge in inference attacks against Searchable Symmetric Encryption
Marc Damie
Jean-Benoist Leger
Florian Hahn
Andreas Peter
AAML
43
1
0
14 Apr 2025
Evaluating Membership Inference Attacks in heterogeneous-data setups
Bram van Dartel
Marc Damie
Florian Hahn
MIACV
MIALM
181
0
0
26 Feb 2025
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach
Firas Bayram
Bestoun S. Ahmed
OOD
34
0
0
28 Oct 2024
Detecting Interpretable Subgroup Drifts
F. Giobergia
Eliana Pastor
Luca de Alfaro
Elena Baralis
16
0
0
26 Aug 2024
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance
Thomas Decker
Alexander Koebler
Michael Lebacher
Ingo Thon
Volker Tresp
Florian Buettner
24
1
0
24 Aug 2024
LADDER: Language Driven Slice Discovery and Error Rectification
Shantanu Ghosh
Rayan Syed
Chenyu Wang
Clare B. Poynton
Kayhan Batmanghelich
34
0
0
31 Jul 2024
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies
Jia Shi
Gautam Gare
Jinjin Tian
Siqi Chai
Zhiqiu Lin
Arun Vasudevan
Di Feng
Francesco Ferroni
Shu Kong
VLM
OODD
OOD
52
3
0
22 Jul 2024
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Riccardo Fogliato
Pratik Patil
Mathew Monfort
Pietro Perona
22
1
0
11 Jun 2024
Clarify: Improving Model Robustness With Natural Language Corrections
Yoonho Lee
Michelle S. Lam
Helena Vasconcelos
Michael S. Bernstein
Chelsea Finn
27
6
0
06 Feb 2024
Expert-Driven Monitoring of Operational ML Models
J. Leest
C. Raibulet
Ilias Gerostathopoulos
Patricia Lago
26
0
0
22 Jan 2024
Estimating Model Performance Under Covariate Shift Without Labels
Jakub Bialek
W. Kuberski
Nikolaos Perrakis
Albert Bifet
31
2
0
16 Jan 2024
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data
B. V. Breugel
Nabeel Seedat
F. Imrie
M. Schaar
SyDa
24
19
0
25 Oct 2023
GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
Xin-Yang Zheng
Miao Zhang
C. Chen
Soheila Molaei
Chuan Zhou
Shirui Pan
GNN
34
14
0
23 Oct 2023
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
Hao Chen
Jindong Wang
Ankit Shah
Ran Tao
Hongxin Wei
Berfin Şimşek
Masashi Sugiyama
Bhiksha Raj
29
26
0
29 Sep 2023
Large Language Model Routing with Benchmark Datasets
Tal Shnitzer
Anthony Ou
Mírian Silva
Kate Soule
Yuekai Sun
Justin Solomon
Neil Thompson
Mikhail Yurochkin
RALM
11
56
0
27 Sep 2023
PAGER: A Framework for Failure Analysis of Deep Regression Models
Jayaraman J. Thiagarajan
V. Narayanaswamy
Puja Trivedi
Rushil Anirudh
33
0
0
20 Sep 2023
CAME: Contrastive Automated Model Evaluation
Ru Peng
Qiuyang Duan
Haobo Wang
Jiachen Ma
Yanbo Jiang
Yongjun Tu
Xiu Jiang
J. Zhao
ELM
23
4
0
22 Aug 2023
Distance Matters For Improving Performance Estimation Under Covariate Shift
Mélanie Roschewitz
Ben Glocker
23
1
0
14 Aug 2023
Evaluating the Robustness of Test Selection Methods for Deep Neural Networks
Qiang Hu
Yuejun Guo
Xiaofei Xie
Maxime Cordy
Wei Ma
Mike Papadakis
Yves Le Traon
NoLa
OOD
22
3
0
29 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Massimiliano Mancini
Zeynep Akata
MLLM
VLM
16
3
0
01 Jul 2023
On Orderings of Probability Vectors and Unsupervised Performance Estimation
Muhammad Maaz
Rui Qiao
Yiheng Zhou
Renxian Zhang
16
0
0
16 Jun 2023
(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy
Elan Rosenfeld
Saurabh Garg
UQCV
32
4
0
01 Jun 2023
Characterizing Out-of-Distribution Error via Optimal Transport
Yuzhe Lu
Yilong Qin
Runtian Zhai
Andrew Shen
Ketong Chen
Zhenlin Wang
Soheil Kolouri
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
OODD
32
10
0
25 May 2023
A Domain-Region Based Evaluation of ML Performance Robustness to Covariate Shift
Firas Bayram
Bestoun S. Ahmed
OOD
11
4
0
18 Apr 2023
K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation
Shuyu Miao
Lin Zheng
J. Liu
Hong Jin
31
5
0
17 Apr 2023
On the Efficacy of Generalization Error Prediction Scoring Functions
Puja Trivedi
Danai Koutra
Jayaraman J. Thiagarajan
21
0
0
23 Mar 2023
Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation
Weijian Deng
Yumin Suh
Stephen Gould
Liang Zheng
UQCV
26
12
0
02 Feb 2023
Data Models for Dataset Drift Controls in Machine Learning With Optical Images
Luis Oala
Marco Aversa
Gabriel Nobis
Kurt Willis
Yoan Neuenschwander
...
E. Pomarico
Wojciech Samek
Roderick Murray-Smith
Christoph Clausen
B. Sanguinetti
23
5
0
04 Nov 2022
Test-time Recalibration of Conformal Predictors Under Distribution Shift Based on Unlabeled Examples
Fatih Yilmaz
Reinhard Heckel
23
0
0
09 Oct 2022
From plane crashes to algorithmic harm: applicability of safety engineering frameworks for responsible ML
Shalaleh Rismani
Renee Shelby
A. Smart
Edgar W. Jatho
Joshua A. Kroll
AJung Moon
Negar Rostamzadeh
34
36
0
06 Oct 2022
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions
Lingjiao Chen
Zhihua Jin
Sabri Eyuboglu
Christopher Ré
Matei A. Zaharia
James Y. Zou
45
9
0
18 Sep 2022
Estimating and Explaining Model Performance When Both Covariates and Labels Shift
Lingjiao Chen
Matei A. Zaharia
James Y. Zou
20
15
0
18 Sep 2022
Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores
Zeju Li
Konstantinos Kamnitsas
Mobarakol Islam
Chen Chen
Ben Glocker
24
9
0
20 Jul 2022
Predicting Out-of-Domain Generalization with Neighborhood Invariance
Nathan Ng
Neha Hulkund
Kyunghyun Cho
Marzyeh Ghassemi
OOD
18
4
0
05 Jul 2022
Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
Christina Baek
Yiding Jiang
Aditi Raghunathan
Zico Kolter
24
79
0
27 Jun 2022
Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
Nikolaj Thams
Michael Oberst
David Sontag
OOD
38
10
0
31 May 2022
Understanding new tasks through the lens of training data via exponential tilting
Subha Maity
Mikhail Yurochkin
Moulinath Banerjee
Yuekai Sun
29
10
0
26 May 2022
Evaluation Gaps in Machine Learning Practice
Ben Hutchinson
Negar Rostamzadeh
Christina Greer
Katherine A. Heller
Vinodkumar Prabhakaran
ELM
28
56
0
11 May 2022
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
Sabri Eyuboglu
M. Varma
Khaled Kamal Saab
Jean-Benoit Delbrouck
Christopher Lee-Messer
Jared A. Dunnmon
James Y. Zou
Christopher Ré
22
141
0
24 Mar 2022
Predicting Out-of-Distribution Error with the Projection Norm
Yaodong Yu
Zitong Yang
Alexander Wei
Yi-An Ma
Jacob Steinhardt
OODD
12
43
0
11 Feb 2022
Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series
Sercan Ö. Arik
Nathanael Yoder
Tomas Pfister
TTA
AI4TS
6
20
0
04 Feb 2022
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
Saurabh Garg
Sivaraman Balakrishnan
Zachary Chase Lipton
Behnam Neyshabur
Hanie Sedghi
OODD
OOD
37
124
0
11 Jan 2022
Predicting with Confidence on Unseen Distributions
Devin Guillory
Vaishaal Shankar
Sayna Ebrahimi
Trevor Darrell
Ludwig Schmidt
UQCV
OOD
20
115
0
07 Jul 2021
Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
Jiefeng Chen
Frederick Liu
Besim Avci
Xi Wu
Yingyu Liang
S. Jha
24
60
0
29 Jun 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Mohit Bansal
Christopher Ré
AAML
OffRL
OOD
146
136
0
13 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018