Our Evaluation Metric Needs an Update to Encourage Generalization


14 July 2020
Swaroop Mishra
Anjana Arunkumar
Chris Bryan
Chitta Baral

Papers citing "Our Evaluation Metric Needs an Update to Encourage Generalization"

13 / 13 papers shown
LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity
Anjana Arunkumar
Sanjay Kariyappa
Rakhi Agrawal
Sriramakrishnan Chandrasekaran
Chris Bryan
224
1
0
12 Apr 2023
Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Anjana Arunkumar
Swaroop Mishra
Bhavdeep Singh Sachdeva
Chitta Baral
Chris Bryan
228
0
0
09 Feb 2023
Pretrained Transformers Do not Always Improve Robustness
Swaroop Mishra
Bhavdeep Singh Sachdeva
Chitta Baral
VLM
175
2
0
14 Oct 2022
A Survey of Parameters Associated with the Quality of Benchmarks in NLP
Swaroop Mishra
Anjana Arunkumar
Chris Bryan
Chitta Baral
231
1
0
14 Oct 2022
Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications
Swaroop Mishra
Anjana Arunkumar
Chitta Baral
159
0
0
10 Oct 2022
Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mihir Parmar
Swaroop Mishra
Mor Geva
Chitta Baral
516
67
0
01 May 2022
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Swaroop Mishra
Arindam Mitra
Neeraj Varshney
Bhavdeep Singh Sachdeva
Peter Clark
Chitta Baral
Ashwin Kalyan
AIMat, ReLM, ELM, LRM
366
137
0
12 Apr 2022
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Findings (Findings), 2022
Tejas Gokhale
Swaroop Mishra
Man Luo
Bhavdeep Singh Sachdeva
Chitta Baral
273
33
0
15 Mar 2022
Choose Your QA Model Wisely: A Systematic Study of Generative and Extractive Readers for Question Answering
Man Luo
Kazuma Hashimoto
Semih Yavuz
Zhiwei Liu
Chitta Baral
Yingbo Zhou
240
25
0
14 Mar 2022
A Proposal to Study "Is High Quality Data All We Need?"
Swaroop Mishra
Anjana Arunkumar
178
3
0
12 Mar 2022
Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings
Findings (Findings), 2022
Neeraj Varshney
Swaroop Mishra
Chitta Baral
334
64
0
01 Mar 2022
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Swaroop Mishra
Anjana Arunkumar
237
27
0
10 Jun 2021
DQI: A Guide to Benchmark Evaluation
Swaroop Mishra
Anjana Arunkumar
Bhavdeep Singh Sachdeva
Chris Bryan
Chitta Baral
185
8
0
10 Aug 2020