ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.08073
  4. Cited By
GLUE-X: Evaluating Natural Language Understanding Models from an
  Out-of-distribution Generalization Perspective

GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective

15 November 2022
Linyi Yang
Shuibai Zhang
Libo Qin
Yafu Li
Yidong Wang
Hanmeng Liu
Jindong Wang
Xingxu Xie
Yue Zhang
    ELM
ArXivPDFHTML

Papers citing "GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective"

50 / 67 papers shown
Title
Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
Alexandre Misrahi
Nadezhda Chirkova
Maxime Louis
Vassilina Nikoulina
RALM
77
0
0
03 Apr 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
44
0
0
12 Mar 2025
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
Yupeng Chang
Yi-Ju Chang
Yuan Wu
AI4CE
ALM
76
0
0
24 Feb 2025
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco
ELM
58
2
0
18 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
58
1
0
15 Feb 2025
Outcome-Refining Process Supervision for Code Generation
Outcome-Refining Process Supervision for Code Generation
Zhuohao Yu
Weizheng Gu
Yidong Wang
Zhengran Zeng
Jindong Wang
Wei Ye
Shikun Zhang
LRM
84
4
0
19 Dec 2024
SelfPrompt: Autonomously Evaluating LLM Robustness via
  Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
61
2
0
01 Dec 2024
Ño' Matters: Out-of-Distribution Detection in Multimodality Long
  Dialogue
Ño' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue
Rena Gao
Xuetong Wu
Siwen Luo
Caren Han
Feng Liu
OODD
42
0
0
31 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic
  Reasoning Tasks
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
24
0
0
17 Oct 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
46
11
0
15 Aug 2024
Benchmarks as Microscopes: A Call for Model Metrology
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Yang Wang
Naomi Saphra
23
10
0
22 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image
  Classification
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Markus Marks
Manuel Knott
Neehar Kondapaneni
Elijah Cole
T. Defraeye
Fernando Pérez-Cruz
Pietro Perona
SSL
24
2
0
16 Jul 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
Yibo Miao
Yifan Zhu
Yinpeng Dong
Lijia Yu
Jun Zhu
Xiao-Shan Gao
EGVM
31
12
0
08 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language
  Models: Challenges, Limitations, and Recommendations
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
22
25
0
04 Jul 2024
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
Zhenyu Zhang
Bingguang Hao
Jinpeng Li
Zekai Zhang
Dongyan Zhao
21
0
0
16 Jun 2024
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain
  Knowledge Graphs
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
Lina Wang
29
1
0
16 Jun 2024
Evaluating the Generalization Ability of Quantized LLMs: Benchmark,
  Analysis, and Toolbox
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
Yijun Liu
Yuan Meng
Fang Wu
Shenhao Peng
Hang Yao
Chaoyu Guan
Chen Tang
Xinzhu Ma
Zhi Wang
Wenwu Zhu
MQ
48
7
0
15 Jun 2024
Examining the robustness of LLM evaluation to the distributional
  assumptions of benchmarks
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
51
13
0
25 Apr 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation
  of Large Language Models
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
36
1
0
09 Apr 2024
A Rationale-centric Counterfactual Data Augmentation Method for
  Cross-Document Event Coreference Resolution
A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution
Bowen Ding
Qingkai Min
Shengkun Ma
Yingjie Li
Linyi Yang
Yue Zhang
33
3
0
02 Apr 2024
A Survey on Evaluation of Out-of-Distribution Generalization
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Jiashuo Liu
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
42
9
0
04 Mar 2024
FAC$^2$E: Better Understanding Large Language Model Capabilities by
  Dissociating Language and Cognition
FAC2^22E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition
Xiaoqiang Wang
Bang Liu
Lingfei Wu
22
0
0
29 Feb 2024
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large
  Language Models
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Wei Ye
Jindong Wang
Xing Xie
Yue Zhang
Shikun Zhang
32
20
0
23 Feb 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
37
30
0
21 Feb 2024
Dive into the Chasm: Probing the Gap between In- and Cross-Topic
  Generalization
Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization
Andreas Waldis
Yufang Hou
Iryna Gurevych
ELM
24
7
0
02 Feb 2024
An Empirical Study on Large Language Models in Accuracy and Robustness
  under Chinese Industrial Scenarios
An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
Zongjie Li
Wenying Qiu
Pingchuan Ma
Yichen Li
You Li
Sijia He
Baozheng Jiang
Shuai Wang
Weixi Gu
13
2
0
27 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language
  Model Systems
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
52
56
0
11 Jan 2024
Supervised Knowledge Makes Large Language Models Better In-context
  Learners
Supervised Knowledge Makes Large Language Models Better In-context Learners
Linyi Yang
Shuibai Zhang
Zhuohao Yu
Guangsheng Bao
Yidong Wang
...
Ruochen Xu
Weirong Ye
Xing Xie
Weizhu Chen
Yue Zhang
16
14
0
26 Dec 2023
Domain Invariant Learning for Gaussian Processes and Bayesian
  Exploration
Domain Invariant Learning for Gaussian Processes and Bayesian Exploration
Xilong Zhao
Siyuan Bian
Yaoyun Zhang
Yuliang Zhang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
21
1
0
18 Dec 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A
  Hate Speech Detection Case Study
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
17
0
0
16 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
23
3
0
16 Nov 2023
Generalization Analogies: A Testbed for Generalizing AI Oversight to
  Hard-To-Measure Domains
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
14
6
0
13 Nov 2023
Unlocking Emergent Modularity in Large Language Models
Unlocking Emergent Modularity in Large Language Models
Zihan Qiu
Zeyu Huang
Jie Fu
12
8
0
17 Oct 2023
Meta Semantic Template for Evaluation of Large Language Models
Meta Semantic Template for Evaluation of Large Language Models
Yachuan Liu
Liang Chen
Jindong Wang
Qiaozhu Mei
Xing Xie
17
0
0
01 Oct 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
A. Maritan
Jiaao Chen
S. Dey
Luca Schenato
Diyi Yang
Xing Xie
ELM
LRM
14
42
0
29 Sep 2023
Understanding and Mitigating the Label Noise in Pre-training on
  Downstream Tasks
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
Hao Chen
Jindong Wang
Ankit Shah
Ran Tao
Hongxin Wei
Berfin cSimcsek
Masashi Sugiyama
Bhiksha Raj
22
26
0
29 Sep 2023
How to Handle Different Types of Out-of-Distribution Scenarios in
  Computational Argumentation? A Comprehensive and Fine-Grained Field Study
How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study
Andreas Waldis
Yufang Hou
Iryna Gurevych
25
2
0
15 Sep 2023
Classical Out-of-Distribution Detection Methods Benchmark in Text
  Classification Tasks
Classical Out-of-Distribution Detection Methods Benchmark in Text Classification Tasks
M. Baran
Joanna Baran
Mateusz Wójcik
Maciej Ziȩba
Adam Gonczarek
OODD
19
4
0
13 Jul 2023
A Survey on Evaluation of Large Language Models
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
58
1,464
0
06 Jul 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning
  Optimization
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
...
Jindong Wang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALM
ELM
35
222
0
08 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis,
  and LLMs Evaluations
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
32
72
0
07 Jun 2023
PromptRobust: Towards Evaluating the Robustness of Large Language Models
  on Adversarial Prompts
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Kaijie Zhu
Jindong Wang
Jiaheng Zhou
Zichen Wang
Hao Chen
...
Linyi Yang
Weirong Ye
Yue Zhang
Neil Zhenqiang Gong
Xingxu Xie
SILM
29
146
0
07 Jun 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a
  Unified Automatic Robustness Evaluation Framework
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Lifan Yuan
Dehan Kong
...
Longtao Huang
H. Xue
Zhiyuan Liu
Maosong Sun
Heng Ji
AAML
ELM
18
6
0
29 May 2023
Rethinking the Evaluation Protocol of Domain Generalization
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu
Xingxuan Zhang
Renzhe Xu
Jiashuo Liu
Yue He
Peng Cui
OOD
19
7
0
24 May 2023
Out-of-Distribution Generalization in Text Classification: Past,
  Present, and Future
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Y. Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
20
2
0
23 May 2023
Robust Prompt Optimization for Large Language Models Against
  Distribution Shifts
Robust Prompt Optimization for Large Language Models Against Distribution Shifts
Moxin Li
Wenjie Wang
Fuli Feng
Yixin Cao
Jizhi Zhang
Tat-Seng Chua
OffRL
40
14
0
23 May 2023
Regex-augmented Domain Transfer Topic Classification based on a
  Pre-trained Language Model: An application in Financial Domain
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain
Vanessa Liao
Syed Shariyar Murtaza
Yifan Nie
Jimmy J. Lin
6
0
0
23 May 2023
Consistency Regularization for Domain Generalization with Logit
  Attribution Matching
Consistency Regularization for Domain Generalization with Logit Attribution Matching
Han Gao
Kaican Li
Weiyan Xie
Zhi Lin
Yongxiang Huang
Luning Wang
Caleb Chen Cao
N. Zhang
13
2
0
13 May 2023
On the Usage of Continual Learning for Out-of-Distribution
  Generalization in Pre-trained Language Models of Code
On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code
M. Weyssow
Xin Zhou
Kisub Kim
David Lo
H. Sahraoui
CLL
KELM
19
9
0
06 May 2023
Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Model--A Preliminary Release
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Zhuozhi Xiong
Zihan Li
Qi He
Sihang Jiang
Hongwei Feng
Yanghua Xiao
ELM
ALM
32
2
0
23 Apr 2023
12
Next