ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.17681
  4. Cited By
VarBench: Robust Language Model Benchmarking Through Dynamic Variable
  Perturbation

VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

25 June 2024
Kun Qian
Shunji Wan
Claudia Tang
Youzhi Wang
Xuanming Zhang
Maximillian Chen
Zhou Yu
    AAML
ArXivPDFHTML

Papers citing "VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation"

12 / 12 papers shown
Title
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun
Han Wang
Dongbai Li
Gang Wang
Huan Zhang
AAML
50
0
0
20 Mar 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
H. Li
Tao Xie
Baishakhi Ray
38
2
0
23 Feb 2025
AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science
AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science
Chenyue Li
Wen Deng
Mengqian Lu
Binhang Yuan
ELM
AI4Cl
LRM
87
0
0
03 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
84
2
0
01 Feb 2025
Can LLMs Solve longer Math Word Problems Better?
Can LLMs Solve longer Math Word Problems Better?
Xin Xu
Tong Xiao
Zitong Chao
Zhenya Huang
Can Yang
Yang Wang
61
10
0
23 May 2024
Evading Data Contamination Detection for Language Models is (too) Easy
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck
Mark Niklas Muller
Maximilian Baader
Marc Fischer
Martin Vechev
79
18
0
05 Feb 2024
Task Contamination: Language Models May Not Be Few-Shot Anymore
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li
Jeffrey Flanigan
79
87
0
26 Dec 2023
Don't Make Your LLM an Evaluation Benchmark Cheater
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
102
136
0
03 Nov 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for
  Code Understanding and Generation
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
201
1,451
0
02 Sep 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
1