ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.04850
  4. Cited By
Rethinking Benchmark and Contamination for Language Models with
  Rephrased Samples

Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

8 November 2023
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
    ALM
ArXivPDFHTML

Papers citing "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"

34 / 84 papers shown
Title
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large
  Language Models
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
Eldar Kurtic
Amir Moeini
Dan Alistarh
LRM
29
2
0
18 Jun 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and
  BenchBuilder Pipeline
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li
Wei-Lin Chiang
Evan Frick
Lisa Dunlap
Tianhao Wu
Banghua Zhu
Joseph E. Gonzalez
Ion Stoica
ALM
33
115
0
17 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
34
38
0
06 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial
  Actions across X Community Notes and Wikipedia edits
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip H. S. Torr
João F. Henriques
Jakob N. Foerster
19
1
0
05 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni
Fuzhao Xue
Xiang Yue
Yuntian Deng
Mahir Shah
Kabir Jain
Graham Neubig
Yang You
ELM
30
35
0
03 Jun 2024
ConStat: Performance-Based Contamination Detection in Large Language
  Models
ConStat: Performance-Based Contamination Detection in Large Language Models
Jasper Dekoninck
Mark Niklas Muller
Martin Vechev
32
5
0
25 May 2024
LMD3: Language Model Data Density Dependence
LMD3: Language Model Data Density Dependence
John Kirchenbauer
Garrett Honke
Gowthami Somepalli
Jonas Geiping
Daphne Ippolito
Katherine Lee
Tom Goldstein
David Andre
22
6
0
10 May 2024
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and
  Natural User Prompts
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts
Shudan Zhang
Hanlin Zhao
Xiao Liu
Qinkai Zheng
Zehan Qi
Xiaotao Gu
Xiaohan Zhang
Yuxiao Dong
Jie Tang
ELM
52
16
0
07 May 2024
A Philosophical Introduction to Language Models - Part II: The Way
  Forward
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
52
11
0
06 May 2024
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Jordi Bayarri-Planas
Adrián Tormos
Daniel Hinjos
...
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Sergio Álvarez Napagao
Eduard Ayguadé-Parra
Ulises Cortés Dario Garcia-Gasulla
ELM
LM&MA
24
12
0
03 May 2024
Benchmarking Benchmark Leakage in Large Language Models
Benchmarking Benchmark Leakage in Large Language Models
Ruijie Xu
Zengzhi Wang
Run-Ze Fan
Pengfei Liu
53
42
0
29 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging
  LLMs' (Lack of) Multicultural Knowledge
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
36
20
0
10 Apr 2024
Elephants Never Forget: Memorization and Learning of Tabular Data in
  Large Language Models
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Sebastian Bordt
Harsha Nori
Vanessa Rodrigues
Besmira Nushi
Rich Caruana
36
12
0
09 Apr 2024
Conifer: Improving Complex Constrained Instruction-Following Ability of
  Large Language Models
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
Haoran Sun
Lixin Liu
Junjie Li
Fengyu Wang
Baohua Dong
Ran Lin
Ruohui Huang
25
14
0
03 Apr 2024
Are LLMs Effective Backbones for Fine-tuning? An Experimental
  Investigation of Supervised LLMs on Chinese Short Text Matching
Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching
Shulin Liu
Chengcheng Xu
Hao Liu
T. Yu
Tao Yang
24
1
0
29 Mar 2024
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
Hyunbyung Park
Sukyung Lee
Gyoungjin Gim
Yungi Kim
Dahyun Kim
Chanjun Park
VLM
29
0
0
28 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large
  Language Models for Code
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
29
260
0
12 Mar 2024
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Linyuan Gong
Sida Wang
Mostafa Elhoushi
Alvin Cheung
27
15
0
07 Mar 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
16
469
0
07 Mar 2024
Quantifying Contamination in Evaluating Code Generation Capabilities of
  Language Models
Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
Martin Riddell
Ansong Ni
Arman Cohan
ELM
29
28
0
06 Mar 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language
  Models
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin
Jiangcun Du
Wuwei Huang
Wei Liu
Jian Luan
Bin Wang
Deyi Xiong
MQ
25
30
0
26 Feb 2024
Generalization or Memorization: Data Contamination and Trustworthy
  Evaluation for Large Language Models
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
Yihong Dong
Xue Jiang
Huanyu Liu
Zhi Jin
Bin Gu
Mengfei Yang
Ge Li
21
43
0
24 Feb 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
37
30
0
21 Feb 2024
TreeEval: Benchmark-Free Evaluation of Large Language Models through
  Tree Planning
TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning
Xiang Li
Yunshi Lan
Chao Yang
ELM
38
7
0
20 Feb 2024
Do Membership Inference Attacks Work on Large Language Models?
Do Membership Inference Attacks Work on Large Language Models?
Michael Duan
Anshuman Suri
Niloofar Mireshghallah
Sewon Min
Weijia Shi
Luke Zettlemoyer
Yulia Tsvetkov
Yejin Choi
David E. Evans
Hanna Hajishirzi
MIALM
27
80
0
12 Feb 2024
Evading Data Contamination Detection for Language Models is (too) Easy
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck
Mark Niklas Muller
Maximilian Baader
Marc Fischer
Martin Vechev
85
18
0
05 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
48
12
0
02 Feb 2024
Orion-14B: Open-source Multilingual Large Language Models
Orion-14B: Open-source Multilingual Large Language Models
Du Chen
Yi Huang
Xiaopu Li
Yongqiang Li
Yongqiang Liu
Haihui Pan
Leichao Xu
Dacheng Zhang
Zhipeng Zhang
Kun Han
16
4
0
20 Jan 2024
Investigating Data Contamination for Pre-training Language Models
Investigating Data Contamination for Pre-training Language Models
Minhao Jiang
Ken Ziyu Liu
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
17
62
0
11 Jan 2024
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
Zhongshen Zeng
Pengguang Chen
Shu Liu
Haiyun Jiang
Jiaya Jia
ReLM
ELM
LRM
27
18
0
28 Dec 2023
Competition-Level Problems are Effective LLM Evaluators
Competition-Level Problems are Effective LLM Evaluators
Yiming Huang
Zheng-Wen Lin
Xiao Liu
Yeyun Gong
Shuai Lu
...
Yaobo Liang
Yelong Shen
Chen Lin
Nan Duan
Weizhu Chen
ELM
LRM
30
24
0
04 Dec 2023
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in
  Large Language Models
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
Shahriar Golchin
Mihai Surdeanu
16
24
0
10 Nov 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
A. Maritan
Jiaao Chen
S. Dey
Luca Schenato
Diyi Yang
Xing Xie
ELM
LRM
14
42
0
29 Sep 2023
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020
Previous
12