
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs

1 March 2024
Tanmay Rajore, Nishanth Chandran, Sunayana Sitaram, Divya Gupta, Rahul Sharma, Kashish Mittal, Manohar Swaminathan

Papers citing "TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs"

11 of 11 citing papers shown, newest first:

1. TLUE: A Tibetan Language Understanding Evaluation Benchmark · ELM · 15 Mar 2025
   Fan Gao, Cheng Huang, Nyima Tashi, Xiangxiang Wang, Thupten Tsering, ..., Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng, Yongbin Yu

2. LSHBloom: Memory-efficient, Extreme-scale Document Deduplication · 06 Nov 2024
   A. Khan, Robert Underwood, Carlo Siebenschuh, Y. Babuji, Aswathy Ajith, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Kyle Chard, Ian T. Foster

3. Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination · ELM · 19 Sep 2024
   Eva Sánchez Salido, Roser Morante, Julio Gonzalo, Guillermo Marco, Jorge Carrillo-de-Albornoz, ..., Enrique Amigó, Andrés Fernández, Alejandro Benito-Santos, Adrián Ghajari Espinosa, Victor Fresno

4. Benchmark Data Contamination of Large Language Models: A Survey · ELM, ALM · 06 Jun 2024
   Cheng Xu, Shuhao Guan, Derek Greene, Mohand-Tahar Kechadi

5. Task Contamination: Language Models May Not Be Few-Shot Anymore · 26 Dec 2023
   Changmao Li, Jeffrey Flanigan

6. Can Large Language Models Be an Alternative to Human Evaluations? · ALM, LM&MA · 03 May 2023
   Cheng-Han Chiang, Hung-yi Lee

7. CrypTFlow2: Practical 2-Party Secure Inference · 13 Oct 2020
   Deevashwer Rathee, Mayank Rathee, Nishant Kumar, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma

8. MLQA: Evaluating Cross-lingual Extractive Question Answering · ELM · 16 Oct 2019
   Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk

9. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets · 21 Aug 2019
   Mor Geva, Yoav Goldberg, Jonathan Berant

10. Hypothesis Only Baselines in Natural Language Inference · 02 May 2018
    Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme

11. ImageNet Large Scale Visual Recognition Challenge · VLM, ObjD · 01 Sep 2014
    Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei