Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.12767
Cited By
Can we trust the evaluation on ChatGPT?
22 March 2023
Rachith Aiyappa
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
ELM
ALM
LLMAG
AI4MH
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Can we trust the evaluation on ChatGPT?"
5 / 5 papers shown
Title
Generative Evaluation of Complex Reasoning in Large Language Models
Haowei Lin
X. Wang
Ruilin Yan
Baizhou Huang
Haotian Ye
Jianhua Zhu
Zihao Wang
James Y. Zou
Jianzhu Ma
Yitao Liang
ReLM
ELM
LRM
76
0
0
03 Apr 2025
German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset
Laura Mascarell
Ribin Chalumattu
Annette Rios
HILM
27
0
0
06 Mar 2024
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
16
81
0
19 May 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
14
81
0
06 Apr 2023
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020
1