Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy

20 January 2025

Saeid Asgari Taghanaki

Papers citing "Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy"

A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs Ryosuke Sonoda Ramya Srinivasan 257 1 0 22 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation. International Conference on Learning Representations (ICLR), 2024 Yiming Wang Pei Zhang Baosong Yang Yang Li Rui Wang LRM 334 29 0 17 Oct 2024
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Joao Monteiro Pierre-Andre Noel Étienne Marcotte Sai Rajeswar Valentina Zantedeschi David Vazquez Nicolas Chapados Christopher Pal Perouz Taslakian 117 19 0 17 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Yubo Wang Xueguang Ma Ge Zhang Yuansheng Ni Abhranil Chandra ... Kai Wang Alex Zhuang Rongqi Fan Xiang Yue Wenhu Chen LRM ELM 561 1,005 0 03 Jun 2024
Language Models can Evaluate Themselves via Probability Discrepancy. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Tingyu Xia Bowen Yu Yuan Wu Yi-Ju Chang Chang Zhou ELM 226 9 0 17 May 2024
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference João Monteiro Étienne Marcotte Pierre-Andre Noel Valentina Zantedeschi David Vázquez Nicolas Chapados Christopher Pal Perouz Taslakian 143 8 0 23 Apr 2024
LLaMA: Open and Efficient Foundation Language Models Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux ... Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave Guillaume Lample ALM PILM 2.7K 17,335 0 27 Feb 2023
Language Models are Few-Shot Learners. Neural Information Processing Systems (NeurIPS), 2020 Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 2.0K 51,377 0 28 May 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 2.8K 106,996 0 11 Oct 2018
SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar Jian Zhang Konstantin Lopyrev Abigail Z. Jacobs RALM 656 8,850 0 16 Jun 2016