FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

17 April 2025

Papers citing "FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents"

Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Sahel Sharifymoghaddam, Shivani Upadhyay, Nandan Thakur, Ronak Pradeep, Jimmy Lin
RALM, 28 Apr 2025

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Fernando Campos, Nick Craswell, Jimmy Lin
21 Apr 2025