ResearchTrend.AI

arXiv:2411.08275 · Cited By
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look

13 November 2024
Shivani Upadhyay
Ronak Pradeep
Nandan Thakur
Daniel Fernando Campos
Nick Craswell
I. Soboroff
Hoa Trang Dang
Jimmy J. Lin

Papers citing "A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look"

11 papers shown
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations
Laura Dietz, Oleg Zendel, P. Bailey, Charles L. A. Clarke, Ellese Cotterill, Jeff Dalton, Faegheh Hasibi, Mark Sanderson, Nick Craswell
27 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp, Tim Hagen, Maik Frobe, Matthias Hagen, Benno Stein, Martin Potthast, Harrisen Scells
22 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Fernando Campos, Nick Craswell, Jimmy Lin
21 Apr 2025
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges
Nandan Thakur, Ronak Pradeep, Shivani Upadhyay, Daniel Fernando Campos, Nick Craswell, Jimmy Lin
21 Apr 2025
LLM-Driven Usefulness Judgment for Web Search Evaluation
Mouly Dewan, Jiqun Liu, Aditya Gautam, Chirag Shah
19 Apr 2025
Benchmarking LLM-based Relevance Judgment Methods
Negar Arabzadeh, Charles L. A. Clarke
17 Apr 2025
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Nandan Thakur, Jimmy J. Lin, Sam Havens, Michael Carbin, Omar Khattab, Andrew Drozdov
17 Apr 2025
A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment
Negar Arabzadeh, Charles L. A. Clarke
16 Apr 2025
LLM-Driven Usefulness Labeling for IR Evaluation
Mouly Dewan, Jiqun Liu, Chirag Shah
13 Mar 2025
Improving the Reusability of Conversational Search Test Collections
Zahra Abbasiantaeb, Chuan Meng, Leif Azzopardi, Mohammad Aliannejadi
12 Mar 2025
LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help?
Rikiya Takehi, E. Voorhees, Tetsuya Sakai, I. Soboroff
11 Nov 2024