TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
arXiv: 2410.03608

4 October 2024
Jonathan Cook
Tim Rocktäschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM

Papers citing "TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation"

7 / 7 papers shown
EvalAgent: Discovering Implicit Evaluation Criteria from the Web
Manya Wadhwa
Zayne Sprague
Chaitanya Malaviya
Philippe Laban
Junyi Jessy Li
Greg Durrett
25
0
0
21 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM
ELM
39
0
0
09 Apr 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi
Hritik Bansal
Arian Hosseini
Aditya Grover
Kai-Wei Chang
Marcus Rohrbach
Anna Rohrbach
OffRL
LRM
37
0
0
01 Apr 2025
REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities
Alexander Pugachev
Alena Fenogenova
Vladislav Mikhailov
Ekaterina Artemova
32
0
0
17 Mar 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktäschel
Robert Kirk
38
0
0
19 Feb 2025
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
82
6
0
26 Nov 2024
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
Xiaodong Wu
Minhao Wang
Yichen Liu
Xiaoming Shi
He Yan
Xiangju Lu
Junmin Zhu
Wei Zhang
54
3
0
11 Nov 2024