Improving Model Factuality with Fine-grained Critique-based Evaluator

24 October 2024
Yiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, Quintin Fettes, Arya Talebzadeh, Sinong Wang, Han Fang, Carolyn Rose, Daniel Fried, Hejia Zhang
Abstract

Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. We conduct data augmentation on a combination of public judgment datasets to train FenCE to (1) generate textual critiques along with scores and (2) make claim-level judgments based on diverse source documents obtained by various tools. We then present a framework that leverages FenCE to improve the factuality of LM generators by constructing training data. Specifically, we generate a set of candidate responses, leverage FenCE to revise and score each response without introducing lesser-known facts, and train the generator by preferring highly scored revised responses. Experiments show that our data augmentation methods improve the evaluator's accuracy by 2.9% on LLM-AggreFact. With FenCE, we improve the factuality rates of Llama2-7B-chat and Llama3-8B-chat by 16.86% and 14.45% on FActScore, outperforming state-of-the-art factuality finetuning methods by 8.83% and 6.96%.
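To make the pipeline concrete, below is a minimal, runnable Python sketch of the data-construction loop the abstract describes, assuming a generic judge/revise interface for the evaluator. All names here (FenCEStub, build_preference_pairs, n_candidates) are illustrative placeholders rather than the authors' released code; the stub returns random scores purely so the loop executes end to end, and the resulting chosen/rejected pairs would feed a standard preference-tuning step.

from dataclasses import dataclass
from typing import Callable, Dict, List
import random

@dataclass
class Judgment:
    score: float   # e.g., fraction of claims judged factual
    critique: str  # textual critique naming the unsupported claims

class FenCEStub:
    """Stand-in for the trained evaluator. The real FenCE retrieves
    source documents and judges each claim; this stub returns random
    scores so the pipeline below is runnable."""
    def judge(self, prompt: str, response: str) -> Judgment:
        return Judgment(score=random.random(), critique="(critique text)")

    def revise(self, prompt: str, response: str, critique: str) -> str:
        # A real reviser edits only the claims flagged in the critique,
        # without introducing lesser-known facts.
        return response

def build_preference_pairs(generate: Callable[[str], str],
                           fence: FenCEStub,
                           prompts: List[str],
                           n_candidates: int = 4) -> List[Dict[str, str]]:
    """For each prompt: sample candidate responses, have the evaluator
    critique, revise, and re-score them, then keep the highest-scored
    revision as 'chosen' and the lowest as 'rejected'."""
    pairs = []
    for prompt in prompts:
        scored = []
        for _ in range(n_candidates):
            resp = generate(prompt)
            judgment = fence.judge(prompt, resp)
            revised = fence.revise(prompt, resp, judgment.critique)
            scored.append((revised, fence.judge(prompt, revised).score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        pairs.append({"prompt": prompt,
                      "chosen": scored[0][0],
                      "rejected": scored[-1][0]})
    return pairs

if __name__ == "__main__":
    pairs = build_preference_pairs(lambda p: f"draft answer to: {p}",
                                   FenCEStub(),
                                   ["Tell me about Marie Curie."])
    print(pairs[0])

The chosen/rejected split above is one plausible way to consume the scores; the paper's framework trains the generator to prefer highly scored revised responses, so any preference-optimization objective over such pairs fits the same slot.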

@article{xie2025_2410.18359,
  title={Improving Model Factuality with Fine-grained Critique-based Evaluator},
  author={Yiqing Xie and Wenxuan Zhou and Pradyot Prakash and Di Jin and Yuning Mao and Quintin Fettes and Arya Talebzadeh and Sinong Wang and Han Fang and Carolyn Rose and Daniel Fried and Hejia Zhang},
  journal={arXiv preprint arXiv:2410.18359},
  year={2025}
}