Evaluation of Text Generation: A Survey

26 June 2020

Papers citing "Evaluation of Text Generation: A Survey"

50 / 243 papers shown

OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation

320

01 Dec 2025

Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Rajarshi Haldar

Julia Hockenmaier

197

31 Oct 2025

CreativityPrism: A Holistic Evaluation Framework for Large Language Model Creativity

...

199

23 Oct 2025

A Layered Intuition -- Method Model with Scope Extension for LLM Reasoning

Hong Su

LRM

107

12 Oct 2025

Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions

121

29 Sep 2025

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

346

04 Sep 2025

The illusion of a perfect metric: Why evaluating AI's words is harder than it looks

222

19 Aug 2025

References Matter: Investigating the Impact of Reference Set Variation on Summarization Evaluation

410

17 Jun 2025

From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary

216

17 Jun 2025

COGENT: A Curriculum-oriented Framework for Generating Grade-appropriate Educational Content

Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2025

325

11 Jun 2025

Design of Trimmed Helicoid Soft-Rigid Hybrid Robots

International Conference on Soft Robotics (ICSR), 2025

Zach J. Patterson

Emily R. Sologuren

Daniela Rus

130

03 Jun 2025

APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation

Javier Marín

245

26 May 2025

Evaluating and Mitigating Bias in AI-Based Medical Text Generation

Nature Computational Science (Nat. Comput. Sci.), 2025

285

24 Apr 2025

The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes

Wencong You

Daniel Lowd

348

24 Apr 2025

CPR: Leveraging LLMs for Topic and Phrase Suggestion to Facilitate Comprehensive Product Reviews

Ekta Gujral

Apurva Sinha

Lishi Ji

Bijayani Sanghamitra Mishra

171

18 Apr 2025

LLMs as Span Annotators: A Comparative Study of LLMs and Humans

663

11 Apr 2025

SCORE: Story Coherence and Retrieval Enhancement for AI Narratives

...

835

30 Mar 2025

When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?

...

398

29 Mar 2025

Natural Language Generation

Theoretical Issues In Natural Language Processing (TINLP), 2018

Emiel van Miltenburg

Chenghua Lin

350

20 Mar 2025

Argument Summarization and its Evaluation in the Era of Large Language Models

482

02 Mar 2025

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks

...

463

24 Feb 2025

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

271

18 Feb 2025

Reference-free Evaluation Metrics for Text Generation: A Survey

447

21 Jan 2025

Interactive Information Need Prediction with Intent and Context

Kevin Ros

Dhyey Pandya

ChengXiang Zhai

174

05 Jan 2025

AltGen: AI-Driven Alt Text Generation for Enhancing EPUB Accessibility

312

03 Jan 2025

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization

413

10 Dec 2024

Challenges in Trustworthy Human Evaluation of Chatbots

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

362

05 Dec 2024

I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences

Conference on Robot Learning (CoRL), 2024

347

20 Nov 2024

Script-Strategy Aligned Generation: Aligning LLMs with Expert-Crafted Dialogue Scripts and Therapeutic Strategies for Psychotherapy

Proceedings of the ACM on Human-Computer Interaction (PACMHCI), 2024

526

11 Nov 2024

Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning

Dong Shu

Jundong Li

253

30 Oct 2024

Prove Your Point!: Bringing Proof-Enhancement Principles to Argumentative Essay Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

184

30 Oct 2024

Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework

449

24 Oct 2024

OpenMU: Your Swiss Army Knife for Music Understanding

426

21 Oct 2024

Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

Neural Information Processing Systems (NeurIPS), 2024

320

17 Oct 2024

4-LEGS: 4D Language Embedded Gaussian Splatting

522

14 Oct 2024

MLP-SLAM: Multilayer Perceptron-Based Simultaneous Localization and Mapping

Taozhe Li

Wei Sun

395

14 Oct 2024

Investigating Human-Computer Interaction and Visual Comprehension in Text Generation Process of Natural Language Generation Models

212

11 Oct 2024

Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation

Chaithanya Bandi

Abir Harrasse

Hari Bandi

LLMAG ELM

423

07 Oct 2024

Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions

Enamul Hoque

Mohammed Saidul Islam

246

29 Sep 2024

Quality Matters: Evaluating Synthetic Data for Tool-Using LLMs

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

129

24 Sep 2024

LLMs are One-Shot URL Classifiers and Explainers

283

22 Sep 2024

The Effect of Education in Prompt Engineering: Evidence from Journalists

Amirsiavosh Bashardoust

Yuanjun Feng

Dominique Geissler

Stefan Feuerriegel

Y. Shrestha

221

18 Sep 2024

Exploring Fine-tuned Generative Models for Keyphrase Selection: A Case Study for Russian

Anna Glazkova

Dmitry A. Morozov

243

16 Sep 2024

CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation

333

03 Sep 2024

A Perspective on Literary Metaphor in the Context of Generative AI

Imke van Heerden

Anil Bas

232

02 Sep 2024

Summarizing long regulatory documents with a multi-step pipeline

202

19 Aug 2024

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

International Conference on Natural Language Generation (INLG), 2024

Patrícia Schmidtová

Saad Mahamood

Simone Balloccu

Ondřej Dušek

Albert Gatt

Dimitra Gkatzia

David M. Howcroft

Ondřej Plátek

Adarsa Sivaprasad

256

17 Aug 2024

What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain

Antonis Maronikolakis

Ana Peleteiro Ramallo

Weiwei Cheng

Thomas Kober

LLMAG

203

13 Aug 2024

Exploring Personality-Driven Personalization in XAI: Enhancing User Trust in Gameplay

Zhaoxin Li

Sophie Yang

Shijie Wang

178

08 Aug 2024

Interpretable Differential Diagnosis with Dual-Inference Large Language Models

Rui Zhang

243

10 Jul 2024