1904.02792
Cited By
Unifying Human and Statistical Evaluation for Natural Language Generation
4 April 2019
Tatsunori B. Hashimoto
Hugh Zhang
Percy Liang
Papers citing
"Unifying Human and Statistical Evaluation for Natural Language Generation"
45 / 45 papers shown
Title
From Superficial to Deep: Integrating External Knowledge for Follow-up Question Generation Using Knowledge Graph and LLM
Jianyu Liu
Yi Huang
Sheng Bi
Junlan Feng
Guilin Qi
31
2
0
08 Apr 2025
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao
Yanbin Zhao
Juncai He
Yue Zhang
VLM
92
2
0
20 Feb 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
78
6
0
28 Jan 2025
Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting
Can Chen
Jun-Kun Wang
DeLMO
37
0
0
29 Oct 2024
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen
Omar Hagrass
Jason M. Klusowski
24
2
0
04 Oct 2024
Agents' Room: Narrative Generation through Multi-step Collaboration
Fantine Huot
Reinald Kim Amplayo
Jennimaria Palomaki
Alice Shoshana Jakobovits
Elizabeth Clark
Mirella Lapata
43
7
0
03 Oct 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
65
0
30 May 2024
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
38
10
0
16 Nov 2023
HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis
Christoforos Vasilatos
Manaar Alam
Talal Rahwan
Yasir Zaki
Michail Maniatakos
DeLMO
32
32
0
26 May 2023
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
Sarik Ghazarian
Yijia Shao
Rujun Han
Aram Galstyan
Nanyun Peng
18
7
0
12 May 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
23
21
0
30 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
46
99
0
19 Dec 2022
Implicit causality in GPT-2: a case study
H. Huynh
T. Lentz
Emiel van Miltenburg
LRM
22
3
0
08 Dec 2022
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
36
75
0
27 Oct 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes Altuncu
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
34
3
0
11 Aug 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
24
10
0
25 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
73
122
0
03 Jul 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
56
14
0
11 Jun 2022
Computational Storytelling and Emotions: A Survey
Yusuke Mori
Hiroaki Yamane
Yusuke Mukuta
Tatsuya Harada
35
2
0
23 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
33
5
0
13 May 2022
Vector Representations of Idioms in Conversational Systems
Tosin P. Adewumi
F. Liwicki
Marcus Liwicki
14
8
0
07 May 2022
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
Pei Ke
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
Xiaoyan Zhu
Minlie Huang
21
37
0
02 Apr 2022
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan
Gonçalo Simões
Yao-Min Zhao
Joshua Maynez
Dipanjan Das
Michael Collins
Mirella Lapata
18
30
0
28 Mar 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
38
213
0
14 Jan 2022
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla
Sriram Krishna
R. Shah
Changyou Chen
18
6
0
17 Nov 2021
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models
Tanya Goyal
Nazneen Rajani
Wenhao Liu
Wojciech Kryściński
AI4CE
15
12
0
08 Oct 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng
Bowen Tan
Zhengzhong Liu
Eric P. Xing
Zhiting Hu
16
72
0
14 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
22
8
0
03 Aug 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
6
126
0
02 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
28
393
0
30 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
13
75
0
15 Jun 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
246
283
0
02 Feb 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
26
341
0
02 Feb 2021
GLUCOSE: GeneraLized and COntextualized Story Explanations
N. Mostafazadeh
Aditya Kalyanpur
Lori Moon
David W. Buchanan
Lauren Berkowitz
Or Biran
Jennifer Chu-Carroll
13
120
0
16 Sep 2020
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian-Yu Guan
Minlie Huang
21
69
0
16 Sep 2020
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
Sarah E. Finch
Jinho D. Choi
ELM
21
67
0
10 Jun 2020
Limits of Detecting Text Generated by Large-Scale Language Models
L. Varshney
N. Keskar
R. Socher
DeLMO
16
18
0
09 Feb 2020
Social Bias Frames: Reasoning about Social and Power Implications of Language
Maarten Sap
Saadia Gabriel
Lianhui Qin
Dan Jurafsky
Noah A. Smith
Yejin Choi
17
483
0
10 Nov 2019
Do Massively Pretrained Language Models Make Better Storytellers?
A. See
Aneesh S. Pappu
Rohun Saxena
Akhila Yerukola
Christopher D. Manning
26
166
0
24 Sep 2019
Defending Against Neural Fake News
Rowan Zellers
Ari Holtzman
Hannah Rashkin
Yonatan Bisk
Ali Farhadi
Franziska Roesner
Yejin Choi
AAML
17
996
0
29 May 2019
Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation
Cristina Garbacea
Samuel Carton
Shiyan Yan
Qiaozhu Mei
ELM
17
29
0
02 Jan 2019
Language GANs Falling Short
Massimo Caccia
Lucas Page-Caccia
W. Fedus
Hugo Larochelle
Joelle Pineau
Laurent Charlin
117
215
0
06 Nov 2018
Retrieval-Based Neural Code Generation
Shirley Anugrah Hayati
R. Olivier
Pravalika Avvaru
Pengcheng Yin
A. Tomasic
Graham Neubig
129
110
0
29 Aug 2018
Adversarial Evaluation of Dialogue Models
Anjuli Kannan
Oriol Vinyals
AAML
ALM
131
76
0
27 Jan 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
254
1,896
0
10 Jan 2017