2212.09746
Cited By
Evaluating Human-Language Model Interaction
19 December 2022
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
Ashwin Paranjape
Ines Gerard-Ursin
Xiang Lisa Li
Faisal Ladhak
Frieda Rong
Rose E. Wang
Minae Kwon
Joon Sung Park
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
Papers citing
"Evaluating Human-Language Model Interaction"
35 / 85 papers shown
Title
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu
Yixiao Song
Mohit Iyyer
Eunsol Choi
ELM
30
94
0
29 May 2023
In Search of Verifiability: Explanations Rarely Enable Complementary Performance in AI-Advised Decision Making
Raymond Fok
Daniel S. Weld
19
61
0
12 May 2023
Prompted LLMs as Chatbot Modules for Long Open-domain Conversation
Gibbeum Lee
Volker Hartmann
Jongho Park
Dimitris Papailiopoulos
Kangwook Lee
19
62
0
08 May 2023
Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations
Bingsheng Yao
Prithviraj Sen
Lucian Popa
James A. Hendler
Dakuo Wang
XAI
ELM
FAtt
13
10
0
04 May 2023
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Anya Belz
Craig Thomson
Ehud Reiter
Gavin Abercrombie
J. Alonso-Moral
...
Antonio Toral
Xiao-Yi Wan
Leo Wanner
Lewis J. Watson
Diyi Yang
66
35
0
02 May 2023
Learning Personalized Decision Support Policies
Umang Bhatt
Valerie Chen
Katherine M. Collins
Parameswaran Kamalaruban
Emma Kallina
Adrian Weller
Ameet Talwalkar
OffRL
45
10
0
13 Apr 2023
Approximating Online Human Evaluation of Social Chatbots with Prompting
Ekaterina Svikhnushina
Pearl Pu
ELM
8
13
0
11 Apr 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,701
0
07 Apr 2023
Approach Intelligent Writing Assistants Usability with Seven Stages of Action
Avinash Bhat
Disha Shrivastava
Jin L. C. Guo
9
2
0
06 Apr 2023
GPT detectors are biased against non-native English writers
Weixin Liang
Mert Yuksekgonul
Yining Mao
E. Wu
James Y. Zou
DeLMO
23
268
0
06 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
37
770
0
30 Mar 2023
Ecosystem Graphs: The Social Footprint of Foundation Models
Rishi Bommasani
Dilara Soylu
Thomas I. Liao
Kathleen A. Creel
Percy Liang
MLAU
16
32
0
28 Mar 2023
Mapping the Design Space of Interactions in Human-AI Text Co-creation Tasks
Zijian Ding
Joel Chan
19
18
0
11 Mar 2023
Parachute: Evaluating Interactive Human-LM Co-writing Systems
Hua Shen
Tongshuang Wu
KELM
17
16
0
11 Mar 2023
Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting
Hai Dang
Sven Goller
Florian Lehmann
Daniel Buschek
AI4CE
88
73
0
06 Mar 2023
Fluid Transformers and Creative Analogies: Exploring Large Language Models' Capacity for Augmenting Cross-Domain Analogical Creativity
Zijian Ding
Arvind Srinivasan
Stephen MacNeil
Joel Chan
21
35
0
27 Feb 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
15
426
0
23 Feb 2023
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
Rafal Kocielnik
Shrimai Prabhumoye
Vivian Zhang
Roy Jiang
R. Alvarez
Anima Anandkumar
25
5
0
14 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
ALM
18
115
0
06 Feb 2023
Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction
Renee Shelby
Shalaleh Rismani
Kathryn Henne
AJung Moon
Negar Rostamzadeh
...
N'Mah Yilla-Akbari
Jess Gallegos
A. Smart
Emilio Garcia
Gurleen Virk
29
187
0
11 Oct 2022
Who Wrote this? How Smart Replies Impact Language and Agency in the Workplace
Kilian Wenker
8
7
0
07 Oct 2022
Alexa, Let's Work Together: Introducing the First Alexa Prize TaskBot Challenge on Conversational Task Assistance
Anna Gottardi
Osman Ipek
Giuseppe Castellucci
Shui Hu
Lavina Vaz
...
Oleg Rokhlenko
Kate Bland
Eugene Agichtein
R. Ghanadan
Y. Maarek
30
23
0
13 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
13
0
0
18 Aug 2022
The Authenticity Gap in Human Evaluation
Kawin Ethayarajh
Dan Jurafsky
79
24
0
24 May 2022
Automated Crossword Solving
Eric Wallace
Nicholas Tomlin
Albert Xu
Kevin Kaichuang Yang
Eshaan Pathak
Matthew Ginsberg
Dan Klein
27
12
0
19 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
215
305
0
27 Apr 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
1,114
0
18 Apr 2021
Entity-level Factual Consistency of Abstractive Text Summarization
Feng Nan
Ramesh Nallapati
Zhiguo Wang
Cicero Nogueira dos Santos
Henghui Zhu
Dejiao Zhang
Kathleen McKeown
Bing Xiang
HILM
142
156
0
18 Feb 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
243
284
0
02 Feb 2021
The Impact of Multiple Parallel Phrase Suggestions on Email Input and Composition Behaviour of Native and Non-Native English Writers
Daniel Buschek
Martin Zurn
Malin Eiband
113
98
0
22 Jan 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018