2212.09746
Cited By
Evaluating Human-Language Model Interaction
19 December 2022
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
Ashwin Paranjape
Ines Gerard-Ursin
Xiang Lisa Li
Faisal Ladhak
Frieda Rong
Rose E. Wang
Minae Kwon
Joon Sung Park
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
Papers citing "Evaluating Human-Language Model Interaction"
50 / 85 papers shown
Title
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
32
0
0
09 May 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
31
0
0
09 Apr 2025
CoLa -- Learning to Interactively Collaborate with Large LMs
Abhishek Sharma
Dan Goldwasser
LLMAG
SyDa
58
0
0
03 Apr 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
Keyon Vafa
Sarah Bentley
Jon M. Kleinberg
S. Mullainathan
38
0
0
21 Mar 2025
Navigating Rifts in Human-LLM Grounding: Study and Benchmark
Omar Shaikh
Hussein Mozannar
Gagan Bansal
Adam Fourney
Eric Horvitz
69
2
0
18 Mar 2025
On Benchmarking Human-Like Intelligence in Machines
Lance Ying
K. M. Collins
L. Wong
Ilia Sucholutsky
Ryan Liu
Adrian Weller
Tianmin Shu
Thomas L. Griffiths
Joshua B. Tenenbaum
ALM
ELM
78
2
0
27 Feb 2025
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
Rock Yuren Pang
Hope Schroeder
Kynnedy Simone Smith
Solon Barocas
Ziang Xiao
Emily Tseng
Danielle Bragg
73
3
0
22 Jan 2025
Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots
Huiqi Zou
Pengda Wang
Zihan Yan
Tianjun Sun
Ziang Xiao
90
1
0
29 Nov 2024
IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
Ruosen Li
Barry Wang
Ruochen Li
Xinya Du
ELM
33
5
0
24 Aug 2024
Building Machines that Learn and Think with People
Katherine M. Collins
Ilia Sucholutsky
Umang Bhatt
Kartik Chandra
Lionel Wong
...
Mark K. Ho
Vikash K. Mansinghka
Adrian Weller
Joshua B. Tenenbaum
Thomas L. Griffiths
40
27
0
22 Jul 2024
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
Kaitlyn Zhou
Jena D. Hwang
Xiang Ren
Nouha Dziri
Dan Jurafsky
Maarten Sap
30
3
0
10 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
36
10
0
03 Jul 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Lichao Zhang
Jia Yu
Shuai Zhang
Long Li
Yangyang Zhong
...
Fangsheng Weng
Fayu Pan
Jing Li
Renjun Xu
Zhenzhong Lan
32
4
0
21 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
25
6
0
17 Jun 2024
LLM-Mediated Domain-Specific Voice Agents: The Case of TextileBot
Shu Zhong
Elia Gatti
James Hardwick
Miriam Ribul
Youngjun Cho
Marianna Obrist
31
0
0
15 Jun 2024
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
Keyon Vafa
Ashesh Rambachan
S. Mullainathan
ELM
ALM
13
11
0
03 Jun 2024
Navigating the Landscape of Hint Generation Research: From the Past to the Future
Anubhav Jangra
Jamshid Mozafari
Adam Jatowt
Smaranda Muresan
27
2
0
06 Apr 2024
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
ELM
38
18
0
03 Apr 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
William James Bolton
Rafael Poyiadzi
Edward R. Morrell
Gabriela van Bergen Gonzalez Bueno
Lea Goetz
20
2
0
21 Mar 2024
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
Dong Won Lee
Hae Won Park
Yoon Kim
C. Breazeal
Louis-Philippe Morency
19
0
0
17 Mar 2024
Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents
Shuai Zhang
Yu Lu
Junwen Liu
Jia Yu
Huachuan Qiu
Yuming Yan
Zhenzhong Lan
37
5
0
18 Feb 2024
Task Supportive and Personalized Human-Large Language Model Interaction: A User Study
Ben Wang
Jiqun Liu
Jamshed Karimnazarov
Nicolas Thompson
19
16
0
09 Feb 2024
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
Kaitlyn Zhou
Jena D. Hwang
Xiang Ren
Maarten Sap
15
54
0
12 Jan 2024
RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models
Meiling Tao
Xuechen Liang
Tianyu Shi
Lei Yu
Yiting Xie
29
4
0
17 Dec 2023
Predictive Minds: LLMs As Atypical Active Inference Agents
Jan Kulveit
Clem von Stengel
Roman Leventov
LLMAG
KELM
LRM
36
1
0
16 Nov 2023
Large Language Models are In-context Teachers for Knowledge Reasoning
Jiachen Zhao
Zonghai Yao
Zhichao Yang
Hong-ye Yu
ReLM
LRM
16
1
0
12 Nov 2023
Measuring Adversarial Datasets
Yuanchen Bai
Raoyi Huang
Vijay Viswanathan
Tzu-Sheng Kuo
Tongshuang Wu
26
1
0
06 Nov 2023
Leveraging Large Language Models for Collective Decision-Making
Marios Papachristou
Longqi Yang
Chin-Chia Hsu
LLMAG
29
2
0
03 Nov 2023
Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring
Ashish Sharma
Kevin Rushton
Inna Wanyin Lin
Theresa Nguyen
Tim Althoff
11
55
0
24 Oct 2023
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou
Hao Zhu
Leena Mathur
Ruohong Zhang
Haofei Yu
...
Louis-Philippe Morency
Yonatan Bisk
Daniel Fried
Graham Neubig
Maarten Sap
LLMAG
17
115
0
18 Oct 2023
Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle
Xu Yang
Xiao Yang
Weiqing Liu
Jinhui Li
Peng Yu
Zeqi Ye
Jiang Bian
15
0
0
17 Oct 2023
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception
Harsh Kumar
Ilya Musabirov
Mohi Reza
Jiakai Shi
Xinyuan Wang
Joseph Jay Williams
Anastasia Kuzminykh
Michael Liut
11
29
0
13 Oct 2023
Can large language models provide useful feedback on research papers? A large-scale empirical analysis
Weixin Liang
Yuhui Zhang
Hancheng Cao
Binglu Wang
Daisy Ding
...
Siyu He
D. Smith
Yian Yin
Daniel A. McFarland
James Y. Zou
ALM
LM&MA
27
121
0
03 Oct 2023
ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models
Mohi Reza
Nathan Laundry
Ilya Musabirov
Peter Dushniku
Zhi Yuan “Michael” Yu
Kashish Mittal
Tovi Grossman
Michael Liut
Anastasia Kuzminykh
Joseph Jay Williams
10
21
0
29 Sep 2023
Beyond the Chat: Executable and Verifiable Text-Editing with LLMs
Philippe Laban
Jesse Vig
Marti A. Hearst
Caiming Xiong
Chien-Sheng Wu
KELM
32
27
0
27 Sep 2023
User Experience Design Professionals' Perceptions of Generative Artificial Intelligence
Jie Li
Hancheng Cao
Laura Lin
Youyang Hou
Ruihao Zhu
Abdallah El Ali
30
49
0
26 Sep 2023
Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers
Tuhin Chakrabarty
Vishakh Padmakumar
Faeze Brahman
Smaranda Muresan
50
31
0
22 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
125
137
0
19 Sep 2023
Does Writing with Language Models Reduce Content Diversity?
Vishakh Padmakumar
He He
8
79
0
11 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
39
1,088
0
22 Aug 2023
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles
Shulin Huang
Shirong Ma
Yinghui Li
Mengzuo Huang
Wuhe Zou
Weidong Zhang
Haitao Zheng
LLMAG
LRM
24
26
0
21 Aug 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Matthew Barker
Emma Kallina
D. Ashok
Katherine M. Collins
Ashley Casovan
Adrian Weller
Ameet Talwalkar
Valerie Chen
Umang Bhatt
20
5
0
28 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
30
96
0
20 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
24
5
0
17 Jul 2023
Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez
Priyanka Agrawal
Sebastian Gehrmann
ELM
LM&MA
20
28
0
29 Jun 2023
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T. Small
Ivan Vendrov
Esin Durmus
Hadjar Homaei
Elizabeth Barry
Julien Cornebise
Ted Suzman
Deep Ganguli
Colin Megill
11
25
0
20 Jun 2023
Towards the Exploitation of LLM-based Chatbot for Providing Legal Support to Palestinian Cooperatives
Rabee Qasem
Banan Tantour
Mohammed Maree
AILaw
8
9
0
09 Jun 2023
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models
Jose Berengueres
Marybeth Sandell
17
0
0
06 Jun 2023
Interactive Editing for Text Summarization
Yujia Xie
Xun Wang
Si-Qing Chen
Wayne Xiong
Pengcheng He
KELM
49
2
0
05 Jun 2023