LLMs Get Lost In Multi-Turn Conversation Philippe Laban Hiroaki Hayashi Yingbo Zhou Jennifer Neville 32 0 0 09 May 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks Judy Hanwen Shen Carlos Guestrin 31 0 0 09 Apr 2025
CoLa -- Learning to Interactively Collaborate with Large LMs Abhishek Sharma Dan Goldwasser LLMAG SyDa 58 0 0 03 Apr 2025
SPHERE: An Evaluation Card for Human-AI Systems Qianou Ma Dora Zhao Xinran Zhao Chenglei Si Chenyang Yang Ryan Louie Ehud Reiter Diyi Yang Tongshuang Wu ALM 50 0 0 24 Mar 2025
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models Keyon Vafa Sarah Bentley Jon M. Kleinberg S. Mullainathan 38 0 0 21 Mar 2025
Navigating Rifts in Human-LLM Grounding: Study and Benchmark Omar Shaikh Hussein Mozannar Gagan Bansal Adam Fourney Eric Horvitz 69 2 0 18 Mar 2025
On Benchmarking Human-Like Intelligence in Machines Lance Ying K. M. Collins L. Wong Ilia Sucholutsky Ryan Liu Adrian Weller Tianmin Shu Thomas L. Griffiths Joshua B. Tenenbaum ALM ELM 78 2 0 27 Feb 2025
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review Rock Yuren Pang Hope Schroeder Kynnedy Simone Smith Solon Barocas Ziang Xiao Emily Tseng Danielle Bragg 73 3 0 22 Jan 2025
Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots Huiqi Zou Pengda Wang Zihan Yan Tianjun Sun Ziang Xiao 90 1 0 29 Nov 2024
IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering Ruosen Li Barry Wang Ruochen Li Xinya Du ELM 33 5 0 24 Aug 2024
Building Machines that Learn and Think with People Katherine M. Collins Ilia Sucholutsky Umang Bhatt Kartik Chandra Lionel Wong ... Mark K. Ho Vikash K. Mansinghka Adrian Weller Joshua B. Tenenbaum Thomas L. Griffiths 40 27 0 22 Jul 2024
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance Kaitlyn Zhou Jena D. Hwang Xiang Ren Nouha Dziri Dan Jurafsky Maarten Sap 30 3 0 10 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models Paul Pu Liang Akshay Goindani Talha Chafekar Leena Mathur Haofei Yu Ruslan Salakhutdinov Louis-Philippe Morency 36 10 0 03 Jul 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations Lichao Zhang Jia Yu Shuai Zhang Long Li Yangyang Zhong ... Fangsheng Weng Fayu Pan Jing Li Renjun Xu Zhenzhong Lan 32 4 0 21 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner Kenneth Li Yiming Wang Fernanda Viégas Martin Wattenberg 25 6 0 17 Jun 2024
LLM-Mediated Domain-Specific Voice Agents: The Case of TextileBot Shu Zhong Elia Gatti James Hardwick Miriam Ribul Youngjun Cho Marianna Obrist 31 0 0 15 Jun 2024
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function Keyon Vafa Ashesh Rambachan S. Mullainathan ELM ALM 13 11 0 03 Jun 2024
Navigating the Landscape of Hint Generation Research: From the Past to the Future Anubhav Jangra Jamshid Mozafari Adam Jatowt Smaranda Muresan 27 2 0 06 Apr 2024
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers Hussein Mozannar Valerie Chen Mohammed Alsobay Subhro Das Sebastian Zhao Dennis L. Wei Manish Nagireddy P. Sattigeri Ameet Talwalkar David Sontag ELM 38 18 0 03 Apr 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain William James Bolton Rafael Poyiadzi Edward R. Morrell Gabriela van Bergen Gonzalez Bueno Lea Goetz 20 2 0 21 Mar 2024
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback Dong Won Lee Hae Won Park Yoon Kim C. Breazeal Louis-Philippe Morency 19 0 0 17 Mar 2024
Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents Shuai Zhang Yu Lu Junwen Liu Jia Yu Huachuan Qiu Yuming Yan Zhenzhong Lan 37 5 0 18 Feb 2024
Task Supportive and Personalized Human-Large Language Model Interaction: A User Study Ben Wang Jiqun Liu Jamshed Karimnazarov Nicolas Thompson 19 16 0 09 Feb 2024
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty Kaitlyn Zhou Jena D. Hwang Xiang Ren Maarten Sap 15 54 0 12 Jan 2024
RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models Meiling Tao Xuechen Liang Tianyu Shi Lei Yu Yiting Xie 29 4 0 17 Dec 2023
Predictive Minds: LLMs As Atypical Active Inference Agents Jan Kulveit Clem von Stengel Roman Leventov LLMAG KELM LRM 36 1 0 16 Nov 2023
Large Language Models are In-context Teachers for Knowledge Reasoning Jiachen Zhao Zonghai Yao Zhichao Yang Hong-ye Yu ReLM LRM 16 1 0 12 Nov 2023
Measuring Adversarial Datasets Yuanchen Bai Raoyi Huang Vijay Viswanathan Tzu-Sheng Kuo Tongshuang Wu 26 1 0 06 Nov 2023
Leveraging Large Language Models for Collective Decision-Making Marios Papachristou Longqi Yang Chin-Chia Hsu LLMAG 29 2 0 03 Nov 2023
Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring Ashish Sharma Kevin Rushton Inna Wanyin Lin Theresa Nguyen Tim Althoff 11 55 0 24 Oct 2023
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents Xuhui Zhou Hao Zhu Leena Mathur Ruohong Zhang Haofei Yu ... Louis-Philippe Morency Yonatan Bisk Daniel Fried Graham Neubig Maarten Sap LLMAG 17 115 0 18 Oct 2023
Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle Xu Yang Xiao Yang Weiqing Liu Jinhui Li Peng Yu Zeqi Ye Jiang Bian 15 0 0 17 Oct 2023
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception Harsh Kumar Ilya Musabirov Mohi Reza Jiakai Shi Xinyuan Wang Joseph Jay Williams Anastasia Kuzminykh Michael Liut 11 29 0 13 Oct 2023
Can large language models provide useful feedback on research papers? A large-scale empirical analysis Weixin Liang Yuhui Zhang Hancheng Cao Binglu Wang Daisy Ding ... Siyu He D. Smith Yian Yin Daniel A. McFarland James Y. Zou ALM LM&MA 27 121 0 03 Oct 2023
ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models Mohi Reza Nathan Laundry Ilya Musabirov Peter Dushniku Zhi Yuan “Michael” Yu Kashish Mittal Tovi Grossman Michael Liut Anastasia Kuzminykh Joseph Jay Williams 10 21 0 29 Sep 2023
Beyond the Chat: Executable and Verifiable Text-Editing with LLMs Philippe Laban Jesse Vig Marti A. Hearst Caiming Xiong Chien-Sheng Wu KELM 32 27 0 27 Sep 2023
User Experience Design Professionals' Perceptions of Generative Artificial Intelligence Jie Li Hancheng Cao Laura Lin Youyang Hou Ruihao Zhu Abdallah El Ali 30 49 0 26 Sep 2023
Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers Tuhin Chakrabarty Vishakh Padmakumar Faeze Brahman Smaranda Muresan 50 31 0 22 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Xingyao Wang Zihan Wang Jiateng Liu Yangyi Chen Lifan Yuan Hao Peng Heng Ji LRM 125 137 0 19 Sep 2023
Does Writing with Language Models Reduce Content Diversity? Vishakh Padmakumar He He 8 79 0 11 Sep 2023
A Survey on Large Language Model based Autonomous Agents Lei Wang Chengbang Ma Xueyang Feng Zeyu Zhang Hao-ran Yang ... Xu Chen Yankai Lin Wayne Xin Zhao Zhewei Wei Ji-Rong Wen LLMAG AI4CE LM&Ro 39 1,088 0 22 Aug 2023
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles Shulin Huang Shirong Ma Yinghui Li Mengzuo Huang Wuhe Zou Weidong Zhang Haitao Zheng LLMAG LRM 24 26 0 21 Aug 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines Matthew Barker Emma Kallina D. Ashok Katherine M. Collins Ashley Casovan Adrian Weller Ameet Talwalkar Valerie Chen Umang Bhatt 20 5 0 28 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets Seonghyeon Ye Doyoung Kim Sungdong Kim Hyeonbin Hwang Seungone Kim Yongrae Jo James Thorne Juho Kim Minjoon Seo ALM 30 96 0 20 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win Zhengping Zhou Lezhi Li Xinxi Chen Andy Li SyDa ALM MoE 24 5 0 17 Jul 2023
Benchmarking Large Language Model Capabilities for Conditional Generation Joshua Maynez Priyanka Agrawal Sebastian Gehrmann ELM LM&MA 20 28 0 29 Jun 2023
Opportunities and Risks of LLMs for Scalable Deliberation with Polis Christopher T. Small Ivan Vendrov Esin Durmus Hadjar Homaei Elizabeth Barry Julien Cornebise Ted Suzman Deep Ganguli Colin Megill 11 25 0 20 Jun 2023
Towards the Exploitation of LLM-based Chatbot for Providing Legal Support to Palestinian Cooperatives Rabee Qasem Banan Tantour Mohammed Maree AILaw 8 9 0 09 Jun 2023
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models Jose Berengueres Marybeth Sandell 17 0 0 06 Jun 2023
Interactive Editing for Text Summarization Yujia Xie Xun Wang Si-Qing Chen Wayne Xiong Pengcheng He KELM 49 2 0 05 Jun 2023