Supporting Human-AI Collaboration in Auditing LLMs with LLMs

19 April 2023

Marco Tulio Ribeiro

Saleema Amershi

Papers citing "Supporting Human-AI Collaboration in Auditing LLMs with LLMs"

Title | Authors | Tags | Counts | Date
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks | Yixin Cao, Shibo Hong, X. Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu Jiang | ALM, ELM | 84 1 0 | 26 Apr 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu, JongWoo Kim, MunYong Yi | | 57 3 0 | 21 Feb 2025
Evaluating Human-AI Collaboration: A Review and Methodological Framework | George Fragiadakis, Christos Diou, George Kousiouris, Mara Nikolaidou | | 57 11 0 | 09 Jul 2024
Enhancing user experience in large language models through human-centered design: Integrating theoretical insights with an experimental study to meet diverse software learning needs with a single document knowledge base | Yuchen Wang, Yin-Shan Lin, Ruixin Huang, Jinyin Wang, Sensen Liu | | 21 7 0 | 19 May 2024
Navigating LLM Ethics: Advancements, Challenges, and Future Directions | Junfeng Jiao, S. Afroogh, Yiming Xu, Connor Phillips | AILaw | 60 19 0 | 14 May 2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation | Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, ..., Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo | | 29 7 0 | 14 Feb 2024
Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks | Wei Wang, Huilong Ning, Gaowei Zhang, Libo Liu, Yi Wang | | 26 11 0 | 08 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges | Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan | ELM, LM&MA | 55 29 0 | 02 Feb 2024
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs | Chenyang Yang, Rishabh Rustogi, Rachel A. Brower-Sinning, Grace A. Lewis, Christian Kastner, Tongshuang Wu | KELM | 30 11 0 | 14 Oct 2023
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset | Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adina Williams | | 65 129 0 | 18 May 2022
Discovering and Validating AI Errors With Crowdsourced Failure Reports | Ángel Alexander Cabrera, Abraham J. Druck, Jason I. Hong, Adam Perer | HAI | 48 54 0 | 23 Sep 2021