Auditing large language models: a three-layered approach

16 February 2023

Papers citing "Auditing large language models: a three-layered approach"

28 / 28 papers shown

Title
Human-AI Governance (HAIG): A Trust-Utility Approach Zeynep Engin 42 0 0 03 May 2025
Addressing the regulatory gap: moving towards an EU AI audit ecosystem beyond the AI Act by including civil society David Hartmann José Renato Laranjeira de Pereira Chiara Streitbörger Bettina Berendt 86 6 0 20 Feb 2025
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset Khaoula Chehbouni Jonathan Colaço-Carr Yash More Jackie CK Cheung G. Farnadi 71 0 0 12 Nov 2024
Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias Waqar Hussain 33 0 0 16 Jul 2024
Navigating LLM Ethics: Advancements, Challenges, and Future Directions Junfeng Jiao S. Afroogh Yiming Xu Connor Phillips AILaw 55 19 0 14 May 2024
The Impact of Unstated Norms in Bias Analysis of Language Models Farnaz Kohankhaki D. B. Emerson David B. Emerson Laleh Seyyed-Kalantari Faiza Khan Khattak 45 1 0 04 Apr 2024
"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices Shivani Kapania Ruiyi Wang Toby Jia-Jun Li Tianshi Li Hong Shen 26 6 0 28 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency Akila Wickramasekara F. Breitinger Mark Scanlon 42 7 0 29 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 13 75 0 25 Jan 2024
Personality Traits in Large Language Models Gregory Serapio-García Mustafa Safdari Clément Crepy Luning Sun Stephen Fitz P. Romero Marwa Abdulhai Aleksandra Faust Maja J. Matarić LM&MA LLMAG 46 117 0 01 Jul 2023
Assessing Language Model Deployment with Risk Cards Leon Derczynski Hannah Rose Kirk Vidhisha Balachandran Sachin Kumar Yulia Tsvetkov M. Leiser Saif Mohammad 6 41 0 31 Mar 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned Deep Ganguli Liane Lovitt John Kernion Amanda Askell Yuntao Bai ... Nicholas Joseph Sam McCandlish C. Olah Jared Kaplan Jack Clark 218 441 0 23 Aug 2022
Large Language Models are Zero-Shot Reasoners Takeshi Kojima S. Gu Machel Reid Yutaka Matsuo Yusuke Iwasawa ReLM LRM 291 2,712 0 24 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
Conformity Assessments and Post-market Monitoring: A Guide to the Role of Auditing in the Proposed European AI Regulation Jakob Mokander M. Axente F. Casolari Luciano Floridi 57 83 0 09 Nov 2021
Truthful AI: Developing and governing AI that does not lie Owain Evans Owen Cotton-Barratt Lukas Finnveden Adam Bales Avital Balwit Peter Wills Luca Righetti William Saunders HILM 220 107 0 13 Oct 2021
Challenges in Detoxifying Language Models Johannes Welbl Amelia Glaese J. Uesato Sumanth Dathathri John F. J. Mellor Lisa Anne Hendricks Kirsty Anderson Pushmeet Kohli Ben Coppin Po-Sen Huang LM&MA 242 191 0 15 Sep 2021
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation Yue Wang Weishi Wang Shafiq R. Joty S. Hoi 204 1,451 0 02 Sep 2021
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate Hannah Rose Kirk B. Vidgen Paul Röttger Tristan Thrush Scott A. Hale 63 57 0 12 Aug 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models Alex Tamkin Miles Brundage Jack Clark Deep Ganguli AILaw ELM 192 248 0 04 Feb 2021
Robustness Gym: Unifying the NLP Evaluation Landscape Karan Goel Nazneen Rajani Jesse Vig Samson Tan Jason M. Wu Stephan Zheng Caiming Xiong Mohit Bansal Christopher Ré AAML OffRL OOD 138 136 0 13 Jan 2021
Misspelling Correction with Pre-trained Contextual Language Model Yifei Hu X. Jing Youlim Ko Julia Taylor Rayz KELM 24 25 0 08 Jan 2021
Extracting Training Data from Large Language Models Nicholas Carlini Florian Tramèr Eric Wallace Matthew Jagielski Ariel Herbert-Voss ... Tom B. Brown D. Song Ulfar Erlingsson Alina Oprea Colin Raffel MLAU SILM 267 1,798 0 14 Dec 2020
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 220 4,424 0 23 Jan 2020
The Woman Worked as a Babysitter: On Biases in Language Generation Emily Sheng Kai-Wei Chang Premkumar Natarajan Nanyun Peng 204 607 0 03 Sep 2019
A Survey on Bias and Fairness in Machine Learning Ninareh Mehrabi Fred Morstatter N. Saxena Kristina Lerman Aram Galstyan SyDa FaML 286 4,143 0 23 Aug 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 279 39,083 0 01 Sep 2014