On the Challenges of Using Black-Box APIs for Toxicity Evaluation in
Research

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

24 April 2023

Luiza Amador Pozzobon

Patrick Lewis

Papers citing "On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research"

13 / 13 papers shown

Title
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation Vera Neplenbroek Arianna Bisazza Raquel Fernández 100 0 0 17 Feb 2025
Leveraging Open-Source Large Language Models for Native Language Identification Yee Man Ng Ilia Markov 30 0 0 15 Sep 2024
Diffusion Guided Language Modeling Justin Lovelace Varsha Kishore Yiwei Chen Kilian Q. Weinberger 36 6 0 08 Aug 2024
Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity Johannes Schneider Arianna Casanova Flores Anne-Catherine Kranz 39 2 0 08 Jul 2024
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts Caroline Brun Vassilina Nikoulina 34 1 0 25 Jun 2024
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals Phillip Howard Kathleen C. Fraser Anahita Bhiwandiwalla S. Kiritchenko 48 9 0 30 May 2024
Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech Ghadi Alyahya Abeer Aldayel 38 2 0 18 Mar 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models Luiza Amador Pozzobon Patrick Lewis Sara Hooker B. Ermiş 36 7 0 06 Mar 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 13 76 0 25 Jan 2024
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models Luiza Amador Pozzobon B. Ermiş Patrick Lewis Sara Hooker 26 20 0 11 Oct 2023
PaLM 2 Technical Report Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin ... Ce Zheng Wei Zhou Denny Zhou Slav Petrov Yonghui Wu ReLM LRM 58 1,138 0 17 May 2023
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics Shoaib Ahmed Siddiqui Nitarshan Rajkumar Tegan Maharaj David M. Krueger Sara Hooker 35 27 0 20 Sep 2022
Challenges in Detoxifying Language Models Johannes Welbl Amelia Glaese J. Uesato Sumanth Dathathri John F. J. Mellor Lisa Anne Hendricks Kirsty Anderson Pushmeet Kohli Ben Coppin Po-Sen Huang LM&MA 242 193 0 15 Sep 2021