ResearchTrend.AI

arXiv: 2304.12397
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

24 April 2023
Luiza Amador Pozzobon
B. Ermiş
Patrick Lewis
Sara Hooker

Papers citing "On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research"

13 / 13 papers shown
Title
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Vera Neplenbroek
Arianna Bisazza
Raquel Fernández
100
0
0
17 Feb 2025
Leveraging Open-Source Large Language Models for Native Language Identification
Yee Man Ng
Ilia Markov
30
0
0
15 Sep 2024
Diffusion Guided Language Modeling
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
36
6
0
08 Aug 2024
Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity
Johannes Schneider
Arianna Casanova Flores
Anne-Catherine Kranz
39
2
0
08 Jul 2024
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts
Caroline Brun
Vassilina Nikoulina
34
1
0
25 Jun 2024
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Phillip Howard
Kathleen C. Fraser
Anahita Bhiwandiwalla
S. Kiritchenko
48
9
0
30 May 2024
Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech
Ghadi Alyahya
Abeer Aldayel
38
2
0
18 Mar 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon
Patrick Lewis
Sara Hooker
B. Ermiş
36
7
0
06 Mar 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
76
0
25 Jan 2024
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Luiza Amador Pozzobon
B. Ermiş
Patrick Lewis
Sara Hooker
26
20
0
11 Oct 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
58
1,138
0
17 May 2023
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Tegan Maharaj
David M. Krueger
Sara Hooker
35
27
0
20 Sep 2022
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
193
0
15 Sep 2021