ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.17391
  4. Cited By
Unveiling the Implicit Toxicity in Large Language Models

Unveiling the Implicit Toxicity in Large Language Models

29 November 2023
Jiaxin Wen
Pei Ke
Hao-Lun Sun
Zhexin Zhang
Chengfei Li
Jinfeng Bai
Minlie Huang
ArXivPDFHTML

Papers citing "Unveiling the Implicit Toxicity in Large Language Models"

7 / 7 papers shown
Title
An Adversarial Perspective on Machine Unlearning for AI Safety
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
71
31
0
26 Sep 2024
Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language
  Models
Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models
Rima Hazra
Sayan Layek
Somnath Banerjee
Soujanya Poria
KELM
26
17
0
19 Jan 2024
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Can Machines Learn Morality? The Delphi Experiment
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jenny T Liang
...
Yulia Tsvetkov
Oren Etzioni
Maarten Sap
Regina A. Rini
Yejin Choi
FaML
117
110
0
14 Oct 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
92
235
0
11 Sep 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,808
0
14 Dec 2020
Language Models as Knowledge Bases?
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
406
2,576
0
03 Sep 2019
1