Detoxifying Large Language Models via Knowledge Editing

21 March 2024
Meng Wang
Ningyu Zhang
Ziwen Xu
Zekun Xi
Shumin Deng
Yunzhi Yao
Qishen Zhang
Linyi Yang
Jindong Wang
Huajun Chen
    KELM

Papers citing "Detoxifying Large Language Models via Knowledge Editing"

12 / 12 papers shown
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model
Qiyuan Deng
X. Bai
Kehai Chen
Yaowei Wang
Liqiang Nie
Min Zhang
OffRL
55
0
0
13 Mar 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
47
0
0
24 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil
Vipul Gupta
Sarkar Snigdha Sarathi Das
R. Passonneau
59
0
0
07 Feb 2025
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
Yang Ouyang
Hengrui Gu
Shuhang Lin
Wenyue Hua
Jie Peng
B. Kailkhura
Tianlong Chen
Kaixiong Zhou
AAML
23
1
0
05 Jan 2025
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Zhipeng Chen
Liang Song
K. Zhou
Wayne Xin Zhao
B. Wang
Weipeng Chen
Ji-Rong Wen
55
0
0
10 Oct 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
55
2
0
30 Sep 2024
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Siyuan Qi
Bangcheng Yang
Kailin Jiang
Xiaobo Wang
Jiaqi Li
Yifan Zhong
Yaodong Yang
Zilong Zheng
KELM
78
8
0
17 Jun 2024
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Wenyue Hua
Jiang Guo
Mingwen Dong
He Zhu
Patrick K. L. Ng
Zhiguo Wang
KELM
41
17
0
31 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
52
95
0
03 Jan 2024
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Jundong Li
KELM
66
127
0
24 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Language Anisotropic Cross-Lingual Model Editing
Yang Xu
Yutai Hou
Wanxiang Che
Min Zhang
KELM
70
24
0
25 May 2022