ResearchTrend.AI
Can Editing LLMs Inject Harm?

29 July 2024
Canyu Chen
Baixiang Huang
Zekun Li
Zhaorun Chen
Shiyang Lai
Xiongxiao Xu
Jia-Chen Gu
Jindong Gu
Huaxiu Yao
Chaowei Xiao
Xifeng Yan
William Wang
Philip H. S. Torr
Dawn Song
Kai Shu
    KELM

Papers citing "Can Editing LLMs Inject Harm?"

9 / 9 papers shown
Title
Knowledge Editing in Language Models via Adapted Direct Preference Optimization
Amit Rozner
Barak Battash
Lior Wolf
Ofir Lindenbaum
KELM
29
8
0
14 Jun 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Qi Zhang
Xuanjing Huang
60
29
0
18 Mar 2024
Consecutive Model Editing with Batch alongside HooK Layers
Shuaiyi Li
Yang Deng
Deng Cai
Hongyuan Lu
Liang Chen
Wai Lam
KELM
40
8
0
08 Mar 2024
Stable Knowledge Editing in Large Language Models
Zihao Wei
Liang Pang
Hanxing Ding
Jingcheng Deng
Huawei Shen
Xueqi Cheng
KELM
60
9
0
20 Feb 2024
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Wenyue Hua
Jiang Guo
Mingwen Dong
He Zhu
Patrick K. L. Ng
Zhiguo Wang
KELM
41
17
0
31 Jan 2024
PokeMQA: Programmable knowledge editing for Multi-hop Question Answering
Hengrui Gu
Kaixiong Zhou
Xiaotian Han
Ninghao Liu
Ruobing Wang
Xin Wang
LRM
KELM
45
22
0
23 Dec 2023
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Jundong Li
KELM
66
127
0
24 Oct 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
202
364
0
15 Oct 2021