ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.11614
  4. Cited By
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces

Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces

17 June 2024
Yihuai Hong
Lei Yu
Shauli Ravfogel
Haiqin Yang
Mor Geva
    KELM
    MU
ArXivPDFHTML

Papers citing "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"

16 / 16 papers shown
Title
Not All Data Are Unlearned Equally
Not All Data Are Unlearned Equally
Aravind Krishnan
Siva Reddy
Marius Mosbach
MU
46
0
0
07 Apr 2025
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via
  Mechanistic Localization
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
29
5
0
16 Oct 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb
Fabien Roger
AAML
MU
40
11
0
11 Oct 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong
Yuelin Zou
Lijie Hu
Ziqian Zeng
Di Wang
Haiqin Yang
AAML
MU
24
2
0
09 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
39
10
0
03 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
49
9
0
30 Sep 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
59
31
0
26 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
54
18
0
02 Jul 2024
Mechanistic Understanding and Mitigation of Language Model Non-Factual
  Hallucinations
Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
Lei Yu
Meng Cao
Jackie Chi Kit Cheung
Yue Dong
HILM
20
6
0
27 Mar 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language
  Model Representations
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing-ling Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
53
24
0
27 Feb 2024
OLMo: Accelerating the Science of Language Models
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
124
349
0
01 Feb 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO
  and Toxicity
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
55
95
0
03 Jan 2024
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
Nianwen Si
Hao Zhang
Heyu Chang
Wenlin Zhang
Dan Qu
Weiqiang Zhang
KELM
MU
70
26
0
27 Nov 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language
  Models
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
186
260
0
28 Apr 2023
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Joel Jang
Dongkeun Yoon
Sohee Yang
Sungmin Cha
Moontae Lee
Lajanugen Logeswaran
Minjoon Seo
KELM
PILM
MU
145
110
0
04 Oct 2022
Toy Models of Superposition
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
117
314
0
21 Sep 2022
1