Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment

24 July 2025
Hao Li
Lijun Li
Zhenghao Lu
Xianyi Wei
Rui Li
Jing Shao
Lei Sha

Papers citing "Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment"

5 / 5 papers shown
Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities
Yuan Xiong
Ziqi Miao
Lijun Li
Chen Qian
Jie Li
Jing Shao
AAML
02 Dec 2025
HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment
Y. Liu
Lijun Li
X. Wang
Jing Shao
LLMSV
17 Oct 2025
LLMs Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions
Xuhao Hu
Peng Wang
Xiaoya Lu
Dongrui Liu
Xuanjing Huang
Jing Shao
09 Oct 2025
Weak Form Learning for Mean-Field Partial Differential Equations: an Application to Insect Movement
Seth Minor
Bret D. Elderd
Benjamin Van Allen
David M. Bortz
Vanja M. Dukic
09 Oct 2025
Detecting and Filtering Unsafe Training Data via Data Attribution with Denoised Representation
Yijun Pan
Taiwei Shi
Jieyu Zhao
Jiaqi W. Ma
TDI
17 Feb 2025