ResearchTrend.AI


Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

27 October 2024
Qi Zhang
Yifei Wang
Jingyi Cui
Xiang Pan
Qi Lei
Stefanie Jegelka
Yisen Wang
    AAML
arXiv: 2410.21331

Papers citing "Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness"

1 / 1 papers shown
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
08 Mar 2025