2406.17969
Cited By
Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective
25 June 2024
Hanqi Yan
Yanzheng Xiang
Guangyi Chen
Yifei Wang
Lin Gui
Yulan He
Papers citing "Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective"
6 / 6 papers shown
Title
FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
45
0
0
01 May 2025
Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
Zhengfu He
Xuyang Ge
Qiong Tang
Tianxiang Sun
Qinyuan Cheng
Xipeng Qiu
32
20
0
19 Feb 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
64
95
0
03 Jan 2024
Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction
Ashish Sharma
Kevin Rushton
Inna Wanyin Lin
David Wadden
Khendra G. Lucas
Adam S. Miner
Theresa Nguyen
Tim Althoff
71
71
0
04 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
On Feature Decorrelation in Self-Supervised Learning
Tianyu Hua
Wenxiao Wang
Zihui Xue
Sucheng Ren
Yue Wang
Hang Zhao
SSL
OOD
112
186
0
02 May 2021