ResearchTrend.AI

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

27 October 2024
Zhengfu He
Wentao Shu
Xuyang Ge
Lingjie Chen
Junxuan Wang
Yunhua Zhou
Frances Liu
Qipeng Guo
Xuanjing Huang
Zuxuan Wu
Yu Jiang
Xipeng Qiu

Papers citing "Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders"

7 / 7 papers shown
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
Boyi Deng
Yu Wan
Yidan Zhang
Baosong Yang
Fuli Feng
41
0
0
08 May 2025
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
J. Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J. Zhang
Xipeng Qiu
70
0
0
29 Apr 2025
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
41
1
0
17 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Y. Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
30
2
0
10 Apr 2025
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Mengnan Du
LLMSV
66
2
0
17 Feb 2025
Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels
Logan Riggs
Max Tegmark
LLMSV
57
9
0
18 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
75
19
0
02 Jul 2024