Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders

24 February 2025

Papers citing "Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders"

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu
LLMSV, 35 0 0, 12 May 2025

Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics
Yiran He, Yun Cao, Bowen Yang, Zeyu Zhang
24 0 0, 16 Apr 2025

Towards Trustworthy GUI Agents: A Survey
Yucheng Shi, Wenhao Yu, Wenlin Yao, Wenhu Chen, Ninghao Liu
39 2 0, 30 Mar 2025