SAE-V: Interpreting Multimodal Models for Enhanced Alignment

22 February 2025

Papers citing "SAE-V: Interpreting Multimodal Models for Enhanced Alignment"


Title
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models | Mariam Mahran, Katharina Simbeck | - | 233 0 0 | 24 Sep 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering | Yu Cheng, A. Goel, Hakan Bilen | LRM | 199 2 0 | 12 May 2025
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao | - | 541 79 0 | 02 Jul 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models | Chameleon Team | MLLM | 476 595 0 | 16 May 2024