Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models

25 March 2025
Bowei Tian
Xuntao Lyu
Meng Liu
Hongyi Wang
Ang Li
Abstract

Representation Engineering (RepE) has emerged as a powerful paradigm for enhancing AI transparency by focusing on high-level representations rather than individual neurons or circuits. It has proven effective in improving interpretability and control, showing that representations can emerge, propagate, and shape final model outputs in large language models (LLMs). However, in Vision-Language Models (VLMs), visual input can override factual linguistic knowledge, leading to hallucinated responses that contradict reality. To address this challenge, we make the first attempt to extend RepE to VLMs, analyzing how multimodal representations are preserved and transformed. Building on our findings and drawing inspiration from successful RepE applications, we develop a theoretical framework that explains the stability of neural activity across layers using the principal eigenvector, uncovering the underlying mechanism of RepE. We empirically validate these intrinsic properties, demonstrating their broad applicability and significance. By bridging theoretical insights with empirical validation, this work transforms RepE from a descriptive tool into a structured theoretical framework, opening new directions for improving AI robustness, fairness, and transparency.
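To make the mechanism concrete, below is a minimal sketch of the standard RepE "reading vector" pipeline that the abstract's principal-eigenvector framing builds on: extract per-layer hidden states for contrastive prompt pairs, take the first principal component of their differences, and check how stable that direction stays across depth. The synthetic activations, shapes, and planted signal here are illustrative assumptions for the sketch, not the authors' experimental setup.

# Minimal sketch of RepE reading-vector extraction on synthetic
# hidden states (stand-ins for a real VLM's per-layer activations).
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_layers, d = 64, 12, 256

# Hypothetical per-layer hidden states for contrastive prompt pairs
# (e.g., truthful vs. hallucinated completions), shape (pairs, layers, dim).
# A shared direction is planted so the example has signal to recover.
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)
pos = rng.normal(size=(n_pairs, n_layers, d)) + 2.0 * planted
neg = rng.normal(size=(n_pairs, n_layers, d))

def reading_vector(diffs):
    """First principal component of mean-centered difference vectors."""
    diffs = diffs - diffs.mean(axis=0)
    # Principal eigenvector of the covariance = top right singular vector.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]

# One direction per layer; then measure its stability across depth.
dirs = np.stack([reading_vector(pos[:, l] - neg[:, l])
                 for l in range(n_layers)])
dirs *= np.sign(dirs @ planted)[:, None]  # resolve SVD sign ambiguity
cos = dirs @ dirs[0]
print("cosine similarity of layer-0 direction with deeper layers:")
print(np.round(cos, 3))

In this toy setting the recovered direction stays nearly identical across layers, which is the kind of cross-layer stability the paper's principal-eigenvector analysis aims to explain for real VLM activations.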

@article{tian2025_2503.22720,
  title={Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models},
  author={Bowei Tian and Xuntao Lyu and Meng Liu and Hongyi Wang and Ang Li},
  journal={arXiv preprint arXiv:2503.22720},
  year={2025}
}