ResearchTrend.AI

arXiv: 2401.06102

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

11 January 2024
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva

Papers citing "Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models"

50 / 78 papers shown
Title
ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers
Zhouxiang Fang
Aayush Mishra
Muhan Gao
Anqi Liu
Daniel Khashabi
42
0
0
28 Apr 2025
Functional Abstraction of Knowledge Recall in Large Language Models
Zijian Wang
Chang Xu
KELM
32
0
0
20 Apr 2025
Linking forward-pass dynamics in Transformers and real-time human processing
Jennifer Hu
Michael A. Lepori
Michael Franke
AI4CE
45
0
0
18 Apr 2025
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki
Sonia Joseph
Ippei Fujisawa
Ryota Kanai
DiffM
30
0
0
18 Apr 2025
Localized Cultural Knowledge is Conserved and Controllable in Large Language Models
V. Veselovsky
Berke Argin
Benedikt Stroebl
Chris Wendler
Robert West
James Evans
Thomas L. Griffiths
Arvind Narayanan
53
0
0
14 Apr 2025
Page Classification for Print Imaging Pipeline
Shaoyuan Xu
Cheng Lu
Mark Shaw
Peter Bauer
J. Allebach
VLM
35
0
0
03 Apr 2025
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
Guy Kaplan
Michael Toker
Yuval Reif
Yonatan Belinkov
Roy Schwartz
DiffM
48
0
0
01 Apr 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
N. Zhang
Shumin Deng
H. Chen
Nanyun Peng
KELM
54
1
0
20 Mar 2025
LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
Ying Shen
Lifu Huang
41
1
0
20 Mar 2025
Combining Causal Models for More Accurate Abstractions of Neural Networks
Theodora-Mara Pîslar
Sara Magliacane
Atticus Geiger
AI4CE
50
0
0
14 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
45
0
0
14 Mar 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun
Jing Huang
Sidharth Baskaran
Karel D'Oosterlinck
Christopher Potts
Michael Sklar
Atticus Geiger
AI4CE
55
0
0
13 Mar 2025
Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing
Neemesh Yadav
Jiarui Liu
Francesco Ortu
Roya Ensafi
Zhijing Jin
Rada Mihalcea
33
0
0
07 Mar 2025
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
Jonathan Jacobi
Gal Niv
LRM
ReLM
55
0
0
03 Mar 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan
Robin Jia
KELM
MU
46
0
0
27 Feb 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
100
0
0
24 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
40
3
0
24 Feb 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
42
0
0
17 Feb 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
145
0
0
17 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
49
0
0
08 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
59
1
0
07 Feb 2025
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
Go Kamoda
Benjamin Heinzerling
Tatsuro Inaba
Keito Kudo
Keisuke Sakaguchi
Kentaro Inui
MILM
31
0
0
27 Jan 2025
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu
Difan Zou
Dong Xu
66
1
0
26 Nov 2024
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Sohee Yang
Nora Kassner
E. Gribovskaya
Sebastian Riedel
Mor Geva
KELM
LRM
ReLM
78
4
0
25 Nov 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
90
4
0
23 Nov 2024
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
36
3
0
11 Nov 2024
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
33
5
0
07 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
43
10
0
07 Nov 2024
Learning and Unlearning of Fabricated Knowledge in Language Models
Chen Sun
Nolan Miller
A. Zhmoginov
Max Vladymyrov
Mark Sandler
KELM
MU
20
1
0
29 Oct 2024
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov
Chris Wendler
Mikhail Terekhov
Justin Deschenaux
Robert West
Çağlar Gülçehre
VLM
38
13
0
28 Oct 2024
CogSteer: Cognition-Inspired Selective Layer Intervention for Efficient Semantic Steering in Large Language Models
Xintong Wang
Jingheng Pan
Longqin Jiang
Liang Ding
Xingshan Li
Chris Biemann
LLMSV
19
0
0
23 Oct 2024
Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
Minseok Choi
C. Park
Dohyun Lee
Jaegul Choo
KELM
MU
29
1
0
17 Oct 2024
Eliciting Textual Descriptions from Representations of Continuous Prompts
Dana Ramati
Daniela Gottesman
Mor Geva
27
0
0
15 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
39
12
0
08 Oct 2024
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
Zhuoran Zhang
Y. Li
Zijian Kan
Keyuan Cheng
Lijie Hu
Di Wang
KELM
24
4
0
08 Oct 2024
Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language
Anthony Costarelli
Mat Allen
Severin Field
16
1
0
03 Oct 2024
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang
Anish Kachinthaya
Suzie Petryk
Yossi Gandelsman
VLM
29
14
0
03 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
80
1
0
02 Oct 2024
Extracting Paragraphs from LLM Token Activations
Nicholas Pochinkov
Angelo Benoit
Lovkush Agarwal
Zainab Ali Majid
Lucile Ter-Minassian
30
1
0
10 Sep 2024
Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models
Emily Cheng
Richard Antonello
72
4
0
09 Sep 2024
Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers
Amit Ben Artzy
Roy Schwartz
25
5
0
05 Sep 2024
Active Testing of Large Language Model via Multi-Stage Sampling
Yuheng Huang
Jiayang Song
Qiang Hu
Felix Juefei-Xu
Lei Ma
16
2
0
07 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
26
10
0
27 Jul 2024
Demystifying Verbatim Memorization in Large Language Models
Jing Huang
Diyi Yang
Christopher Potts
ELM
PILM
MU
45
19
0
25 Jul 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
47
1
0
22 Jul 2024
Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture
Jiayang Song
Yuheng Huang
Zhehua Zhou
Lei Ma
37
6
0
10 Jul 2024
Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning
Lei Yu
Jingcheng Niu
Zining Zhu
Gerald Penn
28
3
0
04 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
67
18
0
02 Jul 2024
Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
Eden Biran
Daniela Gottesman
Sohee Yang
Mor Geva
Amir Globerson
LRM
34
21
0
18 Jun 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman
Mor Geva
37
10
0
18 Jun 2024