1610.01644
Cited By
Understanding intermediate layers using linear classifier probes
5 October 2016
Guillaume Alain
Yoshua Bengio
FAtt
Papers citing
"Understanding intermediate layers using linear classifier probes"
50 / 158 papers shown
Title
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
38
1
0
19 May 2024
Linear Explanations for Individual Neurons
Tuomas P. Oikarinen
Tsui-Wei Weng
FAtt
MILM
29
5
0
10 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
28
3
0
02 May 2024
Does Transformer Interpretability Transfer to RNNs?
Gonçalo Paulo
Thomas Marshall
Nora Belrose
57
6
0
09 Apr 2024
Joint-Embedding Masked Autoencoder for Self-supervised Learning of Dynamic Functional Connectivity from the Human Brain
Jungwon Choi
Hyungi Lee
Byung-Hoon Kim
Juho Lee
72
0
0
11 Mar 2024
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
GuanWen Qiu
Da Kuang
Surbhi Goel
25
8
0
05 Mar 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu
Zhining Zhang
Yizhou Wang
MILM
LRM
42
7
0
28 Feb 2024
Descriptive Kernel Convolution Network with Improved Random Walk Kernel
Meng-Chieh Lee
Lingxiao Zhao
L. Akoglu
18
3
0
08 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
20
76
0
25 Jan 2024
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Sonia Laguna
Ricards Marcinkevics
Moritz Vandenhirtz
Julia E. Vogt
22
17
0
24 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
27
87
0
11 Jan 2024
Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
Jaeill Kim
Duhun Hwang
Eunjung Lee
Jangwon Suh
Jimyeong Kim
Wonjong Rhee
28
0
0
11 Jan 2024
FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi
Muhammad Adil Asif
John Willes
David Emerson
AI4CE
ALM
22
1
0
05 Dec 2023
Revisiting Topic-Guided Language Models
Carolina Zheng
Keyon Vafa
David M. Blei
BDL
27
1
0
04 Dec 2023
Identifying Spurious Correlations using Counterfactual Alignment
Joseph Paul Cohen
Louis Blankemeier
Akshay S. Chaudhari
CML
55
1
0
01 Dec 2023
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
24
24
0
21 Nov 2023
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots
Ruixiang Tang
Jiayi Yuan
Yiming Li
Zirui Liu
Rui Chen
Xia Hu
AAML
36
13
0
28 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
30
27
0
26 Oct 2023
Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
Lapo Frati
Neil Traft
Jeff Clune
Nick Cheney
CLL
19
0
0
12 Oct 2023
Uncovering the Hidden Cost of Model Compression
Diganta Misra
Muawiz Chaudhary
Agam Goyal
Bharat Runwal
Pin-Yu Chen
VLM
30
0
0
29 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
Yosuke Miyanishi
M. Nguyen
26
2
0
19 Aug 2023
Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models
Patrik Hammersborg
Inga Strümke
FAtt
16
0
0
24 Jul 2023
Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis
Andrew Hryniowski
Alexander Wong
13
0
0
16 Jun 2023
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Z. Wang
Alexander Ku
Jason Baldridge
Thomas L. Griffiths
Been Kim
UQCV
21
11
0
29 May 2023
Reverse Engineering Self-Supervised Learning
Ido Ben-Shaul
Ravid Shwartz-Ziv
Tomer Galanti
S. Dekel
Yann LeCun
SSL
15
34
0
24 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
38
34
0
05 May 2023
SR-init: An interpretable layer pruning method
Hui Tang
Yao Lu
Qi Xuan
15
8
0
14 Mar 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
22
1
0
07 Feb 2023
Identifiability of latent-variable and structural-equation models: from linear to nonlinear
Aapo Hyvärinen
Ilyes Khemakhem
R. Monti
CML
30
41
0
06 Feb 2023
Trustworthy Social Bias Measurement
Rishi Bommasani
Percy Liang
27
10
0
20 Dec 2022
A Natural Bias for Language Generation Models
Clara Meister
Wojciech Stokowiec
Tiago Pimentel
Lei Yu
Laura Rimell
A. Kuncoro
MILM
25
6
0
19 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya
Elad Venezian
Colin Raffel
Noam Slonim
Yoav Katz
Leshem Choshen
MoMe
26
52
0
02 Dec 2022
Supervised Pretraining for Molecular Force Fields and Properties Prediction
Xiang Gao
Weihao Gao
Wen Xiao
Zhirui Wang
Chong Wang
Liang Xiang
AI4CE
17
8
0
23 Nov 2022
Layer-Stack Temperature Scaling
Amr Khalifa
Michael C. Mozer
Hanie Sedghi
Behnam Neyshabur
Ibrahim M. Alabdulmohsin
75
2
0
18 Nov 2022
Emergence of Concepts in DNNs?
Tim Räz
17
0
0
11 Nov 2022
Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts
Patrik Hammersborg
Inga Strümke
12
5
0
10 Nov 2022
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models
Hao Peng
Xiaozhi Wang
Shengding Hu
Hailong Jin
Lei Hou
Juanzi Li
Zhiyuan Liu
Qun Liu
15
22
0
08 Nov 2022
A Law of Data Separation in Deep Learning
Hangfeng He
Weijie J. Su
OOD
21
36
0
31 Oct 2022
The Curious Case of Benign Memorization
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
41
8
0
25 Oct 2022
GULP: a prediction-based metric between representations
Enric Boix Adserà
Hannah Lawrence
George Stepaniants
Philippe Rigollet
38
11
0
12 Oct 2022
Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis
P. Braca
L. Millefiori
A. Aubry
S. Maranò
A. De Maio
P. Willett
27
12
0
22 Jul 2022
Lipschitz Continuity Retained Binary Neural Network
Yuzhang Shang
Dan Xu
Bin Duan
Ziliang Zong
Liqiang Nie
Yan Yan
11
19
0
13 Jul 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
29
13
0
04 Jul 2022
When are Post-hoc Conceptual Explanations Identifiable?
Tobias Leemann
Michael Kirchhof
Yao Rong
Enkelejda Kasneci
Gjergji Kasneci
50
10
0
28 Jun 2022
Evaluating Self-Supervised Learning for Molecular Graph Embeddings
Hanchen Wang
Jean Kaddour
Shengchao Liu
Jian Tang
Joan Lasenby
Qi Liu
24
20
0
16 Jun 2022
Disentangling visual and written concepts in CLIP
Joanna Materzyńska
Antonio Torralba
David Bau
CoGe
21
47
0
15 Jun 2022
Contrastive Learning as Goal-Conditioned Reinforcement Learning
Benjamin Eysenbach
Tianjun Zhang
Ruslan Salakhutdinov
Sergey Levine
SSL
OffRL
23
137
0
15 Jun 2022
On the Usefulness of Embeddings, Clusters and Strings for Text Generator Evaluation
Tiago Pimentel
Clara Meister
Ryan Cotterell
38
7
0
31 May 2022
Self-supervised models of audio effectively explain human cortical responses to speech
Aditya R. Vaidya
Shailee Jain
Alexander G. Huth
23
42
0
27 May 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
34
53
0
15 Apr 2022