Understanding intermediate layers using linear classifier probes

5 October 2016

Papers citing "Understanding intermediate layers using linear classifier probes"

A Multi-Perspective Analysis of Memorization in Large Language Models Bowen Chen Namgi Han Yusuke Miyao 38 1 0 19 May 2024
Linear Explanations for Individual Neurons Tuomas P. Oikarinen Tsui-Wei Weng FAtt MILM 29 5 0 10 May 2024
A separability-based approach to quantifying generalization: which layer is best? Luciano Dyballa Evan Gerritz Steven W. Zucker OOD 28 3 0 02 May 2024
Does Transformer Interpretability Transfer to RNNs? Gonçalo Paulo Thomas Marshall Nora Belrose 57 6 0 09 Apr 2024
Joint-Embedding Masked Autoencoder for Self-supervised Learning of Dynamic Functional Connectivity from the Human Brain Jungwon Choi Hyungi Lee Byung-Hoon Kim Juho Lee 72 0 0 11 Mar 2024
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations GuanWen Qiu Da Kuang Surbhi Goel 25 8 0 05 Mar 2024
Language Models Represent Beliefs of Self and Others Wentao Zhu Zhining Zhang Yizhou Wang MILM LRM 42 7 0 28 Feb 2024
Descriptive Kernel Convolution Network with Improved Random Walk Kernel Meng-Chieh Lee Lingxiao Zhao L. Akoglu 18 3 0 08 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 20 76 0 25 Jan 2024
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? Sonia Laguna Ricards Marcinkevics Moritz Vandenhirtz Julia E. Vogt 22 17 0 24 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Asma Ghandeharioun Avi Caciularu Adam Pearce Lucas Dixon Mor Geva 27 87 0 11 Jan 2024
Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing Jaeill Kim Duhun Hwang Eunjung Lee Jangwon Suh Jimyeong Kim Wonjong Rhee 28 0 0 11 Jan 2024
FlexModel: A Framework for Interpretability of Distributed Large Language Models Matthew Choi Muhammad Adil Asif John Willes David Emerson AI4CE ALM 22 1 0 05 Dec 2023
Revisiting Topic-Guided Language Models Carolina Zheng Keyon Vafa David M. Blei BDL 27 1 0 04 Dec 2023
Identifying Spurious Correlations using Counterfactual Alignment Joseph Paul Cohen Louis Blankemeier Akshay S. Chaudhari CML 55 1 0 01 Dec 2023
Looped Transformers are Better at Learning Learning Algorithms Liu Yang Kangwook Lee Robert D. Nowak Dimitris Papailiopoulos 24 24 0 21 Nov 2023
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots Ruixiang Tang Jiayi Yuan Yiming Li Zirui Liu Rui Chen Xia Hu AAML 36 13 0 28 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks Alex Tamkin Mohammad Taufeeque Noah D. Goodman 30 27 0 26 Oct 2023
Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning Lapo Frati Neil Traft Jeff Clune Nick Cheney CLL 19 0 0 12 Oct 2023
Uncovering the Hidden Cost of Model Compression Diganta Misra Muawiz Chaudhary Agam Goyal Bharat Runwal Pin-Yu Chen VLM 30 0 0 29 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes Yosuke Miyanishi M. Nguyen 26 2 0 19 Aug 2023
Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models Patrik Hammersborg Inga Strümke FAtt 16 0 0 24 Jul 2023
Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis Andrew Hryniowski Alexander Wong 13 0 0 16 Jun 2023
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing Z. Wang Alexander Ku Jason Baldridge Thomas L. Griffiths Been Kim UQCV 21 11 0 29 May 2023
Reverse Engineering Self-Supervised Learning Ido Ben-Shaul Ravid Shwartz-Ziv Tomer Galanti S. Dekel Yann LeCun SSL 15 34 0 24 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval Arijit Ray Filip Radenovic Abhimanyu Dubey Bryan A. Plummer Ranjay Krishna Kate Saenko CoGe VLM 38 34 0 05 May 2023
SR-init: An interpretable layer pruning method Hui Tang Yao Lu Qi Xuan 15 8 0 14 Mar 2023
Revisiting Pre-training in Audio-Visual Learning Ruoxuan Feng Wenke Xia Di Hu 22 1 0 07 Feb 2023
Identifiability of latent-variable and structural-equation models: from linear to nonlinear Aapo Hyvarinen Ilyes Khemakhem R. Monti CML 30 41 0 06 Feb 2023
Trustworthy Social Bias Measurement Rishi Bommasani Percy Liang 27 10 0 20 Dec 2022
A Natural Bias for Language Generation Models Clara Meister Wojciech Stokowiec Tiago Pimentel Lei Yu Laura Rimell A. Kuncoro MILM 25 6 0 19 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning Shachar Don-Yehiya Elad Venezian Colin Raffel Noam Slonim Yoav Katz Leshem Choshen MoMe 26 52 0 02 Dec 2022
Supervised Pretraining for Molecular Force Fields and Properties Prediction Xiang Gao Weihao Gao Wen Xiao Zhirui Wang Chong Wang Liang Xiang AI4CE 17 8 0 23 Nov 2022
Layer-Stack Temperature Scaling Amr Khalifa Michael C. Mozer Hanie Sedghi Behnam Neyshabur Ibrahim M. Alabdulmohsin 75 2 0 18 Nov 2022
Emergence of Concepts in DNNs? Tim Räz 17 0 0 11 Nov 2022
Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts Patrik Hammersborg Inga Strümke 12 5 0 10 Nov 2022
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models Hao Peng Xiaozhi Wang Shengding Hu Hailong Jin Lei Hou Juanzi Li Zhiyuan Liu Qun Liu 15 22 0 08 Nov 2022
A Law of Data Separation in Deep Learning Hangfeng He Weijie J. Su OOD 21 36 0 31 Oct 2022
The Curious Case of Benign Memorization Sotiris Anagnostidis Gregor Bachmann Lorenzo Noci Thomas Hofmann AAML 41 8 0 25 Oct 2022
GULP: a prediction-based metric between representations Enric Boix Adserà Hannah Lawrence George Stepaniants Philippe Rigollet 38 11 0 12 Oct 2022
Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis P. Braca L. Millefiori A. Aubry S. Maranò A. De Maio P. Willett 27 12 0 22 Jul 2022
Lipschitz Continuity Retained Binary Neural Network Yuzhang Shang Dan Xu Bin Duan Ziliang Zong Liqiang Nie Yan Yan 11 19 0 13 Jul 2022
Probing via Prompting Jiaoda Li Ryan Cotterell Mrinmaya Sachan 29 13 0 04 Jul 2022
When are Post-hoc Conceptual Explanations Identifiable? Tobias Leemann Michael Kirchhof Yao Rong Enkelejda Kasneci Gjergji Kasneci 50 10 0 28 Jun 2022
Evaluating Self-Supervised Learning for Molecular Graph Embeddings Hanchen Wang Jean Kaddour Shengchao Liu Jian Tang Joan Lasenby Qi Liu 24 20 0 16 Jun 2022
Disentangling visual and written concepts in CLIP Joanna Materzyńska Antonio Torralba David Bau CoGe 21 47 0 15 Jun 2022
Contrastive Learning as Goal-Conditioned Reinforcement Learning Benjamin Eysenbach Tianjun Zhang Ruslan Salakhutdinov Sergey Levine SSL OffRL 23 137 0 15 Jun 2022
On the Usefulness of Embeddings, Clusters and Strings for Text Generator Evaluation Tiago Pimentel Clara Meister Ryan Cotterell 38 7 0 31 May 2022
Self-supervised models of audio effectively explain human cortical responses to speech Aditya R. Vaidya Shailee Jain Alexander G. Huth 23 42 0 27 May 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi N. Harada K. Kashino SSL 34 53 0 15 Apr 2022