2310.13018
Cited By
Getting aligned on representational alignment
18 October 2023
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
Been Kim
Bradley C. Love
Erin Grant
Iris Groen
Jascha Achterberg
Joshua B. Tenenbaum
Katherine M. Collins
Katherine L. Hermann
Kerem Oktar
Klaus Greff
M. Hebart
Nori Jacoby
Qiuyi Zhang
Raja Marjieh
Robert Geirhos
Sherol Chen
Simon Kornblith
Sunayana Rane
Talia Konkle
Thomas P. O'Connell
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Müller
M. Toneva
Thomas L. Griffiths
Papers citing "Getting aligned on representational alignment"
23 / 23 papers shown
Title
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde
Louis Jaburi
MILM
60
1
0
01 May 2025
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde
Tassilo Wald
Tobias Schumacher
Klaus H. Maier-Hein
Markus Strohmaier
Adriana Iamnitchi
AI4TS
VLM
52
5
0
13 Mar 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan
Julian Forsyth
Thomas Fel
M. Kowal
Konstantinos G. Derpanis
69
7
0
06 Feb 2025
We're Different, We're the Same: Creative Homogeneity Across LLMs
Emily Wenger
Yoed Kenett
81
3
0
31 Jan 2025
Dimensions underlying the representational alignment of deep neural networks with humans
F. Mahner
Lukas Muttenthaler
Umut Güçlü
M. Hebart
28
4
0
28 Jan 2025
Measuring Error Alignment for Decision-Making Systems
Binxia Xu
Antonis Bikakis
Daniel Onah
A. Vlachidis
Luke Dickens
29
0
0
03 Jan 2025
Quantifying Knowledge Distillation Using Partial Information Decomposition
Pasan Dissanayake
Faisal Hamman
Barproda Halder
Ilia Sucholutsky
Qiuyi Zhang
Sanghamitra Dutta
31
0
0
12 Nov 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
A. Laio
Marco Baroni
98
11
0
24 May 2024
Learning with Language-Guided State Abstractions
Andi Peng
Ilia Sucholutsky
Belinda Z. Li
T. Sumers
Thomas L. Griffiths
Jacob Andreas
Julie A. Shah
LM&Ro
29
8
0
28 Feb 2024
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
38
63
0
10 May 2023
Human Uncertainty in Concept-Based AI Systems
Katherine M. Collins
Matthew Barker
M. Zarlenga
Naveen Raman
Umang Bhatt
M. Jamnik
Ilia Sucholutsky
Adrian Weller
Krishnamurthy Dvijotham
52
39
0
22 Mar 2023
Analyzing Diffusion as Serial Reproduction
Raja Marjieh
Ilia Sucholutsky
Thomas A. Langlois
Nori Jacoby
Thomas L. Griffiths
DiffM
25
4
0
29 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
217
495
0
28 Sep 2022
Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off
M. Zarlenga
Pietro Barbiero
Gabriele Ciravegna
G. Marra
Francesco Giannini
...
F. Precioso
S. Melacci
Adrian Weller
Pietro Liò
M. Jamnik
47
52
0
19 Sep 2022
The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks
Lukas Huber
Robert Geirhos
Felix Wichmann
35
12
0
20 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Passive Attention in Artificial Neural Networks Predicts Human Visual Selectivity
Thomas A. Langlois
H. C. Zhao
Erin Grant
Ishita Dasgupta
Thomas L. Griffiths
Nori Jacoby
33
13
0
14 Jul 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
3,790
0
24 Feb 2021
On the surprising similarities between supervised and self-supervised models
Robert Geirhos
Kantharaju Narayanappa
Benjamin Mitzkus
Matthias Bethge
Felix Wichmann
Wieland Brendel
OOD
SSL
DRL
45
45
0
16 Oct 2020
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh
Been Kim
Sercan Ö. Arik
Chun-Liang Li
Tomas Pfister
Pradeep Ravikumar
FAtt
115
293
0
17 Oct 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
234
11,568
0
09 Mar 2017
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
268
4,940
0
05 Dec 2016