Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

14 February 2024

Goutham Rajendran

Bernhard Schölkopf

Pradeep Ravikumar

Papers citing "Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models"

Title | Authors | Tags | Counts | Date
Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models | Ruta Binkyte, Ivaxi Sheth, Zhijing Jin, Mohammad Havaei, Bernhard Schölkopf, Mario Fritz | | 44 0 0 | 28 Feb 2025
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation | Bowen Li, Zhaoyu Li, Qiwei Du, Jinqi Luo, Wenshan Wang, ..., Katia P. Sycara, Pradeep Kumar Ravikumar, Alexander G. Gray, X. Si, Sebastian A. Scherer | AI4CE, LRM | 71 2 0 | 01 Nov 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling | Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald, Luigi Gresele | | 50 3 0 | 30 Oct 2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing | Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas | MILM | 153 170 0 | 02 May 2023
Posterior Collapse and Latent Variable Non-identifiability | Yixin Wang, David M. Blei, John P. Cunningham | CML, DRL | 65 70 0 | 02 Jan 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | | 207 486 0 | 01 Nov 2022
Toy Models of Superposition | Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah | AAML, MILM | 117 314 0 | 21 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark | | 213 327 0 | 23 Aug 2022
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 301 11,730 0 | 04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 315 8,261 0 | 28 Jan 2022
Contrastive Learning Inverts the Data Generating Process | Roland S. Zimmermann, Yash Sharma, Steffen Schneider, Matthias Bethge, Wieland Brendel | SSL | 227 206 0 | 17 Feb 2021
What you can cram into a single vector: Probing sentence embeddings for linguistic properties | Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni | | 196 876 0 | 03 May 2018