Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents

11 January 2024

Quentin Delfosse

Sebastian Sztwiertnia

Wolfgang Stammer

Kristian Kersting

Papers citing "Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents"

Title | Authors | Tags | Metrics | Date
Interpretable end-to-end Neurosymbolic Reinforcement Learning agents | Nils Grandien, Quentin Delfosse, Kristian Kersting | OffRL | 21 2 0 | 18 Oct 2024
BlendRL: A Framework for Merging Symbolic and Neural Policy Learning | Hikaru Shindo, Quentin Delfosse, D. Dhami, Kristian Kersting | - | 33 3 0 | 15 Oct 2024
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations | Yupei Yang, Biwei Huang, Fan Feng, Xinyue Wang, Shikui Tu, Lei Xu | CML, OOD, TTA | 30 1 0 | 30 Jul 2024
Boosting Object Representation Learning via Motion and Object Continuity | Quentin Delfosse, Wolfgang Stammer, Thomas Rothenbacher, Dwarak Vittal, Kristian Kersting | OCL | 16 20 0 | 16 Nov 2022
Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences | L. Guan, Karthik Valmeekam, Subbarao Kambhampati | - | 42 8 0 | 28 Oct 2022
Neural Networks are Decision Trees | Çağlar Aytekin | FAtt | 24 24 0 | 11 Oct 2022
GlanceNets: Interpretabile, Leak-proof Concept-based Models | Emanuele Marconato, Andrea Passerini, Stefano Teso | - | 96 64 0 | 31 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 315 8,261 0 | 28 Jan 2022
Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations | Wolfgang Stammer, Marius Memmel, P. Schramowski, Kristian Kersting | - | 76 25 0 | 04 Dec 2021
Adaptive Rational Activations to Boost Deep Reinforcement Learning | Quentin Delfosse, P. Schramowski, Martin Mundt, Alejandro Molina, Kristian Kersting | - | 26 8 0 | 18 Feb 2021
AI safety via debate | G. Irving, Paul Christiano, Dario Amodei | - | 196 199 0 | 02 May 2018
You Only Look Once: Unified, Real-Time Object Detection | Joseph Redmon, S. Divvala, Ross B. Girshick, Ali Farhadi | ObjD | 266 35,677 0 | 08 Jun 2015