Leveraging Contextual Counterfactuals Toward Belief Calibration

13 July 2023

Papers citing "Leveraging Contextual Counterfactuals Toward Belief Calibration"

Getting aligned on representational alignment Ilia Sucholutsky Lukas Muttenthaler Adrian Weller Andi Peng Andreea Bobu ... Thomas Unterthiner Andrew Kyle Lampinen Klaus-Robert Müller M. Toneva Thomas L. Griffiths 54 72 0 18 Oct 2023
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trębacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 225 495 0 28 Sep 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
Counterfactual Plans under Distributional Ambiguity N. Bui D. Nguyen Viet Anh Nguyen 54 24 0 29 Jan 2022
ViCE: Visual Counterfactual Explanations for Machine Learning Models Oscar Gomez Steffen Holter Jun Yuan E. Bertini AAML 49 92 0 05 Mar 2020
Fine-Tuning Language Models from Human Preferences Daniel M. Ziegler Nisan Stiennon Jeff Wu Tom B. Brown Alec Radford Dario Amodei Paul Christiano G. Irving ALM 273 1,561 0 18 Sep 2019