CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models

22 December 2021

Papers citing "CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models"


Reasoning Elicitation in Language Models via Counterfactual Feedback | Alihan Hüyük, Xinnuo Xu, Jacqueline R. M. A. Maasch, Aditya V. Nori, Javier González | ReLM LRM 292 1 0 | 02 Oct 2024
Counterfactual Token Generation in Large Language Models | Ivi Chatzi, N. C. Benz, Eleni Straitouri, Stratis Tsirtsis, Manuel Gomez Rodriguez | LRM 49 4 0 | 25 Sep 2024
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Yijia Xiao, Edward Sun, Yiqiao Jin, Qifan Wang, Wei Wang | 50 13 0 | 21 Aug 2024
ACCORD: Closing the Commonsense Measurability Gap | François Roewer-Després, Jinyue Feng, Zining Zhu, Frank Rudzicz | LRM 61 0 0 | 04 Jun 2024
Shaking the foundations: delusions in sequence models for interaction and control | Pedro A. Ortega, M. Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, ..., Yutian Chen, Scott E. Reed, Marcus Hutter, Nando de Freitas, Shane Legg | 25 64 0 | 20 Oct 2021
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Nicholas Lourie, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi | LRM 42 138 0 | 24 Mar 2021
Towards Causal Representation Learning | Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, Yoshua Bengio | OOD CML AI4CE 78 320 0 | 22 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat 328 2,051 0 | 31 Dec 2020
MPNet: Masked and Permuted Pre-training for Language Understanding | Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu | 63 1,093 0 | 20 Apr 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference | Timo Schick, Hinrich Schütze | 282 1,606 0 | 21 Jan 2020
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova | 110 1,475 0 | 24 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM 106 2,287 0 | 02 May 2019
Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR | Sandra Wachter, Brent Mittelstadt, Chris Russell | MLAU 29 2,332 0 | 01 Nov 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference | Adina Williams, Nikita Nangia, Samuel R. Bowman | 287 4,444 0 | 18 Apr 2017