What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations

30 November 2023

Papers citing "What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations"


Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation
Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
24 25 0 | 01 Nov 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM ALM | 301 11,730 0 | 04 Mar 2022

Assessing the Reliability of Word Embedding Gender Bias Measures
Yupei Du, Qixiang Fang, D. Nguyen
27 21 0 | 10 Sep 2021

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
219 291 0 | 24 Feb 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat | 236 1,508 0 | 31 Dec 2020