Personas as a Way to Model Truthfulness in Language Models
arXiv:2310.18168 · 27 October 2023
Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He
Tags: HILM
Papers citing "Personas as a Way to Model Truthfulness in Language Models" (3 of 3 shown)
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang, Xianfeng Jiao, Yifan He, Zhongzhi Chen, Yinghao Zhu, Xu Chu, Junyi Gao, Yasha Wang, Liantao Ma
Tags: LLMSV · 26 May 2024
The Internal State of an LLM Knows When It's Lying
A. Azaria, Tom Michael Mitchell
Tags: HILM · 26 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM · 04 Mar 2022