Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.12344
Cited By
The Better Angels of Machine Personality: How Personality Relates to LLM Safety
17 July 2024
Jie M. Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
LLMAG
PILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Better Angels of Machine Personality: How Personality Relates to LLM Safety"
9 / 9 papers shown
Title
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu
Yinghui He
Xinzhe Juan
Y. Wang
Y. Liu
Zixin Yao
Yue Wu
Xun Jiang
L. Yang
Mengdi Wang
AI4MH
62
0
0
13 Apr 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
59
1
0
07 Feb 2025
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
89
13
0
06 Sep 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Qi Zhang
Xuanjing Huang
71
29
0
18 Mar 2024
Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models
Yang Lu
Jordan Yu
Shou-Hsuan Stephen Huang
28
2
0
21 Dec 2023
Learning to Model Editing Processes
Machel Reid
Graham Neubig
KELM
BDL
101
34
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Trustworthy AI: A Computational Perspective
Haochen Liu
Yiqi Wang
Wenqi Fan
Xiaorui Liu
Yaxin Li
Shaili Jain
Yunhao Liu
Anil K. Jain
Jiliang Tang
FaML
90
193
0
12 Jul 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1