ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.12344
  4. Cited By
The Better Angels of Machine Personality: How Personality Relates to LLM
  Safety

The Better Angels of Machine Personality: How Personality Relates to LLM Safety

17 July 2024
Jie M. Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
    LLMAG
    PILM
ArXivPDFHTML

Papers citing "The Better Angels of Machine Personality: How Personality Relates to LLM Safety"

9 / 9 papers shown
Title
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu
Yinghui He
Xinzhe Juan
Y. Wang
Y. Liu
Zixin Yao
Yue Wu
Xun Jiang
L. Yang
Mengdi Wang
AI4MH
62
0
0
13 Apr 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
59
1
0
07 Feb 2025
Programming Refusal with Conditional Activation Steering
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
89
13
0
06 Sep 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language
  Models
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Qi Zhang
Xuanjing Huang
71
29
0
18 Mar 2024
Illuminating the Black Box: A Psychometric Investigation into the
  Multifaceted Nature of Large Language Models
Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models
Yang Lu
Jordan Yu
Shou-Hsuan Stephen Huang
28
2
0
21 Dec 2023
Learning to Model Editing Processes
Learning to Model Editing Processes
Machel Reid
Graham Neubig
KELM
BDL
101
34
0
24 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Trustworthy AI: A Computational Perspective
Trustworthy AI: A Computational Perspective
Haochen Liu
Yiqi Wang
Wenqi Fan
Xiaorui Liu
Yaxin Li
Shaili Jain
Yunhao Liu
Anil K. Jain
Jiliang Tang
FaML
90
193
0
12 Jul 2021
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1