Evaluating the Moral Beliefs Encoded in LLMs


26 July 2023
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
arXiv: 2307.14324

Papers citing "Evaluating the Moral Beliefs Encoded in LLMs"

26 / 26 papers shown
A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient
Yehor Tereshchenko
Mika Hämäläinen
ELM
31
1
0
06 May 2025
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Ayoung Lee
Ryan Sungmo Kwon
Peter Railton
Lu Wang
ELM
49
0
0
15 Apr 2025
Unmasking Conversational Bias in AI Multiagent Systems
Erica Coppolillo
Giuseppe Manco
Luca Maria Aiello
LLMAG
52
0
0
24 Jan 2025
Scopes of Alignment
Kush R. Varshney
Zahra Ashktorab
Djallel Bouneffouf
Matthew D Riemer
Justin D. Weisz
34
0
0
15 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
108
61
0
25 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
73
0
0
12 Nov 2024
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu
Liwei Jiang
Yejin Choi
51
2
0
03 Oct 2024
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models
Haoran Ye
Yuhang Xie
Yuanyi Ren
Hanjun Fang
Xin Zhang
Guojie Song
LM&MA
30
1
0
18 Sep 2024
Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios
Camilla Bignotti
C. Camassa
AILaw
ELM
36
1
0
29 Jul 2024
Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?
Yuu Jinnai
47
1
0
24 Jun 2024
ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models
Yuanyi Ren
Haoran Ye
Hanjun Fang
Xin Zhang
Guojie Song
LLMAG
ELM
29
3
0
06 Jun 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
53
30
0
08 Apr 2024
SaGE: Evaluating Moral Consistency in Large Language Models
Vamshi Bonagiri
Sreeram Vennam
Priyanshul Govil
Ponnurangam Kumaraguru
Manas Gaur
ELM
41
0
0
21 Feb 2024
Moral Foundations of Large Language Models
Marwa Abdulhai
Gregory Serapio-Garcia
Clément Crepy
Daria Valter
John Canny
Natasha Jaques
LRM
57
42
0
23 Oct 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,701
0
07 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
218
2,232
0
22 Mar 2023
An Analysis of the Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation
Jwala Dhamala
Varun Kumar
Rahul Gupta
Kai-Wei Chang
Aram Galstyan
16
7
0
07 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Gabriel Simmons
98
57
0
24 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
Kathleen C. Fraser
S. Kiritchenko
Esma Balkir
99
36
0
25 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jenny T Liang
...
Yulia Tsvetkov
Oren Etzioni
Maarten Sap
Regina A. Rini
Yejin Choi
FaML
117
110
0
14 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
173
268
0
28 Sep 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
258
343
0
01 Feb 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019