Moral Alignment for LLM Agents
International Conference on Learning Representations (ICLR), 2024
2 October 2024
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi

Papers citing "Moral Alignment for LLM Agents"

50 / 69 papers shown
Black-Box Guardrail Reverse-engineering Attack
Hongwei Yao
Yun Xia
Shuo Shao
Haoran Shi
Tong Qiao
C. Wang
AAML
243
0
0
06 Nov 2025
Accumulating Context Changes the Beliefs of Language Models
Jiayi Geng
Howard Chen
Ryan Liu
Manoel Horta Ribeiro
Robb Willer
Graham Neubig
Thomas L. Griffiths
KELM
532
7
0
03 Nov 2025
Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning
P. Migliarini
Mashal Afzal Memon
Marco Autili
P. Inverardi
LRM
111
0
0
01 Oct 2025
MoVa: Towards Generalizable Classification of Human Morals and Values
Ziyu Chen
Junfei Sun
Chenxi Li
Tuan Dung Nguyen
Jing Yao
Xiaoyuan Yi
Xing Xie
Chenhao Tan
Lexing Xie
140
5
0
29 Sep 2025
Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm
Alireza Mohamadi
Ali Yavari
141
1
0
15 Sep 2025
From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users
Sadia Sultana Chowa
Riasad Alvi
Subhey Sadi Rahman
M. R
M. R
M. Islam
Mukhtar Hussain
Sami Azam
LLMAG, LM&Ro, ELM
396
20
0
24 Aug 2025
Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era
Matthew E. Brophy
181
0
0
17 Jul 2025
Many LLMs Are More Utilitarian Than One
Anita Keshmirian
Razan Baltaji
Babak Hemmatian
Hadi Asghari
Lav Varshney
LLMAG
271
3
0
01 Jul 2025
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
Dezhang Kong
Shi Lin
Zhenhua Xu
Z. J. Wang
Minghao Li
...
Ningyu Zhang
Chaochao Chen
Chunming Wu
Muhammad Khurram Khan
Meng Han
LLMAG
452
45
0
24 Jun 2025
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives
Wei Zeng
Hengshu Zhu
Chuan Qin
Han Wu
Yihang Cheng
...
Xiaowei Jin
Yinuo Shen
Zhenxing Wang
Feimin Zhong
Hui Xiong
AI4TS
530
0
0
11 Jun 2025
Who Gets the Kidney? Human-AI Alignment, Indecision, and Moral Values
John P. Dickerson
Hadi Hosseini
Samarth Khanna
Leona Pierce
251
0
0
30 May 2025
When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas
Steffen Backmann
David Guzman Piedrahita
Emanuel Tewolde
Amélie Reymond
Bernhard Schölkopf
Zhijing Jin
402
8
0
25 May 2025
Interpretable Risk Mitigation in LLM Agent Systems
Jan Chojnacki
LLMAG
506
4
0
15 May 2025
Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking
Luigia Costabile
Gian Marco Orlando
V. Gatta
V. Moscato
369
11
0
24 Apr 2025
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Ayoung Lee
Ryan Sungmo Kwon
Peter Railton
Lu Wang
ELM
590
4
0
15 Apr 2025
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan
Yan Song
Xidong Feng
Mengyue Yang
Haifeng Zhang
Haitham Bou Ammar
Jun Wang
OffRL
287
27
0
10 Oct 2024
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang
Divya Siddarth
Liane Lovitt
Thomas I. Liao
Esin Durmus
Alex Tamkin
Deep Ganguli
ELM
451
155
0
12 Jun 2024
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang
Shaoguang Mao
Tao Ge
Xun Wang
Adrian de Wynter
Yan Xia
Wenshan Wu
Ting Song
Man Lan
Furu Wei
LRM
420
111
0
01 Apr 2024
Scaling Instructable Agents Across Many Simulated Worlds
SIMA Team
Maria Abi Raad
Arun Ahuja
Catarina Barros
F. Besse
...
Daan Wierstra
Duncan Williams
Nathaniel Wong
Sarah York
Nick Young
LM&Ro
439
73
0
13 Mar 2024
Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
363
5
0
07 Mar 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
Jinhao Duan
Renming Zhang
James Diffenderfer
B. Kailkhura
Lichao Sun
Elias Stengel-Eskin
Mohit Bansal
Tianlong Chen
Kaidi Xu
ELM, LRM
404
107
0
19 Feb 2024
(Ir)rationality and Cognitive Biases in Large Language Models
Olivia Macmillan-Scott
Mirco Musolesi
LRM
325
41
0
14 Feb 2024
A Roadmap to Pluralistic Alignment
Taylor Sorensen
Jared Moore
Jillian R. Fisher
Mitchell L. Gordon
Niloofar Mireshghallah
...
Liwei Jiang
Ximing Lu
Nouha Dziri
Tim Althoff
Yejin Choi
457
173
0
07 Feb 2024
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner
Marc Fischer
Martin Vechev
435
87
0
07 Feb 2024
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis
AAAI Conference on Artificial Intelligence (AAAI), 2023
Caoyun Fan
Jindou Chen
Yaohui Jin
Hao He
294
115
0
09 Dec 2023
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
A. Vezhnevets
J. Agapiou
Avia Aharon
Ron Ziv
Jayd Matyas
Edgar A. Duéñez-Guzmán
William A. Cunningham
Simon Osindero
Danny Karmon
Joel Z Leibo
LLMAG, LM&Ro, AI4CE
398
99
0
06 Dec 2023
Moral Foundations of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Marwa Abdulhai
Gregory Serapio-Garcia
Clément Crepy
Daria Valter
John Canny
Natasha Jaques
LRM
312
90
0
23 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
1.2K
657
0
20 Oct 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAG, LM&Ro
755
331
0
05 Sep 2023
Taken out of context: On measuring situational awareness in LLMs
Lukas Berglund
Asa Cooper Stickland
Mikita Balesni
Max Kaufmann
Meg Tong
Tomasz Korbak
Daniel Kokotajlo
Owain Evans
LLMAG, LRM
248
116
0
01 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG, AI4CE, LM&Ro
843
2,553
0
22 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM, OffRL
464
799
0
27 Jul 2023
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
L. Wong
Gabriel Grand
Alexander K. Lew
Noah D. Goodman
Vikash K. Mansinghka
Jacob Andreas
J. Tenenbaum
LRM, AI4CE
281
139
0
22 Jun 2023
Strategic Reasoning with Language Models
Kanishk Gandhi
Dorsa Sadigh
Noah D. Goodman
LM&Ro, LRM
234
60
0
30 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
1.1K
7,889
0
29 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
International Conference on Learning Representations (ICLR), 2023
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
405
94
0
26 May 2023
Playing repeated games with Large Language Models
Nature Human Behaviour (Nat Hum Behav), 2023
Elif Akata
Lion Schulz
Julian Coda-Forno
Seong Joon Oh
Matthias Bethge
Eric Schulz
1.3K
228
0
26 May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&Ro, SyDa
688
1,429
0
25 May 2023
Role-Play with Large Language Models
Nature (Nature), 2023
Murray Shanahan
Kyle McDonell
Laria Reynolds
LLMAG
269
505
0
25 May 2023
Gorilla: Large Language Model Connected with Massive APIs
Neural Information Processing Systems (NeurIPS), 2023
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM, CLL, ALM, SyDa
548
1,039
0
24 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
ACM Symposium on User Interface Software and Technology (UIST), 2023
Joon Sung Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro, AI4CE
1.1K
3,613
0
07 Apr 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Neural Information Processing Systems (NeurIPS), 2023
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
1.3K
1,371
0
30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAG, KELM
932
2,945
0
20 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
5.3K
23,506
0
15 Mar 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Neural Information Processing Systems (NeurIPS), 2023
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa, RALM
691
3,323
0
09 Feb 2023
Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
400
28
0
20 Jan 2023
The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation
Social Science Research Network (SSRN), 2023
Jochen Hartmann
Jasper Schwenzow
Maximilian Witte
330
307
0
05 Jan 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa, MoMe
1.5K
2,709
0
15 Dec 2022
ReAct: Synergizing Reasoning and Acting in Language Models
International Conference on Learning Representations (ICLR), 2022
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG, ReLM, LRM
3.4K
6,822
0
06 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM, AAML
639
660
0
28 Sep 2022