2410.01639
Moral Alignment for LLM Agents
International Conference on Learning Representations (ICLR), 2024
2 October 2024
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
Papers citing "Moral Alignment for LLM Agents"
50 / 69 papers shown
Black-Box Guardrail Reverse-engineering Attack
Hongwei Yao
Yun Xia
Shuo Shao
Haoran Shi
Tong Qiao
C. Wang
AAML
243
0
0
06 Nov 2025
Accumulating Context Changes the Beliefs of Language Models
Jiayi Geng
Howard Chen
Ryan Liu
Manoel Horta Ribeiro
Robb Willer
Graham Neubig
Thomas L. Griffiths
KELM
532
7
0
03 Nov 2025
Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning
P. Migliarini
Mashal Afzal Memon
Marco Autili
P. Inverardi
LRM
111
0
0
01 Oct 2025
MoVa: Towards Generalizable Classification of Human Morals and Values
Ziyu Chen
Junfei Sun
Chenxi Li
Tuan Dung Nguyen
Jing Yao
Xiaoyuan Yi
Xing Xie
Chenhao Tan
Lexing Xie
140
5
0
29 Sep 2025
Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm
Alireza Mohamadi
Ali Yavari
141
1
0
15 Sep 2025
From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users
Sadia Sultana Chowa
Riasad Alvi
Subhey Sadi Rahman
M. R
M. R
M. Islam
Mukhtar Hussain
Sami Azam
LLMAG
LM&Ro
ELM
396
20
0
24 Aug 2025
Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era
Matthew E. Brophy
181
0
0
17 Jul 2025
Many LLMs Are More Utilitarian Than One
Anita Keshmirian
Razan Baltaji
Babak Hemmatian
Hadi Asghari
Lav Varshney
LLMAG
271
3
0
01 Jul 2025
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
Dezhang Kong
Shi Lin
Zhenhua Xu
Z. J. Wang
Minghao Li
...
Ningyu Zhang
Chaochao Chen
Chunming Wu
Muhammad Khurram Khan
Meng Han
LLMAG
452
45
0
24 Jun 2025
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives
Wei Zeng
Hengshu Zhu
Chuan Qin
Han Wu
Yihang Cheng
...
Xiaowei Jin
Yinuo Shen
Zhenxing Wang
Feimin Zhong
Hui Xiong
AI4TS
530
0
0
11 Jun 2025
Who Gets the Kidney? Human-AI Alignment, Indecision, and Moral Values
John P. Dickerson
Hadi Hosseini
Samarth Khanna
Leona Pierce
251
0
0
30 May 2025
When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas
Steffen Backmann
David Guzman Piedrahita
Emanuel Tewolde
Amélie Reymond
Bernhard Schölkopf
Zhijing Jin
402
8
0
25 May 2025
Interpretable Risk Mitigation in LLM Agent Systems
Jan Chojnacki
LLMAG
506
4
0
15 May 2025
Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking
Luigia Costabile
Gian Marco Orlando
V. Gatta
V. Moscato
369
11
0
24 Apr 2025
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Ayoung Lee
Ryan Sungmo Kwon
Peter Railton
Lu Wang
ELM
590
4
0
15 Apr 2025
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan
Yan Song
Xidong Feng
Mengyue Yang
Haifeng Zhang
Haitham Bou Ammar
Jun Wang
OffRL
287
27
0
10 Oct 2024
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang
Divya Siddarth
Liane Lovitt
Thomas I. Liao
Esin Durmus
Alex Tamkin
Deep Ganguli
ELM
451
155
0
12 Jun 2024
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang
Shaoguang Mao
Tao Ge
Xun Wang
Adrian de Wynter
Yan Xia
Wenshan Wu
Ting Song
Man Lan
Furu Wei
LRM
420
111
0
01 Apr 2024
Scaling Instructable Agents Across Many Simulated Worlds
SIMA Team
Maria Abi Raad
Arun Ahuja
Catarina Barros
F. Besse
...
Daan Wierstra
Duncan Williams
Nathaniel Wong
Sarah York
Nick Young
LM&Ro
439
73
0
13 Mar 2024
Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
363
5
0
07 Mar 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
Jinhao Duan
Renming Zhang
James Diffenderfer
B. Kailkhura
Lichao Sun
Elias Stengel-Eskin
Mohit Bansal
Tianlong Chen
Kaidi Xu
ELM
LRM
404
107
0
19 Feb 2024
(Ir)rationality and Cognitive Biases in Large Language Models
Olivia Macmillan-Scott
Mirco Musolesi
LRM
325
41
0
14 Feb 2024
A Roadmap to Pluralistic Alignment
Taylor Sorensen
Jared Moore
Jillian R. Fisher
Mitchell L. Gordon
Niloofar Mireshghallah
...
Liwei Jiang
Ximing Lu
Nouha Dziri
Tim Althoff
Yejin Choi
457
173
0
07 Feb 2024
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner
Marc Fischer
Martin Vechev
435
87
0
07 Feb 2024
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis
AAAI Conference on Artificial Intelligence (AAAI), 2023
Caoyun Fan
Jindou Chen
Yaohui Jin
Hao He
294
115
0
09 Dec 2023
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
A. Vezhnevets
J. Agapiou
Avia Aharon
Ron Ziv
Jayd Matyas
Edgar A. Duéñez-Guzmán
William A. Cunningham
Simon Osindero
Danny Karmon
Joel Z Leibo
LLMAG
LM&Ro
AI4CE
398
99
0
06 Dec 2023
Moral Foundations of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Marwa Abdulhai
Gregory Serapio-Garcia
Clément Crepy
Daria Valter
John Canny
Natasha Jaques
LRM
312
90
0
23 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
1.2K
657
0
20 Oct 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAG
LM&Ro
755
331
0
05 Sep 2023
Taken out of context: On measuring situational awareness in LLMs
Lukas Berglund
Asa Cooper Stickland
Mikita Balesni
Max Kaufmann
Meg Tong
Tomasz Korbak
Daniel Kokotajlo
Owain Evans
LLMAG
LRM
248
116
0
01 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
843
2,553
0
22 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
464
799
0
27 Jul 2023
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
L. Wong
Gabriel Grand
Alexander K. Lew
Noah D. Goodman
Vikash K. Mansinghka
Jacob Andreas
J. Tenenbaum
LRM
AI4CE
281
139
0
22 Jun 2023
Strategic Reasoning with Language Models
Kanishk Gandhi
Dorsa Sadigh
Noah D. Goodman
LM&Ro
LRM
234
60
0
30 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
1.1K
7,889
0
29 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
International Conference on Learning Representations (ICLR), 2023
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
405
94
0
26 May 2023
Playing repeated games with Large Language Models
Nature Human Behaviour (Nat Hum Behav), 2023
Elif Akata
Lion Schulz
Julian Coda-Forno
Seong Joon Oh
Matthias Bethge
Eric Schulz
1.3K
228
0
26 May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&Ro
SyDa
688
1,429
0
25 May 2023
Role-Play with Large Language Models
Nature, 2023
Murray Shanahan
Kyle McDonell
Laria Reynolds
LLMAG
269
505
0
25 May 2023
Gorilla: Large Language Model Connected with Massive APIs
Neural Information Processing Systems (NeurIPS), 2023
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM
CLL
ALM
SyDa
548
1,039
0
24 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
ACM Symposium on User Interface Software and Technology (UIST), 2023
Joon Sung Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
1.1K
3,613
0
07 Apr 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Neural Information Processing Systems (NeurIPS), 2023
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
1.3K
1,371
0
30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAG
KELM
932
2,945
0
20 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
5.3K
23,506
0
15 Mar 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Neural Information Processing Systems (NeurIPS), 2023
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa
RALM
691
3,323
0
09 Feb 2023
Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
400
28
0
20 Jan 2023
The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation
Social Science Research Network (SSRN), 2023
Jochen Hartmann
Jasper Schwenzow
Maximilian Witte
330
307
0
05 Jan 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
1.5K
2,709
0
15 Dec 2022
ReAct: Synergizing Reasoning and Acting in Language Models
International Conference on Learning Representations (ICLR), 2022
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
3.4K
6,822
0
06 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
639
660
0
28 Sep 2022