TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution

2 February 2024

Papers citing "TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution"

Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering
  Joshua Owotogbe
  LLMAG 52 0 0 06 May 2025

Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
  Bang Zhang, Ruotian Ma, Qingxuan Jiang, Peisong Wang, Jiaqi Chen, ..., Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li
  LLMAG ELM ALM 97 25 1 01 May 2025

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
  Bang An, Shiyue Zhang, Mark Dredze
  54 0 0 25 Apr 2025

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
  Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang
  AAML LLMAG ELM 48 18 0 03 Oct 2024

War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars
  Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang
  LM&Ro LLMAG 125 87 0 28 Nov 2023

Controlled Text Generation with Natural Language Instructions
  Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Gotlieb Wilcox, Ryan Cotterell, Mrinmaya Sachan
  152 84 0 27 Apr 2023

Generative Agents: Interactive Simulacra of Human Behavior
  J. Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
  LM&Ro AI4CE 215 1,701 0 07 Apr 2023

Improving alignment of dialogue agents via targeted human judgements
  Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
  ALM AAML 225 495 0 28 Sep 2022

Housekeep: Tidying Virtual Households using Commonsense Reasoning
  Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal
  LM&Ro LRM 152 70 0 22 May 2022

Training language models to follow instructions with human feedback
  Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
  OSLM ALM 303 11,730 0 04 Mar 2022

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
  Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau
  146 373 0 10 Sep 2021