AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
13 September 2024
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
arXiv:2409.09013 · GitHub (291★)

Papers citing "AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents"

15 papers
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
Zhonghao He, Tianyi Qiu, Hirokazu Shirado, Maarten Sap
02 Dec 2025
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman
20 Oct 2025
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
17 Oct 2025
Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL
Marwa Abdulhai, Ryan Cheng, Aryansh Shrivastava, Natasha Jaques, Y. Gal, Sergey Levine
16 Oct 2025
Agentic Misalignment: How LLMs Could Be Insider Threats
Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart Ritchie, Sören Mindermann, Ethan Perez, Kevin K. Troy, Evan Hubinger
05 Oct 2025
Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence
Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, Dan Jurafsky
01 Oct 2025
Generative Value Conflicts Reveal LLM Priorities
Andy Liu, Kshitish Ghate, Mona Diab, Daniel Fried, Atoosa Kasirzadeh, Max Kleiman-Weiner
29 Sep 2025
Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
Minxing Zhang, Yi Yang, Roy Xie, Bhuwan Dhingra, Shuyan Zhou, Jian Pei
19 Sep 2025
Can LLMs Lie? Investigation beyond Hallucination
Haoran Huan, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak
03 Sep 2025
Social World Models
Xuhui Zhou, Jiarui Liu, Akhila Yerukola, Hyunwoo J. Kim, Maarten Sap
30 Aug 2025
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Xinyan Jiang, L. Zhang, Jiayi Zhang, Qingsong Yang, Guimin Hu, Di Wang, Lijie Hu
14 Aug 2025
Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task
Jared Moore, Ned Cooper, Rasmus Overmark, Beba Cibralic, Nick Haber, Cameron R. Jones
22 Jul 2025
PRISON: Unmasking the Criminal Potential of Large Language Models
Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang
19 Jun 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, ..., Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks
05 Mar 2025
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, ..., Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo
09 Jun 2024