AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
13 September 2024
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
arXiv:2409.09013 · GitHub (291★)

Papers citing "AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents"

15 papers
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
Zhonghao He, Tianyi Qiu, Hirokazu Shirado, Maarten Sap
02 Dec 2025
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman
20 Oct 2025
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
17 Oct 2025
Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL
Marwa Abdulhai, Ryan Cheng, Aryansh Shrivastava, Natasha Jaques, Y. Gal, Sergey Levine
16 Oct 2025
Agentic Misalignment: How LLMs Could Be Insider Threats
Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart Ritchie, Sören Mindermann, Ethan Perez, Kevin K. Troy, Evan Hubinger
05 Oct 2025
Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence
Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, Dan Jurafsky
01 Oct 2025
Generative Value Conflicts Reveal LLM Priorities
Andy Liu, Kshitish Ghate, Mona Diab, Daniel Fried, Atoosa Kasirzadeh, Max Kleiman-Weiner
29 Sep 2025
Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
Minxing Zhang, Yi Yang, Roy Xie, Bhuwan Dhingra, Shuyan Zhou, Jian Pei
19 Sep 2025
Can LLMs Lie? Investigation beyond Hallucination
Haoran Huan, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak
03 Sep 2025
Social World Models
Xuhui Zhou, Jiarui Liu, Akhila Yerukola, Hyunwoo J. Kim, Maarten Sap
30 Aug 2025
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Xinyan Jiang, L. Zhang, Jiayi Zhang, Qingsong Yang, Guimin Hu, Di Wang, Lijie Hu
14 Aug 2025
Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task
Jared Moore, Ned Cooper, Rasmus Overmark, Beba Cibralic, Nick Haber, Cameron R. Jones
22 Jul 2025
PRISON: Unmasking the Criminal Potential of Large Language Models
Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang
19 Jun 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, ..., Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks
05 Mar 2025
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, ..., Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo
09 Jun 2024