Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

International Conference on Machine Learning (ICML), 2023
6 April 2023
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks

Papers citing "Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark"

Showing 2 of 52 citing papers.
SWAN: A Generic Framework for Auditing Textual Conversational Systems
T. Sakai
15 May 2023
The Alignment Problem from a Deep Learning Perspective
International Conference on Learning Representations (ICLR), 2022
Richard Ngo
Lawrence Chan
Sören Mindermann
30 Aug 2022