Inverse Constitutional AI: Compressing Preferences into Principles


2 June 2024
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
    SyDa

Papers citing "Inverse Constitutional AI: Compressing Preferences into Principles"

11 / 11 papers shown
Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction
Michal Bravansky
Vaclav Kubon
Suhas Hariharan
Robert Kirk
50
0
0
24 Feb 2025
AI Alignment at Your Discretion
Maarten Buyl
Hadi Khalaf
C. M. Verdun
Lucas Monteiro Paes
Caio Vieira Machado
Flavio du Pin Calmon
26
0
0
10 Feb 2025
IntentGPT: Few-shot Intent Discovery with Large Language Models
Juan A. Rodriguez
Nicholas Botzer
David Vazquez
Christopher Pal
M. Pedersoli
I. Laradji
VLM
52
1
0
16 Nov 2024
Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
Andrew Konya
Aviv Ovadya
K. J. Kevin Feng
Quan Ze Chen
Lisa Schirch
Colin Irwin
Amy X. Zhang
ALM
42
0
0
15 Nov 2024
Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and Collaborative Policymaking
K. J. Kevin Feng
Inyoung Cheong
Quan Ze Chen
Amy X. Zhang
31
0
0
13 Sep 2024
Self-Directed Synthetic Dialogues and Revisions Technical Report
Nathan Lambert
Hailey Schoelkopf
Aaron Gokaslan
Luca Soldaini
Valentina Pyatkin
Louis Castricato
SyDa
35
2
0
25 Jul 2024
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Chan Young Park
Shuyue Stella Li
Hayoung Jung
Svitlana Volkova
Tanushree Mitra
David Jurgens
Yulia Tsvetkov
36
5
0
02 Jul 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
56
89
0
16 Feb 2024
Specific versus General Principles for Constitutional AI
Sandipan Kundu
Yuntao Bai
Saurav Kadavath
Amanda Askell
Andrew Callahan
...
Zac Hatfield-Dodds
Sören Mindermann
Nicholas Joseph
Sam McCandlish
Jared Kaplan
AILaw
54
24
0
20 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
207
178
0
20 Oct 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022