arXiv:2406.20087
ProgressGym: Alignment with a Millennium of Moral Progress
28 June 2024
Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang

Papers citing "ProgressGym: Alignment with a Millennium of Moral Progress" (11 / 11 papers shown)

Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Feifei Zhao, Y. Wang, Enmeng Lu, Dongcheng Zhao, Bing Han, ..., Chao Liu, Yaodong Yang, Yi Zeng, Boyuan Chen, Jinyu Fan
24 Apr 2025

CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Ayoung Lee, Ryan Sungmo Kwon, Peter Railton, Lu Wang
Tags: ELM
15 Apr 2025

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
26 Feb 2025

Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play
Yifan Zeng, Liang Kairong, Fangzhou Dong, Peijia Zheng
26 Oct 2024

Language Models as Critical Thinking Tools: A Case Study of Philosophers
Andre Ye, Jared Moore, Rose Novick, Amy X. Zhang
Tags: KELM, ELM, LRM, LLMAG
06 Apr 2024

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen
08 Feb 2024

Heterogeneous Value Alignment Evaluation for Large Language Models
Zhaowei Zhang, Ceyao Zhang, N. Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang
26 May 2023

On the Risk of Misinformation Pollution with Large Language Models
Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, W. Wang
Tags: DeLMO
23 May 2023

Large Language Model Programs
Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li
Tags: LRM
09 May 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
04 Mar 2022

Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
28 Sep 2021