2103.14659
Alignment of Language Agents
26 March 2021
Zachary Kenton
Tom Everitt
Laura Weidinger
Iason Gabriel
Vladimir Mikulik
G. Irving
Papers citing "Alignment of Language Agents"
21 / 21 papers shown
Title
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
108
2
0
24 Feb 2025
Multi-objective Reinforcement learning from AI Feedback
Marcus Williams
33
1
0
11 Jun 2024
ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
Zhenyu Hou
Yilin Niu
Zhengxiao Du
Xiaohan Zhang
Xiao Liu
...
Qinkai Zheng
Minlie Huang
Hongning Wang
Jie Tang
Yuxiao Dong
ALM
22
17
0
01 Apr 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
33
16
0
07 Mar 2024
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Y. Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
24
28
0
15 Nov 2023
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Chen Dun
Mirian Hipolito Garcia
Guoqing Zheng
Ahmed Hassan Awadallah
Anastasios Kyrillidis
Robert Sim
74
6
0
04 Oct 2023
Deception Abilities Emerged in Large Language Models
Thilo Hagendorff
LLMAG
30
75
0
31 Jul 2023
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Viet Dac Lai
Chien Van Nguyen
Nghia Trung Ngo
Thuat Nguyen
Franck Dernoncourt
Ryan A. Rossi
Thien Huu Nguyen
ALM
40
128
0
29 Jul 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
24
504
0
07 Mar 2023
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
45
322
0
07 Dec 2022
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
21
215
0
28 Nov 2022
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
21
24
0
16 Nov 2022
In conversation with Artificial Intelligence: aligning language models with human values
Atoosa Kasirzadeh
Iason Gabriel
10
98
0
01 Sep 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
25
49
0
16 Jun 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
54
783
0
12 May 2022
Mind the gap: Challenges of deep learning approaches to Theory of Mind
Jaan Aru
Aqeel Labash
Oriol Corcoll
Raul Vicente
15
26
0
30 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,909
0
04 Mar 2022
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Irene Solaiman
Christy Dennison
16
221
0
18 Jun 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Alex Tamkin
Miles Brundage
Jack Clark
Deep Ganguli
AILaw
ELM
192
258
0
04 Feb 2021
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
206
616
0
03 Sep 2019
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
199
199
0
02 May 2018