2112.00861
A General Language Assistant as a Laboratory for Alignment
1 December 2021
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
T. Henighan
Andy Jones
Nicholas Joseph
Benjamin Mann
Nova Dassarma
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
John Kernion
Kamal Ndousse
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
Papers citing
"A General Language Assistant as a Laboratory for Alignment"
50 / 701 papers shown
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
272
190
0
09 Nov 2023
Unveiling Safety Vulnerabilities of Large Language Models
George Kour
Marcel Zalmanovici
Naama Zwerdling
Esther Goldbraich
Ora Nova Fandina
Ateret Anaby-Tavor
Orna Raz
E. Farchi
AAML
254
32
0
07 Nov 2023
FinGPT: Large Generative Models for a Small Language
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Risto Luukkonen
Ville Komulainen
Jouni Luoma
Anni Eskelinen
Jenna Kanerva
...
Mikko Merioksa
Jyrki Heinonen
Aija Vahtola
Samuel Antao
S. Pyysalo
LM&MA
185
63
0
03 Nov 2023
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert
Roberto Calandra
ALM
465
39
0
31 Oct 2023
Automatic Evaluation of Generative Models with Instruction Tuning
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
Shuhaib Mehri
Vered Shwartz
ELM
ALM
136
2
0
30 Oct 2023
Personas as a Way to Model Truthfulness in Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Nitish Joshi
Javier Rando
Abulhair Saparov
Najoung Kim
He He
HILM
395
40
0
27 Oct 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi
Jing Yao
Xiting Wang
Xing Xie
185
17
0
26 Oct 2023
SuperHF: Supervised Iterative Learning from Human Feedback
Gabriel Mukobi
Peter Chatain
Su Fong
Robert Windesheim
Gitta Kutyniok
Kush S. Bhatia
Silas Alberti
ALM
262
12
0
25 Oct 2023
OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models
Mingfeng Xue
Dayiheng Liu
Kexin Yang
Guanting Dong
Wenqiang Lei
Zheng Yuan
Chang Zhou
Jingren Zhou
LLMAG
176
3
0
25 Oct 2023
AI Alignment and Social Choice: Fundamental Limitations and Policy Implications
Social Science Research Network (SSRN), 2023
Abhilash Mishra
92
34
0
24 Oct 2023
Self-Guard: Empower the LLM to Safeguard Itself
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Zezhong Wang
Fangkai Yang
Lu Wang
Hongru Wang
Liang Chen
Qingwei Lin
Kam-Fai Wong
270
57
0
24 Oct 2023
Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yanchen Liu
Srishti Gautam
Jiaqi Ma
Himabindu Lakkaraju
LMTD
218
20
0
23 Oct 2023
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
460
86
0
23 Oct 2023
From the Pursuit of Universal AGI Architecture to Systematic Approach to Heterogenous AGI: Addressing Alignment, Energy, & AGI Grand Challenges
International Journal of Semantic Computing (IJSC), 2023
Eren Kurshan
413
0
0
23 Oct 2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir
Edward Kim
Beyza Ermis
Marzieh Fadaee
Sara Hooker
ALM
283
21
0
22 Oct 2023
Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Karina Vida
Judith Simon
Anne Lauscher
243
21
0
21 Oct 2023
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai
Xuehai Pan
Ruiyang Sun
Jiaming Ji
Xinbo Xu
Mickel Liu
Yizhou Wang
Yaodong Yang
399
537
0
19 Oct 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
International Conference on Learning Representations (ICLR), 2023
Rui Zheng
Wei Shen
Yuan Hua
Wenbin Lai
Jiajun Sun
...
Xiao Wang
Haoran Huang
Tao Gui
Xuanjing Huang
285
22
0
18 Oct 2023
Group Preference Optimization: Few-Shot Alignment of Large Language Models
International Conference on Learning Representations (ICLR), 2023
Siyan Zhao
John Dang
Aditya Grover
345
46
0
17 Oct 2023
RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Enyu Zhou
Rui Zheng
Zhiheng Xi
Songyang Gao
Xiaoran Fan
Zichu Fei
Jingting Ye
Tao Gui
Xuanjing Huang
159
5
0
17 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Weijing Chen
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
441
68
0
16 Oct 2023
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Paul Jacob
Songlin Yang
Gabriele Farina
Jacob Andreas
246
34
0
13 Oct 2023
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Neural Information Processing Systems (NeurIPS), 2023
Maya Okawa
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
CoGe
DiffM
498
83
0
13 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
International Conference on Learning Representations (ICLR), 2023
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
522
372
0
12 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
International Conference on Learning Representations (ICLR), 2023
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
412
264
0
11 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
358
63
0
11 Oct 2023
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
International Conference on Learning Representations (ICLR), 2023
Yangsibo Huang
Samyak Gupta
Mengzhou Xia
Kai Li
Danqi Chen
AAML
255
405
0
10 Oct 2023
MetaAgents: Large Language Model Based Agents for Decision-Making on Teaming
Yuan Li
Lichao Sun
Yixuan Zhang
LLMAG
LM&Ro
383
119
0
10 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models
International Conference on Learning Representations (ICLR), 2023
Zhiqing Sun
Songlin Yang
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM
SyDa
353
53
0
09 Oct 2023
A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang
Hunghuei Lee
ELM
ALM
LM&MA
142
18
0
09 Oct 2023
Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures
Thorsten Händler
LLMAG
206
35
0
05 Oct 2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
385
9
0
03 Oct 2023
Ask Again, Then Fail: Large Language Models' Vacillations in Judgment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qiming Xie
Zengzhi Wang
Yi Feng
Rui Xia
AAML
HILM
653
12
0
03 Oct 2023
Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Saurabh Srivastava
Chengyue Huang
Weiguo Fan
Ziyu Yao
LLMAG
280
9
0
03 Oct 2023
Tool-Augmented Reward Modeling
International Conference on Learning Representations (ICLR), 2023
Lei Li
Yekun Chai
Shuohuan Wang
Yu Sun
Hao Tian
Ningyu Zhang
Hua Wu
OffRL
259
22
0
02 Oct 2023
Enabling Language Models to Implicitly Learn Self-Improvement
Ziqi Wang
Le Hou
Tianjian Lu
Yuexin Wu
Yunxuan Li
Hongkun Yu
Heng Ji
ReLM
LRM
279
9
0
02 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
International Conference on Learning Representations (ICLR), 2023
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
428
26
0
01 Oct 2023
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
International Conference on Learning Representations (ICLR), 2023
Amita Gajewar
Paul Vicol
G. Bansal
David J Fleet
267
300
0
29 Sep 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
797
3,067
0
28 Sep 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Timothée Darcet
Yuyu Zhang
Yijie Zhu
Chenguang Xi
Pengyang Gao
Piotr Bojanowski
Kevin Chen-Chuan Chang
ELM
355
26
0
28 Sep 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
International Conference on Learning Representations (ICLR), 2023
Simon Mahns
Yibo Jiang
Yuguang Yang
Han Liu
Yuxin Chen
255
145
0
28 Sep 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
251
28
0
28 Sep 2023
Large Language Model Alignment: A Survey
Shangda Wu
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
359
282
0
26 Sep 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-Xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
285
592
0
25 Sep 2023
Can LLM-Generated Misinformation Be Detected?
International Conference on Learning Representations (ICLR), 2023
Canyu Chen
Kai Shu
DeLMO
782
241
0
25 Sep 2023
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng
Linfeng Song
Ye Tian
Lifeng Jin
Haitao Mi
Dong Yu
190
21
0
18 Sep 2023
RAIN: Your Language Models Can Align Themselves without Finetuning
International Conference on Learning Representations (ICLR), 2023
Yuhui Li
Fangyun Wei
Jinjing Zhao
Chao Zhang
Hongyang R. Zhang
SILM
295
157
0
13 Sep 2023
Mitigating the Alignment Tax of RLHF
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Xingtai Lv
Tong Zhang
MoMe
CLL
666
129
0
12 Sep 2023
Everyone Deserves A Reward: Learning Customized Human Preferences
Pengyu Cheng
Jiawen Xie
Ke Bai
Yong Dai
Nan Du
213
43
0
06 Sep 2023
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
...
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
SyDa
VLM
297
59
0
05 Sep 2023