A General Language Assistant as a Laboratory for Alignment

1 December 2021
Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, T. Henighan, Andy Jones, Nicholas Joseph, Benjamin Mann, Nova Dassarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, John Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom B. Brown, Jack Clark, Sam McCandlish, C. Olah, Jared Kaplan
    ALM

Papers citing "A General Language Assistant as a Laboratory for Alignment"

Showing 50 of 701 citing papers.
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, ..., Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chun-yue Li
MLLM, VLM · 09 Nov 2023

Unveiling Safety Vulnerabilities of Large Language Models
George Kour, Marcel Zalmanovici, Naama Zwerdling, Esther Goldbraich, Ora Nova Fandina, Ateret Anaby-Tavor, Orna Raz, E. Farchi
AAML · 07 Nov 2023

FinGPT: Large Generative Models for a Small Language
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, ..., Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, S. Pyysalo
LM&MA · 03 Nov 2023

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert, Roberto Calandra
ALM · 31 Oct 2023

Automatic Evaluation of Generative Models with Instruction Tuning
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
Shuhaib Mehri, Vered Shwartz
ELM, ALM · 30 Oct 2023

Personas as a Way to Model Truthfulness in Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He
HILM · 27 Oct 2023

Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi, Jing Yao, Xiting Wang, Xing Xie
26 Oct 2023

SuperHF: Supervised Iterative Learning from Human Feedback
Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush S. Bhatia, Silas Alberti
ALM · 25 Oct 2023

OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models
Mingfeng Xue, Dayiheng Liu, Kexin Yang, Guanting Dong, Wenqiang Lei, Zheng Yuan, Chang Zhou, Jingren Zhou
LLMAG · 25 Oct 2023

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications
Social Science Research Network (SSRN), 2023
Abhilash Mishra
24 Oct 2023

Self-Guard: Empower the LLM to Safeguard Itself
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Zezhong Wang, Fangkai Yang, Lu Wang, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong
24 Oct 2023

Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yanchen Liu, Srishti Gautam, Jiaqi Ma, Himabindu Lakkaraju
LMTD · 23 Oct 2023

AlpaCare: Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang, Chenxin Tian, Xianjun Yang, Lichang Chen, Zekun Li, Linda R. Petzold
LM&MA · 23 Oct 2023

From the Pursuit of Universal AGI Architecture to Systematic Approach to Heterogenous AGI: Addressing Alignment, Energy, & AGI Grand Challenges
International Journal of Semantic Computing (IJSC), 2023
Eren Kurshan
23 Oct 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker
ALM · 22 Oct 2023

Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Karina Vida, Judith Simon, Anne Lauscher
21 Oct 2023

Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
19 Oct 2023

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
International Conference on Learning Representations (ICLR), 2023
Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Jiajun Sun, ..., Xiao Wang, Haoran Huang, Tao Gui, Xuanjing Huang
18 Oct 2023

Group Preference Optimization: Few-Shot Alignment of Large Language Models
International Conference on Learning Representations (ICLR), 2023
Siyan Zhao, John Dang, Aditya Grover
17 Oct 2023

RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Enyu Zhou, Rui Zheng, Zhiheng Xi, Songyang Gao, Xiaoran Fan, Zichu Fei, Jingting Ye, Tao Gui, Xuanjing Huang
17 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Weijing Chen, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
PILM · 16 Oct 2023

The Consensus Game: Language Model Generation via Equilibrium Search
Athul Paul Jacob, Songlin Yang, Gabriele Farina, Jacob Andreas
13 Oct 2023

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Neural Information Processing Systems (NeurIPS), 2023
Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
CoGe, DiffM · 13 Oct 2023

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
International Conference on Learning Representations (ICLR), 2023
Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, ..., Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, Minjoon Seo
ALM, LM&MA, ELM · 12 Oct 2023

Evaluating Large Language Models at Evaluating Instruction Following
International Conference on Learning Representations (ICLR), 2023
Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen
ELM, ALM · 11 Oct 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale
ALM · 11 Oct 2023

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
International Conference on Learning Representations (ICLR), 2023
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen
AAML · 10 Oct 2023

MetaAgents: Large Language Model Based Agents for Decision-Making on Teaming
Yuan Li, Lichao Sun, Yixuan Zhang
LLMAG, LM&Ro · 10 Oct 2023

SALMON: Self-Alignment with Instructable Reward Models
International Conference on Learning Representations (ICLR), 2023
Zhiqing Sun, Songlin Yang, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan
ALM, SyDa · 09 Oct 2023

A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang, Hunghuei Lee
ELM, ALM, LM&MA · 09 Oct 2023

Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures
Thorsten Händler
LLMAG · 05 Oct 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
03 Oct 2023

Ask Again, Then Fail: Large Language Models' Vacillations in Judgment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia
AAML, HILM · 03 Oct 2023

Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Saurabh Srivastava, Chengyue Huang, Weiguo Fan, Ziyu Yao
LLMAG · 03 Oct 2023

Tool-Augmented Reward Modeling
International Conference on Learning Representations (ICLR), 2023
Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu
OffRL · 02 Oct 2023

Enabling Language Models to Implicitly Learn Self-Improvement
Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji
ReLM, LRM · 02 Oct 2023

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
International Conference on Learning Representations (ICLR), 2023
Mustafa Shukor, Alexandre Ramé, Corentin Dancette, Matthieu Cord
LRM, MLLM · 01 Oct 2023

Directly Fine-Tuning Diffusion Models on Differentiable Rewards
International Conference on Learning Representations (ICLR), 2023
Amita Gajewar, Paul Vicol, G. Bansal, David J Fleet
29 Sep 2023

Qwen Technical Report
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, ..., Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
OSLM · 28 Sep 2023

GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Timothée Darcet, Yuyu Zhang, Yijie Zhu, Chenguang Xi, Pengyang Gao, Piotr Bojanowski, Kevin Chen-Chuan Chang
ELM · 28 Sep 2023

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
International Conference on Learning Representations (ICLR), 2023
Simon Mahns, Yibo Jiang, Yuguang Yang, Han Liu, Yuxin Chen
28 Sep 2023

The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
28 Sep 2023

Large Language Model Alignment: A Survey
Shangda Wu, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong
LM&MA · 26 Sep 2023

Aligning Large Multimodal Models with Factually Augmented RLHF
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, ..., Liangyan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
VLM · 25 Sep 2023

Can LLM-Generated Misinformation Be Detected?
International Conference on Learning Representations (ICLR), 2023
Canyu Chen, Kai Shu
DeLMO · 25 Sep 2023

Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
18 Sep 2023

RAIN: Your Language Models Can Align Themselves without Finetuning
International Conference on Learning Representations (ICLR), 2023
Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang R. Zhang
SILM · 13 Sep 2023

Mitigating the Alignment Tax of RLHF
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Zeming Zheng, ..., Han Zhao, Nan Jiang, Heng Ji, Xingtai Lv, Tong Zhang
MoMe, CLL · 12 Sep 2023

Everyone Deserves A Reward: Learning Customized Human Preferences
Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du
06 Sep 2023

Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, ..., Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou
SyDa, VLM · 05 Sep 2023