ResearchTrend.AI


Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (arXiv:2204.05862)

12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova DasSarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
John Kernion
Tom Conerly
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan

Papers citing "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

50 / 1,795 papers shown
Finetuning Text-to-Image Diffusion Models for Fairness
Xudong Shen
Chao Du
Tianyu Pang
Min-Bin Lin
Yongkang Wong
Mohan S. Kankanhalli
21
49
0
11 Nov 2023
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation
Zixiang Xian
Rubing Huang
Dave Towey
Chunrong Fang
Zhenyu Chen
10
5
0
10 Nov 2023
Fake Alignment: Are LLMs Really Aligned Well?
Yixu Wang
Yan Teng
Kexin Huang
Chengqi Lyu
Songyang Zhang
Wenwei Zhang
Xingjun Ma
Yu-Gang Jiang
Yu Qiao
Yingchun Wang
29
15
0
10 Nov 2023
Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users
Jennifer Dodgson
Nanzheng Lin
Julian Peh
Akira Rafhael Janson Pattirane
Alfath Daryl Alhajir
Eko Ridho Dinarto
Joseph Lim
Syed Danyal Ahmad
RALM
30
7
0
10 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
129
117
0
09 Nov 2023
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie
Wenlin Yao
Yong Dai
Shaobo Wang
Donlin Zhou
...
Zhichao Hu
Dong Yu
Zhengyou Zhang
Jing Nie
Yuhong Liu
ELM
ALM
16
4
0
09 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou
Fenglin Liu
Boyang Gu
Xinyu Zou
Jinfa Huang
...
Yefeng Zheng
Lei A. Clifton
Zheng Li
Fenglin Liu
David A. Clifton
LM&MA
31
106
0
09 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
34
9
0
09 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra
Eve Fleisig
Kyunghyun Cho
Adam Lopez
LRM
17
8
0
08 Nov 2023
Evaluating multiple large language models in pediatric ophthalmology
J. Holmes
Rui Peng
Yiwei Li
Jinyu Hu
Zheng Liu
...
Wei Liu
Hong Wei
Jie Zou
Tianming Liu
Yi Shao
AI4Ed
ELM
LM&MA
21
0
0
07 Nov 2023
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Jiale Cheng
Xiao Liu
Kehan Zheng
Pei Ke
Hongning Wang
Yuxiao Dong
Jie Tang
Minlie Huang
29
78
0
07 Nov 2023
Unveiling Safety Vulnerabilities of Large Language Models
George Kour
Marcel Zalmanovici
Naama Zwerdling
Esther Goldbraich
Ora Nova Fandina
Ateret Anaby-Tavor
Orna Raz
E. Farchi
AAML
16
15
0
07 Nov 2023
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo
Ranchi Zhao
Tianyi Tang
Wayne Xin Zhao
Ji-Rong Wen
ALM
32
27
0
07 Nov 2023
A Survey of Large Language Models Attribution
Dongfang Li
Zetian Sun
Xinshuo Hu
Zhenyu Liu
Ziyang Chen
Baotian Hu
Aiguo Wu
Min Zhang
HILM
13
49
0
07 Nov 2023
Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Rusheb Shah
Quentin Feuillade--Montixi
Soroush Pour
Arush Tagade
Stephen Casper
Javier Rando
26
122
0
06 Nov 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David A. Wagner
ALM
23
26
0
06 Nov 2023
LLMs grasp morality in concept
Mark Pock
Andre Ye
Jared Moore
FaML
19
2
0
04 Nov 2023
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou
Junda Lu
Che Liu
Yihong Tang
Fuzheng Zhang
Di Zhang
Kun Gai
ALM
LM&MA
30
14
0
03 Nov 2023
Successor Features for Efficient Multisubject Controlled Text Generation
Mengyao Cao
Mehdi Fatemi
Jackie Chi Kit Cheung
Samira Shabanian
BDL
26
0
0
03 Nov 2023
Leveraging Large Language Models for Collective Decision-Making
Marios Papachristou
Longqi Yang
Chin-Chia Hsu
LLMAG
31
2
0
03 Nov 2023
The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization
Sian Gooding
Hassan Mansoor
10
1
0
02 Nov 2023
Making Harmful Behaviors Unlearnable for Large Language Models
Xin Zhou
Yi Lu
Ruotian Ma
Tao Gui
Qi Zhang
Xuanjing Huang
MU
36
9
0
02 Nov 2023
Improving Interpersonal Communication by Simulating Audiences with Language Models
Ryan Liu
Howard Yen
Raja Marjieh
Thomas L. Griffiths
Ranjay Krishna
12
11
0
01 Nov 2023
The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis
Yuxiang Zhou
Jiazheng Li
Yanzheng Xiang
Hanqi Yan
Lin Gui
Yulan He
22
14
0
01 Nov 2023
Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
Jinhwa Kim
Ali Derakhshan
Ian G. Harris
AAML
98
16
0
31 Oct 2023
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert
Roberto Calandra
ALM
18
31
0
31 Oct 2023
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin
Hattie Zhou
Omid Saremi
Vimal Thilak
Arwen Bradley
Preetum Nakkiran
Josh Susskind
Etai Littwin
10
7
0
31 Oct 2023
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang
Zhiwei He
Jen-tse Huang
Wenxuan Wang
Wenxiang Jiao
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
Xing Wang
LLMAG
58
5
0
31 Oct 2023
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
Yuxin Jiang
Yufei Wang
Xingshan Zeng
Wanjun Zhong
Liangyou Li
Fei Mi
Lifeng Shang
Xin Jiang
Qun Liu
Wei Wang
ALM
15
25
0
31 Oct 2023
Automatic Evaluation of Generative Models with Instruction Tuning
Shuhaib Mehri
Vered Shwartz
ELM
ALM
10
1
0
30 Oct 2023
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
25
34
0
30 Oct 2023
Constituency Parsing using LLMs
Xuefeng Bai
Jialong Wu
Yulong Chen
Zhongqing Wang
Yue Zhang
33
1
0
30 Oct 2023
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
27
92
0
30 Oct 2023
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng-Wei Zhang
Shuguang Liu
Joe Chau
Han Hu
Peng Cheng
MQ
59
38
0
27 Oct 2023
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
Yukun Zhao
Lingyong Yan
Weiwei Sun
Guoliang Xing
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
Dawei Yin
27
35
0
27 Oct 2023
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
Niloofar Mireshghallah
Hyunwoo J. Kim
Xuhui Zhou
Yulia Tsvetkov
Maarten Sap
Reza Shokri
Yejin Choi
PILM
30
73
0
27 Oct 2023
Social Contract AI: Aligning AI Assistants with Implicit Group Norms
Jan-Philipp Fränken
Sam Kwok
Peixuan Ye
Kanishk Gandhi
Dilip Arumugam
Jared Moore
Alex Tamkin
Tobias Gerstenberg
Noah D. Goodman
29
7
0
26 Oct 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi
Jing Yao
Xiting Wang
Xing Xie
24
11
0
26 Oct 2023
Controlled Decoding from Language Models
Sidharth Mudgal
Jong Lee
H. Ganapathy
Yaguang Li
Tao Wang
...
Michael Collins
Trevor Strohman
Jilin Chen
Alex Beutel
Ahmad Beirami
32
69
0
25 Oct 2023
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Shayne Longpre
Robert Mahari
Anthony Chen
Naana Obeng-Marnu
Damien Sileo
...
K. Bollacker
Tongshuang Wu
Luis Villa
Sandy Pentland
Sara Hooker
15
55
0
25 Oct 2023
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?
Xingmeng Zhao
Tongnian Wang
Sheri Osborn
Anthony Rios
8
4
0
25 Oct 2023
Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting
Preethi Lahoti
Nicholas Blumm
Xiao Ma
Raghavendra Kotikalapudi
Sahitya Potluri
...
Hansa Srinivasan
Ben Packer
Ahmad Beirami
Alex Beutel
Jilin Chen
39
28
0
25 Oct 2023
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong
Quan Tu
C. Chen
Xing Gao
Ji Zhang
Rui Yan
ALM
14
11
0
25 Oct 2023
The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining
Ting-Rui Chiang
Dani Yogatama
25
1
0
25 Oct 2023
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Sander Schulhoff
Jeremy Pinto
Anaum Khan
Louis-François Bouchard
Chenglei Si
Svetlina Anati
Valen Tagliabue
Anson Liu Kost
Christopher Carnahan
Jordan L. Boyd-Graber
SILM
29
41
0
24 Oct 2023
AI Alignment and Social Choice: Fundamental Limitations and Policy Implications
Abhilash Mishra
12
21
0
24 Oct 2023
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction
Yizhu Jiao
Ming Zhong
Sha Li
Ruining Zhao
Siru Ouyang
Heng Ji
Jiawei Han
33
24
0
24 Oct 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David E. Evans
Shruti Tople
Robert West
KELM
LLMAG
16
20
0
24 Oct 2023
Self-Guard: Empower the LLM to Safeguard Itself
Zezhong Wang
Fangkai Yang
Lu Wang
Pu Zhao
Hongru Wang
Liang Chen
Qingwei Lin
Kam-Fai Wong
69
28
0
24 Oct 2023
Generative Language Models Exhibit Social Identity Biases
Tiancheng Hu
Yara Kyrychenko
Steve Rathje
Nigel Collier
S. V. D. Linden
Jon Roozenbeek
30
37
0
24 Oct 2023