Safer-Instruct: Aligning Language Models with Automated Preference Data
arXiv:2311.08685 · 15 November 2023
Taiwei Shi, Kai Chen, Jieyu Zhao
Tags: ALM, SyDa
Papers citing "Safer-Instruct: Aligning Language Models with Automated Preference Data" (18 papers)
- Safety Pretraining: Toward the Next Generation of Safe AI. Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zachary C. Lipton, J. Zico Kolter. 23 Apr 2025.
- ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data. Tong Chen, Faeze Brahman, Jiacheng Liu, Niloofar Mireshghallah, Weijia Shi, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi. 20 Apr 2025.
- A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future. Jialun Zhong, Wei Shen, Yanzeng Li, Songyang Gao, Hua Lu, Yicheng Chen, Yang Zhang, Wei Zhou, Jinjie Gu, Lei Zou. Tags: LRM. 12 Apr 2025.
- Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base. Linxin Song, Xuwei Ding, Jieyu Zhang, Taiwei Shi, Ryotaro Shimizu, Rahul Gupta, Y. Liu, Jian Kang, Jieyu Zhao. Tags: KELM. 30 Mar 2025.
- Like Father, Like Son: Kinship-Aware Preference Mapping (KARMA) for Automatic Alignment in Large Language Models. Jeesu Jung, Chanjun Park, Sangkeun Jung. 26 Feb 2025.
- Self-Boosting Large Language Models with Synthetic Preference Data. Qingxiu Dong, Li Dong, Xingxing Zhang, Zhifang Sui, Furu Wei. Tags: SyDa. 09 Oct 2024.
- Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models. Fei Wang, Ninareh Mehrabi, Palash Goyal, Rahul Gupta, Kai-Wei Chang, Aram Galstyan. Tags: ALM. 07 Oct 2024.
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model. Lifan Jiang, Zhihui Wang, Siqi Yin, Guangxiao Ma, Peng Zhang, Boxi Wu. Tags: DiffM. 28 Aug 2024.
- Know Your Limits: A Survey of Abstention in Large Language Models. Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang. 25 Jul 2024.
- Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey. Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu. Tags: PILM. 12 Jun 2024.
- The Real, the Better: Aligning Large Language Models with Online Human Behaviors. Guanying Jiang, Lingyong Yan, Haibo Shi, Dawei Yin. 01 May 2024.
- Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data. Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi. 05 Apr 2024.
- How Susceptible are Large Language Models to Ideological Manipulation? Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman. 18 Feb 2024.
- A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Eric Sun, Yue Zhang. Tags: PILM, ELM. 04 Dec 2023.
- Self-Guard: Empower the LLM to Safeguard Itself. Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong. 24 Oct 2023.
- Instruction Tuning with Human Curriculum. Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo. 14 Oct 2023.
- Instruction Induction: From Few Examples to Natural Language Task Descriptions. Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy. Tags: ELM, LRM. 22 May 2022.
- Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. Tags: OSLM, ALM. 04 Mar 2022.