2408.10668
Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation
20 August 2024
Haoyu Wang
Bingzhe Wu
Yatao Bian
Yongzhe Chang
Xueqian Wang
Peilin Zhao
Papers citing "Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation"
3 / 3 papers shown
Title
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Haoxiang Wang
Wei Xiong
Tengyang Xie
Han Zhao
Tong Zhang
46
132
0
18 Jun 2024
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
Haoyu Wang
Guozheng Ma
Ziqiao Meng
Zeyu Qin
Li Shen
...
Liu Liu
Yatao Bian
Tingyang Xu
Xueqian Wang
Peilin Zhao
55
12
0
12 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022