2410.12999
Cited By
POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization
16 October 2024
Batuhan K. Karaman
Ishmam Zabir
Alon Benhaim
Vishrav Chaudhary
M. Sabuncu
Xia Song
AI4CE
Papers citing "POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization"
1 / 1 papers shown
Title
Rule Based Rewards for Language Model Safety
Tong Mu
Alec Helyar
Johannes Heidecke
Joshua Achiam
Andrea Vallone
Ian Kivlichan
Molly Lin
Alex Beutel
John Schulman
Lilian Weng
ALM
34
35
0
02 Nov 2024