2308.05585
Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length
10 August 2023
Miao Fan, Chen Hu, Shuchang Zhou
AAML
Papers citing "Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length"
1 / 1 papers shown
Title
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022