Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.03173
Cited By
B
\mathcal{B}
B
-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
4 October 2023
Zishun Yu
Yunzhe Tao
Liyu Chen
Tao Sun
Hongxia Yang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis"
11 / 11 papers shown
Title
Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs
Marina Sakharova
Abhinav Anand
Mira Mezini
44
0
0
21 Apr 2025
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
Ruizhong Qiu
Weiliang Will Zeng
Hanghang Tong
James Ezick
Christopher Lott
84
15
0
20 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
49
4
0
29 Jan 2025
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua-Hong Wu
41
1
0
03 Oct 2024
Reinforcement Learning with Knowledge Representation and Reasoning: A Brief Survey
Chao Yu
Xuejing Zheng
H. Zhuo
OffRL
LRM
42
7
0
24 Apr 2023
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
S. Hoi
SyDa
ALM
118
232
0
05 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
201
1,451
0
02 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
614
0
20 May 2021
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
321
1,944
0
04 May 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
1