arXiv: 2308.08998
Reinforced Self-Training (ReST) for Language Modeling
17 August 2023
Çağlar Gülçehre, T. Paine, S. Srinivasan, Ksenia Konyushkova, L. Weerts, Abhishek Sharma, Aditya Siddhant, Alexa Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas
Tags: OffRL
Papers citing "Reinforced Self-Training (ReST) for Language Modeling" (50 of 225 shown)
- Sharp Analysis for KL-Regularized Contextual Bandits and RLHF · Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang · OffRL · 07 Nov 2024
- DynaSaur: Large Language Agents Beyond Predefined Actions · Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan Rossi, Handong Zhao, ..., Nedim Lipka, Yu-Chiang Frank Wang, Trung H. Bui, Franck Dernoncourt, Tianyi Zhou · LM&Ro, ELM, LLMAG · 04 Nov 2024
- Rule Based Rewards for Language Model Safety · Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng · ALM · 02 Nov 2024
- Vision-Language Models Can Self-Improve Reasoning via Reflection · Kanzhi Cheng, Yantao Li, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu · ReLM, LRM · 30 Oct 2024
- Matryoshka: Learning to Drive Black-Box LLMs with LLMs · Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, H. Dai, Chao Zhang, Bo Dai · LRM · 28 Oct 2024
- SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning · Shivam Adarsh, Kumar Shridhar, Caglar Gulcehre, Nicholas Monath, Mrinmaya Sachan · LRM · 24 Oct 2024
- Learning from others' mistakes: Finetuning machine translation models with span-level error annotations · Lily H. Zhang, Hamid Dadkhahi, M. Finkelstein, Firas Trabelsi, Jiaming Luo, Markus Freitag · 21 Oct 2024
- SMART: Self-learning Meta-strategy Agent for Reasoning Tasks · Rongxing Liu, Kumar Shridhar, Manish Prajapat, Patrick Xia, Mrinmaya Sachan · LLMAG, LRM · 21 Oct 2024
- Can Large Language Models Invent Algorithms to Improve Themselves? · Yoichi Ishibashi, Taro Yano, Masafumi Oyamada · AIFin, LRM · 21 Oct 2024
- Automated Proof Generation for Rust Code via Self-Evolution · Tianyu Chen, Shuai Lu, Shan Lu, Y. Gong, Chenyuan Yang, ..., Peng Cheng, Fan Yang, Shuvendu Lahiri, Tao Xie, Lidong Zhou · 21 Oct 2024
- Keep Guessing? When Considering Inference Scaling, Mind the Baselines · G. Yona, Or Honovich, Omer Levy, Roee Aharoni · UQLM, LRM · 20 Oct 2024
- TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models · Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao · FAtt · 20 Oct 2024
- Anchored Alignment for Self-Explanations Enhancement · Luis Felipe Villa-Arenas, Ata Nizamoglu, Qianli Wang, Sebastian Möller, Vera Schmitt · 17 Oct 2024
- A Survey on Data Synthesis and Augmentation for Large Language Models · Ke Wang, Jiahui Zhu, Minjie Ren, Z. Liu, Shiwei Li, ..., Chenkai Zhang, Xiaoyu Wu, Qiqi Zhan, Qingjie Liu, Yunhong Wang · SyDa · 16 Oct 2024
- QE-EBM: Using Quality Estimators as Energy Loss for Machine Translation · Gahyun Yoo, Jay Yoon Lee · 14 Oct 2024
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System · Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun · LLMAG · 10 Oct 2024
- Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation · Sweta Agrawal, José G. C. de Souza, Ricardo Rei, António Farinhas, Gonçalo Faria, Patrick Fernandes, Nuno M. Guerreiro, Andre Martins · 10 Oct 2024
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs · Shenao Zhang, Zhihan Liu, Boyi Liu, Y. Zhang, Yingxiang Yang, Y. Liu, Liyu Chen, Tao Sun, Z. Wang · 10 Oct 2024
- MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization · Yougang Lyu, Lingyong Yan, Zihan Wang, Dawei Yin, Pengjie Ren, Maarten de Rijke, Z. Z. Ren · 10 Oct 2024
- Automatic Curriculum Expert Iteration for Reliable LLM Reasoning · Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo · LRM · 10 Oct 2024
- Self-Boosting Large Language Models with Synthetic Preference Data · Qingxiu Dong, Li Dong, Xingxing Zhang, Zhifang Sui, Furu Wei · SyDa · 09 Oct 2024
- Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics · Stefano Perrella, Lorenzo Proietti, Pere-Lluís Huguet Cabot, Edoardo Barba, Roberto Navigli · 07 Oct 2024
- Rationale-Aware Answer Verification by Pairwise Self-Evaluation · Akira Kawabata, Saku Sugawara · LRM · 07 Oct 2024
- Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance · Seungyong Moon, Bumsoo Park, Hyun Oh Song · RALM, AIFin · 03 Oct 2024
- CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning · Huimu Yu, Xing Wu, Weidong Yin, Debing Zhang, Songlin Hu · LRM · 03 Oct 2024
- LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits · Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal · 02 Oct 2024
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging · Yuling Shi, Songsong Wang, Chengcheng Wan, Xiaodong Gu · ELM · 02 Oct 2024
- TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking · Danqing Wang, Jianxin Ma, Fei Fang, Lei Li · LLMAG, LRM · 02 Oct 2024
- From Lists to Emojis: How Format Bias Affects Model Alignment · Xuanchang Zhang, Wei Xiong, Lichang Chen, Tianyi Zhou, Heng Huang, Tong Zhang · ALM · 18 Sep 2024
- Semi-Supervised Reward Modeling via Iterative Self-Training · Yifei He, Haoxiang Wang, Ziyan Jiang, Alexandros Papangelis, Han Zhao · OffRL · 10 Sep 2024
- EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding · Muye Huang, Han Lai, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu · 03 Sep 2024
- Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling · Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi · SyDa, OffRL, LRM · 29 Aug 2024
- Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data · Han Xia, Songyang Gao, Qiming Ge, Zhiheng Xi, Qi Zhang, Xuanjing Huang · 27 Aug 2024
- Preference-Guided Reflective Sampling for Aligning Language Models · Hai Ye, Hwee Tou Ng · 22 Aug 2024
- Importance Weighting Can Help Large Language Models Self-Improve · Chunyang Jiang, Chi-min Chan, Wei Xue, Qifeng Liu, Yike Guo · 19 Aug 2024
- Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions · Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang · 02 Aug 2024
- Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning · Tianduo Wang, Shichen Li, Wei Lu · LRM, AI4CE · 25 Jul 2024
- A Survey on Symbolic Knowledge Distillation of Large Language Models · Kamal Acharya, Alvaro Velasquez, H. Song · SyDa · 12 Jul 2024
- ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models · Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen · HILM · 05 Jul 2024
- LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives · Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer, Marzieh Fadaee, Sara Hooker · SyDa · 01 Jul 2024
- Exploring Advanced Large Language Models with LLMsuite · Giorgio Roffo · LLMAG · 01 Jul 2024
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning · Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu · 30 Jun 2024
- PORT: Preference Optimization on Reasoning Traces · Salem Lahlou, Abdalgader Abubaker, Hakim Hacid · LRM · 23 Jun 2024
- Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts · Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang · 18 Jun 2024
- Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models · Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, J. Liu, Yu Qiao, Zhiyong Wu · LLMAG · 17 Jun 2024
- Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning · Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak · 15 Jun 2024
- Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs · Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang · 14 Jun 2024
- From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models · Eleni Nisioti, Claire Glanois, Elias Najarro, Andrew Dai, Elliot Meyerson, J. Pedersen, Laetitia Teodorescu, Conor F. Hayes, Shyam Sudhakaran, Sebastian Risi · AI4CE, LM&Ro · 14 Jun 2024
- Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs · Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min-Bin Lin · LRM, AI4CE · 13 Jun 2024
- ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions · Xu Zhang, Xunjian Yin, Xiaojun Wan · 13 Jun 2024