Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.18629
Cited By
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
26 June 2024
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
21 / 71 papers shown
Title
PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
Jiawei Li
Xinyue Liang
Yizhe Yang
Chong Feng
Yang Gao
LRM
63
0
0
18 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
49
45
1
15 Nov 2024
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Yihe Deng
Paul Mineiro
LRM
21
3
0
29 Oct 2024
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
Yuyang Ding
Xinyu Shi
Xiaobo Liang
Juntao Li
Qiaoming Zhu
Min Zhang
ELM
AIMat
SyDa
LRM
16
8
0
24 Oct 2024
Aligning CodeLLMs with Direct Preference Optimization
Yibo Miao
Bofei Gao
Shanghaoran Quan
Junyang Lin
Daoguang Zan
J. Liu
Jian Yang
Tianyu Liu
Zhijie Deng
50
5
0
24 Oct 2024
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Xinze Li
Sen Mei
Zhenghao Liu
Yukun Yan
Shuo Wang
...
H. Chen
Ge Yu
Zhiyuan Liu
Maosong Sun
Chenyan Xiong
35
6
0
17 Oct 2024
Process Reward Model with Q-Value Rankings
W. Li
Yixuan Li
LRM
41
13
0
15 Oct 2024
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
Han Wang
Yilin Zhao
Dian Li
Xiaohan Wang
Gang Liu
Xuguang Lan
H. Wang
LRM
34
1
0
14 Oct 2024
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
Lin Yan
OffRL
LRM
42
8
0
11 Oct 2024
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
L. Yang
Zhaochen Yu
T. Zhang
Minkai Xu
Joseph E. Gonzalez
Bin Cui
Shuicheng Yan
ELM
ReLM
LRM
39
0
0
11 Oct 2024
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao
Xu Chu
Yasha Wang
LRM
33
6
0
10 Oct 2024
Subtle Errors Matter: Preference Learning via Error-injected Self-editing
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Chak Tou Leong
Liangyou Li
Xin Jiang
Lifeng Shang
Qun Liu
Wenjie Li
LRM
53
0
0
09 Oct 2024
LRHP: Learning Representations for Human Preferences via Preference Pairs
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Qiaozhi He
Murun Yang
Tong Xiao
Chunliang Zhang
Tongran Liu
Jingbo Zhu
AI4TS
29
0
0
06 Oct 2024
Latent Feature Mining for Predictive Model Enhancement with Large Language Models
Bingxuan Li
Pengyi Shi
Amy Ward
38
0
0
06 Oct 2024
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization
Jianing Wang
Yang Zhou
Xiaocheng Zhang
Mengjiao Bao
Peng Yan
20
0
0
17 Sep 2024
Selective Preference Optimization via Token-Level Reward Function Estimation
Kailai Yang
Zhiwei Liu
Qianqian Xie
Jimin Huang
Erxue Min
Sophia Ananiadou
20
9
0
24 Aug 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Pranav Putta
Edmund Mills
Naman Garg
S. Motwani
Chelsea Finn
Divyansh Garg
Rafael Rafailov
LLMAG
LRM
18
19
0
13 Aug 2024
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
Zimu Lu
Aojun Zhou
Houxing Ren
Ke Wang
Weikang Shi
Junting Pan
Mingjie Zhan
Hongsheng Li
SyDa
LRM
45
42
0
26 Feb 2024
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
152
298
0
03 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Previous
1
2