Fine-Tuning Language Models with Advantage-Induced Policy Alignment

4 June 2023 · Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao [OSLM]

Papers citing "Fine-Tuning Language Models with Advantage-Induced Policy Alignment"

34 papers:

• Alignment-Aware Decoding · Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer · 30 Sep 2025
• SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories (CVPR 2025) · Huanyi Zheng, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yongxu Liu, M. Yang, Chunhua Shen · 11 Mar 2025 [MLLM, VLM]
• BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment (NAACL 2024) · Sizhe Wang, Yongqi Tong, Hengyuan Zhang, Dawei Li, Xin Zhang, Tianlong Chen · 21 Feb 2025
• One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity (NAACL 2024) · Sonia K. Murthy, Tomer Ullman, Jennifer Hu · 07 Nov 2024 [ALM]
• Learning to Better Search with Language Models via Guided Reinforced Self-Training · Seungyong Moon, Bumsoo Park, Hyun Oh Song · 03 Oct 2024 [AIFin, RALM]
• Preference Alignment Improves Language Model-Based TTS (ICASSP 2024) · Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu · 19 Sep 2024
• Policy Filtration for RLHF to Mitigate Noise in Reward Models · Chuheng Zhang, Wei Shen, Li Zhao, Xuyun Zhang, Xiaolong Xu, Wanchun Dou, Jiang Biang · 11 Sep 2024 [OffRL]
• Mission Impossible: A Statistical Perspective on Jailbreaking LLMs (NeurIPS 2024) · Jingtong Su, Mingyu Lee, SangKeun Lee · 02 Aug 2024
• AI Safety in Generative AI Large Language Models: A Survey · Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao · 06 Jul 2024 [LM&MA]
• Self-Evolution Fine-Tuning for Policy Optimization · Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan · 16 Jun 2024
• Latent Logic Tree Extraction for Event Sequence Explanation from LLMs · Zitao Song, Chao Yang, Chaojie Wang, Bo An, Shuang Li · 03 Jun 2024
• Online Self-Preferring Language Models · Yuanzhao Zhai, Zhuo Zhang, Kele Xu, Hanyang Peng, Yue Yu, Dawei Feng, Cheng Yang, Bo Ding, Huaimin Wang · 23 May 2024
• REBEL: Reinforcement Learning via Regressing Relative Rewards · Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun · 25 Apr 2024 [OffRL]
• RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs · Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva · 12 Apr 2024
• Dataset Reset Policy Optimization for RLHF · Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Kumar Misra, Jason D. Lee, Wen Sun · 12 Apr 2024 [OffRL]
• SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety · Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy · 08 Apr 2024 [ELM, KELM]
• Stream of Search (SoS): Learning to Search in Language · Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman · 01 Apr 2024 [RALM, AIFin, LRM]
• Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment · Feifan Song, Bowen Yu, Hao Lang, Haiyang Yu, Fei Huang, Houfeng Wang, Yongbin Li · 17 Mar 2024 [ALM]
• Do LLMs Implicitly Determine the Suitable Text Difficulty for Users? · Seiji Gobara, Hidetaka Kamigaito, Taro Watanabe · 22 Feb 2024
• Generative AI Security: Challenges and Countermeasures · Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner · 20 Feb 2024 [AAML, SILM]
• Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks · Andy Zhou, Bo Li, Haohan Wang · 30 Jan 2024 [AAML]
• Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF · Banghua Zhu, Michael I. Jordan, Jiantao Jiao · 29 Jan 2024
• I am a Strange Dataset: Metalinguistic Tests for Language Models (ACL 2024) · Tristan Thrush, Jared Moore, Miguel Monares, Christopher Potts, Douwe Kiela · 10 Jan 2024
• Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles · Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang · 30 Dec 2023 [AI4CE]
• Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint · Wei Xiong, Hanze Dong, Chen Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang · 18 Dec 2023 [OffRL]
• ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference · Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang · 05 Dec 2023 [ALM]
• PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation (EMNLP 2023) · Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, I. Laradji · 22 Oct 2023 [VLM]
• ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models (ICML 2023) · Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Tian Ding, Zhimin Luo · 16 Oct 2023
• Towards the Fundamental Limits of Knowledge Transfer over Finite Domains (ICLR 2023) · Qingyue Zhao, Banghua Zhu · 11 Oct 2023
• Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency (EMNLP 2023) · E. Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber · 10 Oct 2023 [AI4Ed]
• Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment · Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao · 30 Sep 2023
• ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF · Víctor Gallego · 11 Aug 2023 [SyDa]
• Preference Ranking Optimization for Human Alignment (AAAI 2023) · Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, Houfeng Wang · 30 Jun 2023 [ALM]
• Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models (ICLR 2023) · Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. Riedl · 24 May 2023