On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend
3 July 2019 (arXiv: 1907.01752)
Papers citing "On the Weaknesses of Reinforcement Learning for Neural Machine Translation"

50 of 75 citing papers shown.

Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
Zhuocheng Gong, Jian Guan, Wei Wu, Huishuai Zhang, Dongyan Zhao
08 May 2025

A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem
05 May 2025

Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang, Fandong Meng, Jie Zhou
OffRL, LRM
14 Apr 2025

UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality
Zelei Cheng, Xin-Qiang Cai, Yuting Tang, Pushi Zhang, Boming Yang, Masashi Sugiyama, Xinyu Xing
10 Mar 2025

Improving LLM Safety Alignment with Dual-Objective Optimization
Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song
AAML, MU
05 Mar 2025

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu
24 Feb 2025

Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu, Yu Pan, Guanting Chen, Xiaocheng Li
19 Nov 2024

Matryoshka: Learning to Drive Black-Box LLMs with LLMs
Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, H. Dai, Chao Zhang, Bo Dai
LRM
28 Oct 2024

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu, Kaixuan Ji, Ning Dai, Zheng Wu, Chen Dun, Q. Gu, Lin Yan, Quanquan Gu
OffRL, LRM
11 Oct 2024

RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ..., Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh
AAML
20 Sep 2024

From Lists to Emojis: How Format Bias Affects Model Alignment
Xuanchang Zhang, Wei Xiong, Lichang Chen, Dinesh Manocha, Heng Huang, Tong Zhang
ALM
18 Sep 2024

Segment-Based Interactive Machine Translation for Pre-trained Models
Ángel Navarro, Francisco Casacuberta
VLM
09 Jul 2024

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang
18 Jun 2024

Style Transfer with Multi-iteration Preference Optimization
Shuai Liu, Jonathan May
17 Jun 2024

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, J. Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan
11 Jun 2024

RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
OffRL
13 May 2024

Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Hao Wang, Tetsuro Morimura, Ukyo Honda, Daisuke Kawahara
02 May 2024

DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Li Zhao, Xinle Cheng, Jiang Bian, Di He, Liwei Wang
29 Apr 2024

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
12 Apr 2024

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang
MLLM
13 Mar 2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang
28 Feb 2024

Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning
Antoine Chaffin, Ewa Kijak, Vincent Claveau
21 Feb 2024

Dense Reward for Free in Reinforcement Learning from Human Feedback
Alex J. Chan, Hao Sun, Samuel Holt, M. Schaar
01 Feb 2024

YODA: Teacher-Student Progressive Learning for Language Models
Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, ..., Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu
LRM
28 Jan 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong, Hanze Dong, Chen Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
OffRL
18 Dec 2023

Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Josh Susskind, Etai Littwin
31 Oct 2023

SuperHF: Supervised Iterative Learning from Human Feedback
Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush S. Bhatia, Silas Alberti
ALM
25 Oct 2023

Mitigating the Alignment Tax of RLHF
Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Zeming Zheng, ..., Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang
MoMe, CLL
12 Sep 2023

Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao, Quan Z. Sheng, Julian McAuley, Lina Yao
SyDa
28 Aug 2023

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
Chenglong Wang, Hang Zhou, Yimin Hu, Yi Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu
04 Aug 2023

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli, Mirco Musolesi
AI4CE
31 Jul 2023

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Shizhe Diao, Rui Pan, Hanze Dong, Kashun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang
ALM
21 Jun 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Raghavi Chandu, ..., Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
24 May 2023

Optimizing Non-Autoregressive Transformers with Contrastive Learning
Chen An, Jiangtao Feng, Fei Huang, Xipeng Qiu, Lingpeng Kong
23 May 2023

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
ALM
13 Apr 2023

Tailoring Language Generation Models under Total Variation Distance
Haozhe Ji, Pei Ke, Zhipeng Hu, Rongsheng Zhang, Minlie Huang
26 Feb 2023

Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda, Taro Watanabe, Yuji Matsumoto
06 Dec 2022

Soft Alignment Objectives for Robust Adaptation of Language Generation
Michal Štefánik, Marek Kadlčík, Petr Sojka
29 Nov 2022

Reinforcement Learning with Large Action Spaces for Neural Machine Translation
Asaf Yehudai, Leshem Choshen, Lior Fox, Omri Abend
06 Oct 2022

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, R. Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi
OffRL
03 Oct 2022

GROOT: Corrective Reward Optimization for Generative Sequential Labeling
Kazuma Hashimoto, K. Raman
VLM
29 Sep 2022

Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)
Bojun Huang
22 Jul 2022

MAD for Robust Reinforcement Learning in Machine Translation
Domenic Donato, Lei Yu, Wang Ling, Chris Dyer
MoE
18 Jul 2022

Joint Generator-Ranker Learning for Natural Language Generation
Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen
28 Jun 2022

Why is constrained neural language generation particularly challenging?
Cristina Garbacea, Qiaozhu Mei
11 Jun 2022

Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes
Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang
OffRL
02 Jun 2022

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak, Hady ElSahar, Germán Kruszewski, Marc Dymetman
CLL
01 Jun 2022

RL with KL penalties is better viewed as Bayesian inference
Tomasz Korbak, Ethan Perez, Christopher L. Buckley
OffRL
23 May 2022

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam
ELM, AI4CE
14 Feb 2022

GAP-Gen: Guided Automatic Python Code Generation
Junchen Zhao, Yurun Song, Junlin Wang, Ian G. Harris
19 Jan 2022