AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025
4 March 2025
Songming Zhang
Xue Zhang
Tong Zhang
Bojie Hu
Yufeng Chen
Jinan Xu
arXiv: 2503.02832

Papers citing "AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation"

6 / 6 papers shown
Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning
Xue Zhang
Yunlong Liang
Fandong Meng
Songming Zhang
Kaiyu Huang
Yufeng Chen
Jinan Xu
Jie Zhou
LRM
91
0
0
08 Oct 2025
CM-Align: Consistency-based Multilingual Alignment for Large Language Models
Xue Zhang
Yunlong Liang
Fandong Meng
Songming Zhang
Yufeng Chen
Jinan Xu
Jie Zhou
105
0
0
10 Sep 2025
A Dual-Space Framework for General Knowledge Distillation of Large Language Models
Wei Wei
Songming Zhang
Yunlong Liang
Fandong Meng
Yufeng Chen
Jinan Xu
Jie Zhou
335
0
0
15 Apr 2025
Latent Feature Mining for Predictive Model Enhancement with Large Language Models
Bingxuan Li
Pengyi Shi
Amy Ward
225
24
0
06 Oct 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
569
96
0
29 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Abigail Z. Jacobs
Tatsunori Hashimoto
ALM
438
578
0
06 Apr 2024