Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

24 October 2024
Wenhong Zhu, Zhiwei He, Xiaofeng Wang, Pengfei Liu, Rui Wang
OSLM

Papers citing "Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model"

3 / 3 papers shown

Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Feifei Zhao, Y. Wang, Enmeng Lu, Dongcheng Zhao, Bing Han, ..., Chao Liu, Yaodong Yang, Yi Zeng, Boyuan Chen, Jinyu Fan
24 Apr 2025

Adding Alignment Control to Language Models
Wenhong Zhu, Weinan Zhang, Rui Wang
06 Mar 2025

Understanding the Capabilities and Limitations of Weak-to-Strong Generalization
Wei Yao, Wenkai Yang, Z. Wang, Yankai Lin, Yong Liu
ELM
03 Feb 2025