Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.09390
Cited By
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
14 December 2023
Collin Burns
Pavel Izmailov
Jan Hendrik Kirchner
Bowen Baker
Leo Gao
Leopold Aschenbrenner
Yining Chen
Adrien Ecoffet
Manas Joglekar
Jan Leike
Ilya Sutskever
Jeff Wu
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision"
50 / 56 papers shown
Title
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Y. Wang
Chaoren Wang
Z. Li
Zhuo Chen
Zhizheng Wu
46
0
0
07 May 2025
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
70
0
0
25 Apr 2025
Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Feifei Zhao
Y. Wang
Enmeng Lu
Dongcheng Zhao
Bing Han
...
Chao Liu
Yaodong Yang
Yi Zeng
Boyuan Chen
Jinyu Fan
80
0
0
24 Apr 2025
The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Muskan Garg
Shaina Raza
Shebuti Rayana
Xingyi Liu
Sunghwan Sohn
LM&MA
AILaw
87
0
0
23 Apr 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian-Yu Guan
J. Wu
J. Li
Chuanqi Cheng
Wei Yu Wu
LM&MA
69
0
0
21 Mar 2025
Is Your Video Language Model a Reliable Judge?
M. Liu
Wensheng Zhang
56
1
0
07 Mar 2025
PEO: Improving Bi-Factorial Preference Alignment with Post-Training Policy Extrapolation
Yuxuan Liu
35
0
0
03 Mar 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Andreas Opedal
Haruki Shirakami
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
54
1
0
17 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq R. Joty
Furu Wei
LRM
99
8
0
17 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
70
4
0
03 Feb 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM
LRM
ReLM
3DV
81
6
0
24 Jan 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
49
6
0
31 Dec 2024
Neural Interactive Proofs
Lewis Hammond
Sam Adam-Day
AAML
84
2
0
12 Dec 2024
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin
John Cooper
Frederic Sala
75
5
0
05 Dec 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
71
0
0
12 Nov 2024
Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
Xuan He
Da Yin
Nanyun Peng
LRM
33
0
0
27 Oct 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu
Zhiwei He
Xiaofeng Wang
Pengfei Liu
Rui Wang
OSLM
47
3
0
24 Oct 2024
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz
Halil Alperen Gozeten
Ege Onur Taga
Marco Mondelli
Samet Oymak
46
2
0
24 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
60
5
0
22 Oct 2024
Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning
Ruimeng Ye
Yang Xiao
Bo Hui
ALM
ELM
OffRL
27
2
0
16 Oct 2024
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
52
5
0
10 Oct 2024
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
Yuxin Xiao
Shujian Zhang
Wenxuan Zhou
Marzyeh Ghassemi
Sanqiang Zhao
47
0
0
07 Oct 2024
Provable Weak-to-Strong Generalization via Benign Overfitting
David X. Wu
A. Sahai
55
6
0
06 Oct 2024
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
Danqing Wang
Jianxin Ma
Fei Fang
Lei Li
LLMAG
LRM
59
0
0
02 Oct 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao
Yixuan Li
84
5
0
13 Sep 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
37
11
0
11 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
56
23
0
10 Sep 2024
Agentic Society: Merging skeleton from real world and texture from Large Language Model
Yuqi Bai
Kun Sun
Huishi Yin
26
1
0
02 Sep 2024
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model
Lifan Jiang
Zhihui Wang
Siqi Yin
Guangxiao Ma
Peng Zhang
Boxi Wu
DiffM
51
0
0
28 Aug 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
59
4
0
22 Aug 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM
KELM
LRM
44
71
0
28 Jul 2024
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
25
0
0
30 Jun 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li
Yifan Wang
A. Grama
Ruqi Zhang
Ruqi Zhang
AI4TS
47
8
0
24 Jun 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang
Shiqi Shen
Guangyao Shen
Zhi Gong
Yankai Lin
Zhi Gong
Yankai Lin
Ji-Rong Wen
47
13
0
17 Jun 2024
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang
Lalit P. Jain
Yang Guo
Jiayi Chen
Kuan Lok Zhou
...
Scott Sievert
Timothy Rogers
Kevin Jamieson
Robert Mankoff
Robert Nowak
29
5
0
15 Jun 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
30
12
0
29 May 2024
Trans-LoRA
\textit{Trans-LoRA}
Trans-LoRA
: towards data-free Transferable Parameter Efficient Finetuning
Runqian Wang
Soumya Ghosh
David D. Cox
Diego Antognini
Aude Oliva
Rogerio Feris
Leonid Karlinsky
30
1
0
27 May 2024
Theoretical Analysis of Weak-to-Strong Generalization
Hunter Lang
David Sontag
Aravindan Vijayaraghavan
19
19
0
25 May 2024
Bayesian WeakS-to-Strong from Text Classification to Generation
Ziyun Cui
Ziyang Zhang
Wen Wu
Wen Wu
Chao Zhang
31
1
0
24 May 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
46
21
0
22 Apr 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Nan Du
Xiaolong Li
SyDa
LRM
ReLM
84
28
0
16 Apr 2024
Towards Unified Alignment Between Agents, Humans, and Environment
Zonghan Yang
An Liu
Zijun Liu
Kai Liu
Fangzhou Xiong
...
Zhenhe Zhang
Fuwen Luo
Zhicheng Guo
Peng Li
Yang Liu
24
4
0
12 Feb 2024
Scheming AIs: Will AIs fake alignment during training in order to get power?
Joe Carlsmith
53
30
0
14 Nov 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
34
3
0
08 Aug 2023
Exploring The Landscape of Distributional Robustness for Question Answering Models
Anas Awadalla
Mitchell Wortsman
Gabriel Ilharco
Sewon Min
Ian H. Magnusson
Hannaneh Hajishirzi
Ludwig Schmidt
ELM
OOD
KELM
68
19
0
22 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation
Ziquan Liu
Yi Tian Xu
Yuanhong Xu
Qi Qian
Hao Li
Rong Jin
Xiangyang Ji
Antoni B. Chan
OOD
27
14
0
25 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Diversify and Disambiguate: Learning From Underspecified Data
Yoonho Lee
Huaxiu Yao
Chelsea Finn
203
64
0
07 Feb 2022
Editing a classifier by rewriting its prediction rules
Shibani Santurkar
Dimitris Tsipras
Mahalaxmi Elango
David Bau
Antonio Torralba
A. Madry
KELM
164
89
0
02 Dec 2021
1
2
Next