2411.16489
Cited By
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
25 November 2024
Zhen Huang
Haoyang Zou
Xuefeng Li
Yixiu Liu
Yuxiang Zheng
Ethan Chern
Shijie Xia
Yiwei Qin
Weizhe Yuan
Pengfei Liu
VLM
Papers citing "O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?"
17 / 17 papers shown
Title
T^2: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering
Zhengyi Zhao
Shubo Zhang
Zezhong Wang
Huimin Wang
Yutian Zhao
Bin Liang
Yefeng Zheng
Binyang Li
Kam-Fai Wong
X. Wu
LRM
58
0
0
23 May 2025
The Hallucination Tax of Reinforcement Finetuning
Linxin Song
Taiwei Shi
Jieyu Zhao
HILM
LRM
65
0
0
20 May 2025
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation
Yunlong Liang
Fandong Meng
Jiaan Wang
Jie Zhou
44
0
0
20 May 2025
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Jiaji Liu
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA
ELM
LRM
75
0
0
20 May 2025
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha
Minwu Kim
Aadim Nepal
Anubhav Shrestha
Keith Ross
OffRL
ReLM
LRM
39
0
0
19 May 2025
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
Bin Yu
Hang Yuan
Haotian Li
X. Xu
Yuliang Wei
Bailing Wang
Weizhen Qi
Kai Chen
LRM
65
2
0
06 May 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
121
2
0
27 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
383
2
1
15 Apr 2025
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs
Juncheng Wu
Wenlong Deng
Xiaochen Li
Sheng Liu
Taomian Mi
...
Yihan Cao
Hui Ren
Xuzhao Li
Xiaoxiao Li
Yuyin Zhou
AI4MH
LRM
96
8
0
01 Apr 2025
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
Haolin Yang
Feilong Tang
Ming Hu
Yulong Li
Junjie Guo
...
Zelin Peng
Junjun He
Zongyuan Ge
Imran Razzak
DiffM
VGen
193
2
0
20 Mar 2025
Typhoon T1: An Open Thai Reasoning Model
Pittawat Taveekitworachai
Potsawee Manakul
Kasima Tharnpipitchai
Kunat Pipatanakul
OffRL
LRM
151
0
0
13 Feb 2025
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
Libo Wang
LRM
342
1
0
07 Feb 2025
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
124
576
0
06 Aug 2024
RARR: Researching and Revising What Language Models Say, Using Language Models
Luyu Gao
Zhuyun Dai
Panupong Pasupat
Anthony Chen
Arun Tejasvi Chaganty
...
Vincent Zhao
Ni Lao
Hongrae Lee
Da-Cheng Juan
Kelvin Guu
HILM
KELM
82
257
0
17 Oct 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
140
1,893
0
16 Aug 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
546
41,106
0
28 May 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
434
1,664
0
18 Sep 2019