1712.01815
Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 839 papers shown
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL
AI4TS
LRM
ReLM
VLM
1.2K
5,342
0
22 Jan 2025
HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
186
1
0
22 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Katherine M. Collins
Umang Bhatt
Ilia Sucholutsky
378
2
0
16 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Guang Dai
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
352
244
0
08 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
IEEE Transactions on Evolutionary Computation (TEVC), 2022
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
555
22
0
03 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
928
570
0
03 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
515
1
0
03 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Neural Information Processing Systems (NeurIPS), 2024
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
772
15
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
BigData Congress [Services Society] (BSS), 2024
Szymon Miłosz
Paweł Kapusta
159
5
0
31 Dec 2024
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan
Xingyao Wang
Graham Neubig
Navdeep Jaitly
Chenhui Xu
Alane Suhr
Yizhe Zhang
408
36
0
30 Dec 2024
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning
Huchen Jiang
Yangyang Ma
Chaofan Ding
Kexin Luan
Xinhan Di
ReLM
LRM
327
2
0
23 Dec 2024
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Sungjin Park
Xiao Liu
Yeyun Gong
Edward Choi
LRM
275
26
0
20 Dec 2024
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Junyi Li
Hwee Tou Ng
LRM
412
4
0
19 Dec 2024
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Jinhao Jiang
Jiayi Chen
Junyi Li
Ruiyang Ren
Shijie Wang
Wayne Xin Zhao
Yang Song
Tao Zhang
LRM
258
31
0
17 Dec 2024
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao
Lawson L. S. Wong
341
2
0
16 Dec 2024
Monte Carlo Tree Search based Space Transfer for Black-box Optimization
Neural Information Processing Systems (NeurIPS), 2024
Shukuan Wang
Ke Xue
Lei Song
Xiaobin Huang
Chao Qian
291
6
0
10 Dec 2024
Learning World Models for Unconstrained Goal Navigation
Neural Information Processing Systems (NeurIPS), 2024
Yuanlin Duan
Wensen Mao
He Zhu
231
5
0
03 Nov 2024
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Neural Information Processing Systems (NeurIPS), 2024
Kai Yan
Alex Schwing
Yu-Xiong Wang
OffRL
OnRL
242
4
0
31 Oct 2024
Enhancing Chess Reinforcement Learning with Graph Representation
Neural Information Processing Systems (NeurIPS), 2024
Tomas Rigaux
H. Kashima
GNN
164
2
0
31 Oct 2024
LLM Tree Search
Dylan Wilson
114
2
0
24 Oct 2024
NodeOP: Optimizing Node Management for Decentralized Networks
Angela Tsang
Jiankai Sun
Boo Xie
Azeem Khan
Ender Lu
Fletcher Fan
Maggie Wu
Jing Tang
88
0
0
22 Oct 2024
SoK: Dataset Copyright Auditing in Machine Learning Systems
IEEE Symposium on Security and Privacy (S&P), 2024
L. Du
Xuanru Zhou
M. Chen
Chusong Zhang
Zhou Su
Peng Cheng
Jiming Chen
Zhikun Zhang
MLAU
406
15
0
22 Oct 2024
Memory-Efficient Large Language Models for Program Repair with Semantic-Guided Patch Generation
Thanh Le-Cong
Bach Le
Toby Murray
187
0
0
22 Oct 2024
Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search
Jiamian Li
184
0
0
15 Oct 2024
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Jiayu Chen
Wentse Chen
Shiyu Huang
Jeff Schneider
OffRL
427
8
0
15 Oct 2024
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
Bokai Hu
Sai Ashish Somayajula
Xin Pan
Zihan Huang
OffRL
395
5
0
14 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
International Conference on Learning Representations (ICLR), 2024
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
382
9
0
10 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Pengfei Yu
Jindong Wang
Heng Ji
AI4MH
887
1
0
09 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Rui Wang
Pengfei Liu
VLM
330
137
0
08 Oct 2024
Human-aligned Chess with a Bit of Search
Yiming Zhang
Athul Paul Jacob
Vivian Lai
Daniel Fried
Daphne Ippolito
132
4
0
04 Oct 2024
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Seungyong Moon
Bumsoo Park
Hyun Oh Song
AIFin
RALM
275
4
0
03 Oct 2024
Interpretable Contrastive Monte Carlo Tree Search Reasoning
Zitian Gao
Boye Niu
Xuzheng He
Haotian Xu
Hongzhang Liu
Aiwei Liu
Xuming Hu
Lijie Wen
LRM
469
59
0
02 Oct 2024
Maia-2: A Unified Model for Human-AI Alignment in Chess
Neural Information Processing Systems (NeurIPS), 2024
Zhenwei Tang
Difan Jiao
Reid McIlroy-Young
Jon M. Kleinberg
Siddhartha Sen
Ashton Anderson
160
13
0
30 Sep 2024
Gaze-informed Signatures of Trust and Collaboration in Human-Autonomy Teams
Computers in Human Behavior (CHB), 2024
Anthony J. Ries
Stéphane Aroca-Ouellette
Alessandro Roncone
Ewart J. de Visser
123
4
0
27 Sep 2024
Refutation of Spectral Graph Theory Conjectures with Search Algorithms
Milo Roucairol
Tristan Cazenave
91
4
0
27 Sep 2024
Towards Efficient Neuro-Symbolic AI: From Workload Characterization to Hardware Architecture
Zishen Wan
Che-Kai Liu
Hanchen Yang
Ritik Raj
Chaojian Li
...
Yingyan Celine Lin
Mohamed Ibrahim
Jan M. Rabaey
Tushar Krishna
A. Raychowdhury
330
18
0
20 Sep 2024
A Case Study of Web App Coding with OpenAI Reasoning Models
Yi Cui
ELM
VLM
LRM
150
0
0
19 Sep 2024
Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens
Joseph Clinton
Robert Lieck
OffRL
199
6
0
14 Sep 2024
State and Action Factorization in Power Grids
Gianvito Losapio
Davide Beretta
Marco Mussi
Alberto Maria Metelli
Marcello Restelli
147
2
0
03 Sep 2024
Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL
Jihwan Lee
Woochang Sim
Sejin Kim
Sundong Kim
OffRL
199
2
0
27 Aug 2024
Localized Observation Abstraction Using Piecewise Linear Spatial Decay for Reinforcement Learning in Combat Simulations
Scotty Black
Christian J. Darken
107
0
0
23 Aug 2024
Enhancing Reinforcement Learning Through Guided Search
European Conference on Artificial Intelligence (ECAI), 2024
Jérôme Arjonilla
Abdallah Saffidine
Tristan Cazenave
OffRL
328
0
0
19 Aug 2024
ShortCircuit: AlphaZero-Driven Circuit Design
Dimitrios Tsaras
Antoine Grosnit
Lei Chen
Zhiyao Xie
Haitham Bou-Ammar
Mingxuan Yuan
218
0
0
19 Aug 2024
Perfect Information Monte Carlo with Postponing Reasoning
Jérôme Arjonilla
Abdallah Saffidine
Tristan Cazenave
149
1
0
05 Aug 2024
A Value Function Space Approach for Hierarchical Planning with Signal Temporal Logic Tasks
IEEE Control Systems Letters (L-CSS), 2024
Peiran Liu
Yiting He
Yihao Qin
Hang Zhou
Yiding Ji
OffRL
277
0
0
04 Aug 2024
TASI Lectures on Physics for Machine Learning
Jim Halverson
262
5
0
31 Jul 2024
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL
AI4CE
227
7
0
26 Jul 2024
Learning to Play Foosball: System and Baselines
Janosch Moos
Cedric Derstroff
Niklas Schröder
Debora Clever
174
1
0
23 Jul 2024
AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding
Chang Lei
Huan Lei
146
0
0
14 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
414
1
0
12 Jul 2024