Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017

David Silver

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 839 papers shown

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

...

OffRL AI4TS LRM ReLM VLM

1.2K

5,342

22 Jan 2025

HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation

Hazem Taha

Ameer M. S. Abdelhadi

186

22 Jan 2025

Revisiting Rogers' Paradox in the Context of Human-AI Interaction

Katherine M. Collins

Umang Bhatt

Ilia Sucholutsky

378

16 Jan 2025

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

352

244

08 Jan 2025

Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution

IEEE Transactions on Evolutionary Computation (TEVC), 2022

555

03 Jan 2025

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

928

570

03 Jan 2025

ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

515

03 Jan 2025

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Neural Information Processing Systems (NeurIPS), 2024

772

03 Jan 2025

Predicting Chess Puzzle Difficulty with Transformers

BigData Congress [Services Society] (BSS), 2024

Szymon Miłosz

Paweł Kapusta

159

31 Dec 2024

Training Software Engineering Agents and Verifiers with SWE-Gym

408

30 Dec 2024

Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

327

23 Dec 2024

Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

275

20 Dec 2024

Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Junyi Li

Hwee Tou Ng

LRM

412

19 Dec 2024

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

258

17 Dec 2024

Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps

Linfeng Zhao

Lawson L. S. Wong

341

16 Dec 2024

Monte Carlo Tree Search based Space Transfer for Black-box Optimization

Neural Information Processing Systems (NeurIPS), 2024

291

10 Dec 2024

Learning World Models for Unconstrained Goal Navigation

Neural Information Processing Systems (NeurIPS), 2024

Yuanlin Duan

Wensen Mao

He Zhu

231

03 Nov 2024

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

Neural Information Processing Systems (NeurIPS), 2024

242

31 Oct 2024

Enhancing Chess Reinforcement Learning with Graph Representation

Neural Information Processing Systems (NeurIPS), 2024

Tomas Rigaux

H. Kashima

GNN

164

31 Oct 2024

LLM Tree Search

Dylan Wilson

114

24 Oct 2024

NodeOP: Optimizing Node Management for Decentralized Networks

22 Oct 2024

SoK: Dataset Copyright Auditing in Machine Learning Systems

IEEE Symposium on Security and Privacy (S&P), 2024

406

22 Oct 2024

Memory-Efficient Large Language Models for Program Repair with Semantic-Guided Patch Generation

Thanh Le-Cong

Bach Le

Toby Murray

187

22 Oct 2024

Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search

Jiamian Li

184

15 Oct 2024

Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning

427

15 Oct 2024

Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Bokai Hu

Sai Ashish Somayajula

Xin Pan

Zihan Huang

OffRL

395

14 Oct 2024

Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

International Conference on Learning Representations (ICLR), 2024

382

10 Oct 2024

MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

Cheng-rong Li

May Fung

Qingyun Wang

Chi Han

Pengfei Yu

Jindong Wang

Heng Ji

AI4MH

887

09 Oct 2024

O1 Replication Journey: A Strategic Progress Report -- Part 1

...

330

137

08 Oct 2024

Human-aligned Chess with a Bit of Search

Yiming Zhang

132

04 Oct 2024

Learning to Better Search with Language Models via Guided Reinforced Self-Training

275

03 Oct 2024

Interpretable Contrastive Monte Carlo Tree Search Reasoning

Aiwei Liu

Xuming Hu

Lijie Wen

LRM

469

02 Oct 2024

Maia-2: A Unified Model for Human-AI Alignment in Chess

Neural Information Processing Systems (NeurIPS), 2024

Siddhartha Sen

Ashton Anderson

160

30 Sep 2024

Gaze-informed Signatures of Trust and Collaboration in Human-Autonomy Teams

Computers in Human Behavior (CHB), 2024

Anthony J. Ries

Stéphane Aroca-Ouellette

Alessandro Roncone

Ewart J. de Visser

123

27 Sep 2024

Refutation of Spectral Graph Theory Conjectures with Search Algorithms

Milo Roucairol

Tristan Cazenave

27 Sep 2024

Towards Efficient Neuro-Symbolic AI: From Workload Characterization to Hardware Architecture

...

Yingyan Celine Lin

Mohamed Ibrahim

Jan M. Rabaey

Tushar Krishna

A. Raychowdhury

330

20 Sep 2024

A Case Study of Web App Coding with OpenAI Reasoning Models

Yi Cui

ELM VLM LRM

150

19 Sep 2024

Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens

Joseph Clinton

Robert Lieck

OffRL

199

14 Sep 2024

State and Action Factorization in Power Grids

Gianvito Losapio

Davide Beretta

Marco Mussi

Alberto Maria Metelli

Marcello Restelli

147

03 Sep 2024

Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL

199

27 Aug 2024

Localized Observation Abstraction Using Piecewise Linear Spatial Decay for Reinforcement Learning in Combat Simulations

Scotty Black

Christian J. Darken

107

23 Aug 2024

Enhancing Reinforcement Learning Through Guided Search

European Conference on Artificial Intelligence (ECAI), 2024

328

19 Aug 2024

ShortCircuit: AlphaZero-Driven Circuit Design

Lei Chen

Haitham Bou-Ammar

218

19 Aug 2024

Perfect Information Monte Carlo with Postponing Reasoning

Jérôme Arjonilla

Abdallah Saffidine

Tristan Cazenave

149

05 Aug 2024

A Value Function Space Approach for Hierarchical Planning with Signal Temporal Logic Tasks

IEEE Control Systems Letters (L-CSS), 2024

277

04 Aug 2024

TASI Lectures on Physics for Machine Learning

Jim Halverson

262

31 Jul 2024

Reinforcement Learning for Sustainable Energy: A Survey

227

26 Jul 2024

Learning to Play Foosball: System and Baselines

174

23 Jul 2024

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

Chang Lei

Huan Lei

146

14 Jul 2024

Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay

Gonçalo Hora de Carvalho

Oscar Knap

R. Pollice

ReLM ELM LRM

414

12 Jul 2024