Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017

David Silver

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 839 papers shown

ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution

...

376

12 May 2025

Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs

Zijian An

Lifeng Zhou

240

08 May 2025

HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking

621

05 May 2025

Program Semantic Inequivalence Game with Large Language Models

Antonio Valerio Miceli-Barone

Vaishak Belle

Ali Payani

LRM

293

02 May 2025

Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken

534

28 Apr 2025

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

404

27 Apr 2025

Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning

399

24 Apr 2025

An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search

Karim Essalmi

Fernando Garrido

F. Nashashibi

146

22 Apr 2025

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

464

21 Apr 2025

SwitchMT: An Adaptive Context Switching Methodology for Scalable Multi-Task Learning in Intelligent Autonomous Agents

Avaneesh Devkota

Rachmad Vidya Wicaksana Putra

Mohamed Bennai

223

18 Apr 2025

ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition

209

17 Apr 2025

pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild

Scandinavian Conference on Image Analysis (SCIA), 2025

Jonas Myhre Schiøtt

Viktor Sebastian Petersen

Dimitrios P. Papadopoulos

VLM

205

16 Apr 2025

Reasoning without Regret

Tarun Chitra

OffRL LRM

231

14 Apr 2025

A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future

347

12 Apr 2025

AssistanceZero: Scalably Solving Assistance Games

352

09 Apr 2025

An Efficient Approach for Cooperative Multi-Agent Learning Problems

IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2024

Ángel Aso-Mollar

Eva Onaindia

168

07 Apr 2025

Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks

Sergey Pastukhov

226

06 Apr 2025

Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning

Abdullah Vanlioglu

330

28 Mar 2025

Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control

Eloy Anguiano Batanero

Ángela Fernández

Álvaro Barbero

226

26 Mar 2025

Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming

Integration of AI and OR Techniques in Constraint Programming (CPAIOR), 2025

Minori Narita

Ryo Kuroiwa

J. Christopher Beck

287

20 Mar 2025

DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

268

18 Mar 2025

Rapfi: Distilling Efficient Neural Network for the Game of Gomoku

Zhanggen Jin

Haobin Duan

Zhiyang Hang

213

17 Mar 2025

Deep Learning Agents Trained For Avoidance Behave Like Hawks And Doves

Aryaman Reddi

188

14 Mar 2025

Reinforcement Learning and Life Cycle Assessment for a Circular Economy -- Towards Progressive Computer Science

Johannes Buchner

106

13 Mar 2025

The Lagrangian Method for Solving Constrained Markov Games

324

13 Mar 2025

AI-driven control of bioelectric signalling for real-time topological reorganization of cells

Gonçalo Hora de Carvalho

AI4CE

384

10 Mar 2025

Automatic Curriculum Design for Zero-Shot Human-AI Coordination

IEEE Access, 2025

444

10 Mar 2025

Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity

304

08 Mar 2025

PokéChamp: an Expert-level Minimax Language Agent

248

06 Mar 2025

Language Models can Self-Improve at State-Value Estimation for Better Search

Ethan Mendes

Alan Ritter

LRM

439

04 Mar 2025

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

383

03 Mar 2025

Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

International Conference on Learning Representations (ICLR), 2025

333

28 Feb 2025

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

381

27 Feb 2025

Implicit Search via Discrete Diffusion: A Study on Chess

International Conference on Learning Representations (ICLR), 2025

280

27 Feb 2025

General Intelligence Requires Reward-based Pretraining

810

26 Feb 2025

ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies

383

25 Feb 2025

Streaming Looking Ahead with Token-level Self-reward

Han Zhang

Ruixin Hong

Dong Yu

226

24 Feb 2025

Scaling Autonomous Agents via Automatic Reward Modeling And Planning

International Conference on Learning Representations (ICLR), 2025

328

17 Feb 2025

Two-Player Zero-Sum Differential Games with One-Sided Information

395

17 Feb 2025

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

International Conference on Machine Learning (ICML), 2023

460

17 Feb 2025

A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1

Jun Wang

LRM KELM

276

15 Feb 2025

We Can't Understand AI Using our Existing Vocabulary

John Hewitt

Robert Geirhos

Been Kim

320

11 Feb 2025

LLMs Can Teach Themselves to Better Predict the Future

437

07 Feb 2025

Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks

498

06 Feb 2025

Policy Guided Tree Search for Enhanced LLM Reasoning

Yang Li

LRM

446

04 Feb 2025

Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification

320

04 Feb 2025

Develop AI Agents for System Engineering in Factorio

Neel Kant

259

03 Feb 2025

COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models

Tobias Materzok

LRM

328

28 Jan 2025

Optimizing Automatic Differentiation with Deep Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2024

Jamie Lohoff

Emre Neftci

454

28 Jan 2025

CodeMonkeys: Scaling Test-Time Compute for Software Engineering

312

24 Jan 2025