1712.01815
Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 839 papers shown
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
Xiaolin Huang
Weiwen Liu
Xingshan Zeng
Yanhua Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Yun Wang
Ruiming Tang
Defu Lian
KELM
376
2
0
12 May 2025
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Zijian An
Lifeng Zhou
240
0
0
08 May 2025
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Liang Luo
Jun Wang
Chi Ma
Huiling Zhen
Mingxuan Yuan
Jianye Hao
Defu Lian
Tong Xu
Feng Wu
LRM
621
11
0
05 May 2025
Program Semantic Inequivalence Game with Large Language Models
Antonio Valerio Miceli-Barone
Vaishak Belle
Ali Payani
LRM
293
0
0
02 May 2025
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Koki Inami
Masashi Konosu
Koki Yamane
Nozomu Masuya
Yunhan Li
Yu-Han Shu
Hiroshi Sato
Shinnosuke Homma
S. Sakaino
534
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
404
21
0
27 Apr 2025
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif
Flemming Kondrup
David Venuto
Ankit Anand
Doina Precup
Khimya Khetarpal
LM&Ro
399
0
0
24 Apr 2025
An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search
Karim Essalmi
Fernando Garrido
F. Nashashibi
146
1
0
22 Apr 2025
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
464
2
0
21 Apr 2025
SwitchMT: An Adaptive Context Switching Methodology for Scalable Multi-Task Learning in Intelligent Autonomous Agents
Avaneesh Devkota
Rachmad Vidya Wicaksana Putra
Mohamed Bennai
223
0
0
18 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM
LRM
209
1
0
17 Apr 2025
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild
Scandinavian Conference on Image Analysis (SCIA), 2025
Jonas Myhre Schiøtt
Viktor Sebastian Petersen
Dimitrios P. Papadopoulos
VLM
205
0
0
16 Apr 2025
Reasoning without Regret
Tarun Chitra
OffRL
LRM
231
0
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
347
28
0
12 Apr 2025
AssistanceZero: Scalably Solving Assistance Games
Cassidy Laidlaw
Eli Bronstein
Timothy Guo
Dylan Feng
Lukas Berglund
Justin Svegliato
Stuart J. Russell
Anca Dragan
352
4
0
09 Apr 2025
An Efficient Approach for Cooperative Multi-Agent Learning Problems
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2024
Ángel Aso-Mollar
Eva Onaindia
168
0
0
07 Apr 2025
Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks
Sergey Pastukhov
226
0
0
06 Apr 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
330
9
0
28 Mar 2025
Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control
Eloy Anguiano Batanero
Ángela Fernández
Álvaro Barbero
226
1
0
26 Mar 2025
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Integration of AI and OR Techniques in Constraint Programming (CPAIOR), 2025
Minori Narita
Ryo Kuroiwa
J. Christopher Beck
287
2
0
20 Mar 2025
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Vaibhav Aggarwal
Ojasv Kamal
Abhinav Japesh
Zhijing Jin
Bernhard Schölkopf
268
12
0
18 Mar 2025
Rapfi: Distilling Efficient Neural Network for the Game of Gomoku
Zhanggen Jin
Haobin Duan
Zhiyang Hang
213
0
0
17 Mar 2025
Deep Learning Agents Trained For Avoidance Behave Like Hawks And Doves
Aryaman Reddi
188
0
0
14 Mar 2025
Reinforcement Learning and Life Cycle Assessment for a Circular Economy -- Towards Progressive Computer Science
Johannes Buchner
106
1
0
13 Mar 2025
The Lagrangian Method for Solving Constrained Markov Games
Soham Das
Santiago Paternain
Luiz F. O. Chamon
Ceyhun Eksin
324
0
0
13 Mar 2025
AI-driven control of bioelectric signalling for real-time topological reorganization of cells
Gonçalo Hora de Carvalho
AI4CE
384
2
0
10 Mar 2025
Automatic Curriculum Design for Zero-Shot Human-AI Coordination
IEEE Access (IEEE Access), 2025
Won-Sang You
Tae-Gwan Ha
Seo-Young Lee
Kyung-Joong Kim
444
0
0
10 Mar 2025
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
304
0
0
08 Mar 2025
PokéChamp: an Expert-level Minimax Language Agent
Seth Karten
Andy Luu Nguyen
Chi Jin
AI4MH
LLMAG
ELM
248
6
0
06 Mar 2025
Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes
Alan Ritter
LRM
439
4
0
04 Mar 2025
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kunlun Zhu
Hongyi Du
Zhaochen Hong
Xiaocheng Yang
Shuyi Guo
...
Zhenhailong Wang
Cheng Qian
Xiangru Tang
Heng Ji
Jiaxuan You
LLMAG
383
51
0
03 Mar 2025
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
International Conference on Learning Representations (ICLR), 2025
Baiting Luo
Ava Pettet
Aron Laszka
A. Dubey
Ayan Mukhopadhyay
OffRL
333
2
0
28 Feb 2025
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Toru Lin
Kartik Sachdev
Linxi Fan
Jitendra Malik
Yuke Zhu
381
46
0
27 Feb 2025
Implicit Search via Discrete Diffusion: A Study on Chess
International Conference on Learning Representations (ICLR), 2025
Jiacheng Ye
Zhenyu Wu
Lei Li
Zhiyong Wu
Xin Jiang
Zhiyu Li
Dianbo Sui
DiffM
280
13
0
27 Feb 2025
General Intelligence Requires Reward-based Pretraining
Seungwook Han
Jyothish Pari
Samuel J. Gershman
Pulkit Agrawal
LRM
810
2
0
26 Feb 2025
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
Pedro Sequeira
Vidyasagar Sadhu
Melinda Gervasio
DiffM
383
0
0
25 Feb 2025
Streaming Looking Ahead with Token-level Self-reward
Han Zhang
Ruixin Hong
Dong Yu
226
2
0
24 Feb 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
International Conference on Learning Representations (ICLR), 2025
Zhenfang Chen
Delin Chen
Rui Sun
Wenjun Liu
Chuang Gan
LLMAG
328
12
0
17 Feb 2025
Two-Player Zero-Sum Differential Games with One-Sided Information
Mukesh Ghimire
Z. Xu
Yi Ren
SyDa
395
0
0
17 Feb 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
International Conference on Machine Learning (ICML), 2023
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi-An Ma
DiffM
460
55
0
17 Feb 2025
A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1
Jun Wang
LRM
KELM
276
17
0
15 Feb 2025
We Can't Understand AI Using our Existing Vocabulary
John Hewitt
Robert Geirhos
Been Kim
320
14
0
11 Feb 2025
LLMs Can Teach Themselves to Better Predict the Future
Benjamin Turtel
Danny Franklin
Philipp Schoenegger
LRM
437
4
0
07 Feb 2025
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Niccolò Grillo
Andrea Toccaceli
Joël Mathys
Benjamin Estermann
Stefania Fresca
Roger Wattenhofer
AI4CE
LRM
498
0
0
06 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
446
2
0
04 Feb 2025
Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification
Rudolf Reiter
Jasper Hoffmann
D. Reinhardt
Florian Messerer
Katrin Baumgärtner
Shamburaj Sawant
Joschka Boedecker
Moritz Diehl
S. Gros
320
20
0
04 Feb 2025
Develop AI Agents for System Engineering in Factorio
Neel Kant
259
1
0
03 Feb 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
328
1
0
28 Jan 2025
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Jamie Lohoff
Emre Neftci
454
4
0
28 Jan 2025
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich
Bradley Brown
Jordan Juravsky
Ronald Clark
Christopher Ré
Azalia Mirhoseini
312
26
0
24 Jan 2025