arXiv:1712.01815
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 839 papers shown
Title
A Review of Nine Physics Engines for Reinforcement Learning Research
Michael Kaup
Cornelius Wolff
Hyerim Hwang
Julius Mayer
Elia Bruni
AI4CE
262
15
0
11 Jul 2024
A Review of the Applications of Deep Learning-Based Emergent Communication
Brendon Boldt
David R. Mortensen
VLM
221
17
0
03 Jul 2024
Towards Faster Matrix Diagonalization with Graph Isomorphism Networks and the AlphaZero Framework
Geigh Zollicoffer
Kshitij Bhatta
Manish Bhattarai
Phil Romero
C. Negre
Anders M. N. Niklasson
A. Adedoyin
108
0
0
30 Jun 2024
PUZZLES: A Benchmark for Neural Algorithmic Reasoning
Benjamin Estermann
Luca A. Lanzendörfer
Yannick Niedermayr
Roger Wattenhofer
319
11
0
29 Jun 2024
What Teaches Robots to Walk, Teaches Them to Trade too -- Regime Adaptive Execution using Informed Data and LLMs
Raeid Saqur
185
3
0
20 Jun 2024
Exploring and Benchmarking the Planning Capabilities of Large Language Models
Bernd Bohnet
Azade Nova
Aaron T Parisi
Kevin Swersky
Katayoon Goshvadi
Hanjun Dai
Dale Schuurmans
Noah Fiedel
Hanie Sedghi
179
15
0
18 Jun 2024
A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning
IFIP Working Conference on Database Semantics (IWDS), 2024
Flora Angileri
Giulia Lombardi
Andrea Fois
Renato Faraone
C. Metta
...
M. Fantozzi
S. Galfrè
Daniele Pavesi
Maurizio Parton
F. Morandin
137
5
0
18 Jun 2024
Transcendence: Generative Models Can Outperform The Experts That Train Them
Edwin Zhang
Vincent Zhu
Naomi Saphra
Anat Kleiman
Benjamin L. Edelman
Milind Tambe
Sham Kakade
Eran Malach
433
22
0
17 Jun 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
465
9
0
15 Jun 2024
ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
Xu Zhang
Xunjian Yin
Xiaojun Wan
165
3
0
13 Jun 2024
Planning Like Human: A Dual-process Framework for Dialogue Planning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tao He
Lizi Liao
Yixin Cao
Yuanxing Liu
Ming Liu
Zerui Chen
Bing Qin
293
41
0
08 Jun 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli
Mat Allen
Roman Hauksson
Grace Sodunke
Suhas Hariharan
Carlson Cheng
Wenjie Li
Joshua Clymer
Arjun Yadav
ELM
ReLM
LLMAG
LRM
268
44
0
07 Jun 2024
Open-Endedness is Essential for Artificial Superhuman Intelligence
International Conference on Machine Learning (ICML), 2024
Edward Hughes
Michael Dennis
Jack Parker-Holder
Feryal M. P. Behbahani
Aditi Mavalankar
Yuge Shi
Tom Schaul
Tim Rocktäschel
LRM
334
54
0
06 Jun 2024
Discovering the curriculum with AI: A proof-of-concept demonstration with an intelligent tutoring system for teaching project selection
Lovis Heindrich
Falk Lieder
195
1
0
06 Jun 2024
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang
Dizhan Xue
Shengjie Zhang
Shengsheng Qian
AAML
LLMAG
198
72
0
05 Jun 2024
Multi-Agent Transfer Learning via Temporal Contrastive Learning
Weihao Zeng
Joseph Campbell
Simon Stepputtis
Katia Sycara
OffRL
221
2
0
03 Jun 2024
Exploring the limits of Hierarchical World Models in Reinforcement Learning
Robin Schiewer
Anand Subramoney
Laurenz Wiskott
227
7
0
01 Jun 2024
VQA Training Sets are Self-play Environments for Generating Few-shot Pools
Tautvydas Misiunas
Hassan Mansoor
Jasper Uijlings
Oriana Riva
Victor Carbune
LRM
VLM
154
1
0
30 May 2024
Training-efficient density quantum machine learning
Brian Coyle
El Amine Cherrat
Nishant Jain
Natansh Mathur
Snehal Raj
Skander Kazdaghli
Iordanis Kerenidis
317
10
0
30 May 2024
No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
230
1
0
28 May 2024
Competing for pixels: a self-play algorithm for weakly-supervised segmentation
Shaheer U. Saeed
Shiqi Huang
João Ramalhinho
Iani J. M. B. Gayo
Nina Montaña-Brown
...
Stephen P. Pereira
Brian R. Davidson
D. Barratt
Matthew J. Clarkson
Yipeng Hu
277
0
0
26 May 2024
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
Shuai Zhang
Heshan Devaka Fernando
Miao Liu
K. Murugesan
Songtao Lu
Pin-Yu Chen
Tianyi Chen
Meng Wang
220
3
0
24 May 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
211
58
0
23 May 2024
Mixture of Public and Private Distributions in Imperfect Information Games
Jérôme Arjonilla
Abdallah Saffidine
Tristan Cazenave
257
1
0
23 May 2024
Deep Reinforcement Learning for 5*5 Multiplayer Go
Brahim Driss
Jérôme Arjonilla
Hui Wang
Abdallah Saffidine
Tristan Cazenave
848
0
0
23 May 2024
Pure Planning to Pure Policies and In Between with a Recursive Tree Planner
A. N. Redlich
126
0
0
21 May 2024
A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy
Zhaoxing Li
176
2
0
16 May 2024
LLMs can learn self-restraint through iterative self-reflection
Alexandre Piché
Aristides Milios
Dzmitry Bahdanau
Chris Pal
326
6
0
15 May 2024
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks
Haifa Alrdahi
Riza Batista-Navarro
177
1
0
10 May 2024
LLMs with Personalities in Multi-issue Negotiation Games
Sean Noh
Ho-Chun Herbert Chang
LLMAG
349
20
0
08 May 2024
Super-Exponential Regret for UCT, AlphaGo and Variants
Laurent Orseau
Rémi Munos
ELM
141
2
0
07 May 2024
Philosophy of Cognitive Science in the Age of Deep Learning
Raphaël Millière
AI4CE
NAI
203
7
0
07 May 2024
HUGO -- Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach
Malte Lehna
Clara Holzhuter
Sven Tomforde
Christoph Scholz
198
7
0
01 May 2024
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie
Anirudh Goyal
Wenyue Zheng
Min-Yen Kan
Timothy Lillicrap
Kenji Kawaguchi
Michael Shieh
ReLM
LRM
390
194
0
01 May 2024
Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents
Radovan Haluška
Martin Schmid
LLMAG
196
0
0
25 Apr 2024
Playing Board Games with the Predict Results of Beam Search Algorithm
Sergey Pastukhov
90
0
0
23 Apr 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
288
45
0
22 Apr 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Ye Tian
Baolin Peng
Linfeng Song
Lifeng Jin
Dian Yu
Haitao Mi
Dong Yu
LRM
ReLM
235
124
0
18 Apr 2024
Monte Carlo Search Algorithms Discovering Monte Carlo Tree Search Exploration Terms
Tristan Cazenave
230
0
0
14 Apr 2024
Mitigating Cascading Effects in Large Adversarial Graph Environments
James Cunningham
Conrad S. Tucker
AI4CE
AAML
134
0
0
12 Apr 2024
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity
Neural Information Processing Systems (NeurIPS), 2024
Vahid Balazadeh Meresht
Keertana Chidambaram
Viet Nguyen
Fahad Razak
Vasilis Syrgkanis
402
2
0
10 Apr 2024
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Computer Vision and Pattern Recognition (CVPR), 2024
Rob Geada
David Towers
M. Forshaw
Amir Atapour-Abarghouei
A. Mcgough
258
5
0
02 Apr 2024
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like
Daniel Vallstrom
304
0
0
01 Apr 2024
STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Chinmaya Andukuri
Jan-Philipp Fränken
Tobias Gerstenberg
Noah D. Goodman
SyDa
LRM
305
68
0
28 Mar 2024
A survey on Concept-based Approaches For Model Improvement
Avani Gupta
P. J. Narayanan
LRM
288
5
0
21 Mar 2024
JaxUED: A simple and useable UED library in Jax
Samuel Coward
Michael Beukman
Jakob Foerster
181
8
0
19 Mar 2024
Language Evolution with Deep Learning
Mathieu Rita
Paul Michel
Rahma Chaabouni
Olivier Pietquin
Emmanuel Dupoux
Florian Strub
185
3
0
18 Mar 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAG
ReLM
LRM
643
202
0
14 Mar 2024
Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLM
LRM
252
141
0
07 Mar 2024
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
Shengjie Wang
Shaohuai Liu
Weirui Ye
Jiacheng You
Yang Gao
OffRL
222
27
0
01 Mar 2024