Mastering the game of Go with deep neural networks and tree search

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis

Nature volume 529, pages 484–489 (28 January 2016)
doi:10.1038/nature16961

Subject terms: Computational science, Computer science, Reward

Received: 11 November 2015
Accepted: 05 January 2016
Published: 27 January 2016

Abstract

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
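The core mechanism the abstract alludes to is how the search combines the two networks: at each node of the tree, the paper's search selects the action maximising Q(s, a) + u(s, a), where Q is the mean value accumulated from evaluations below that node and the exploration bonus u(s, a) is proportional to the policy network's prior P(s, a) and decays with the visit count N(s, a). The following is a minimal Python sketch of that selection rule only; the names (Node, c_puct, select_move) and the default exploration constant are illustrative assumptions, not from any released AlphaGo code.

    import math

    class Node:
        """One search-tree node: policy priors plus per-action visit and value statistics."""
        def __init__(self, priors):
            self.P = priors                    # prior move probabilities from the policy network
            self.N = {a: 0 for a in priors}    # visit counts per action
            self.W = {a: 0.0 for a in priors}  # total value per action (value net and/or rollouts)

        def Q(self, a):
            # Mean action value; unvisited moves default to 0.
            return self.W[a] / self.N[a] if self.N[a] > 0 else 0.0

    def select_move(node, c_puct=5.0):
        """Pick the action maximising Q(s,a) + u(s,a), with
        u(s,a) proportional to P(s,a) / (1 + N(s,a))."""
        sqrt_total = math.sqrt(sum(node.N.values()) + 1)

        def score(a):
            u = c_puct * node.P[a] * sqrt_total / (1 + node.N[a])
            return node.Q(a) + u

        return max(node.P, key=score)

    # Example: a toy position with three legal moves and hypothetical priors.
    root = Node({"d4": 0.5, "q16": 0.3, "c3": 0.2})
    print(select_move(root))  # with no visits yet, the highest-prior move wins

Early in the search this rule simply follows the policy network's priors; as visits accumulate, the bonus shrinks and move choice is increasingly dominated by the learned value estimates, which is what lets the value and policy networks steer Monte Carlo tree search away from exhaustive lookahead.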



Acknowledgements

We thank Fan Hui for agreeing to play against AlphaGo; T. Manning for refereeing the match; R. Munos and T. Schaul for helpful discussions and advice; A. Cain and M. Cant for work on the visuals; P. Dayan, G. Wayne, D. Kumaran, D. Purves, H. van Hasselt, A. Barreto and G. Ostrovski for reviewing the paper; and the rest of the DeepMind team for their support, ideas and encouragement.

Author information

Author notes

David Silver & Aja Huang: these authors contributed equally to this work.

Affiliations

  1. Google DeepMind, 5 New Street Square, London EC4A 3TW, UK

     David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, Nal Kalchbrenner, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis

  2. Google, 1600 Amphitheatre Parkway, Mountain View, California 94043, USA

     John Nham & Ilya Sutskever

Contributions

A.H., G.v.d.D., J.S., I.A., M.La., A.G., T.G. and D.S. designed and implemented the search in AlphaGo. C.J.M., A.G., L.S., A.H., I.A., V.P., S.D., D.G., N.K., I.S., K.K. and D.S. designed and trained the neural networks in AlphaGo. J.S., J.N., A.H. and D.S. designed and implemented the evaluation framework for AlphaGo. D.S., M.Le., T.L., T.G., K.K. and D.H. managed and advised on the project. D.S., T.G., A.G. and D.H. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to David Silver or Demis Hassabis.

Extended data

Extended data tables

  1. Details of match between AlphaGo and Fan Hui
  2. Input features for neural networks
  3. Supervised learning results for the policy network
  4. Input features for rollout and tree policy
  5. Parameters used by AlphaGo
  6. Results of a tournament between different Go programs
  7. Results of a tournament between different variants of AlphaGo
  8. Results of a tournament between AlphaGo and distributed AlphaGo, testing scalability with hardware
  9. Cross-table of win rates in per cent between programs
  10. Cross-table of win rates in per cent between programs in the single-machine scalability study
  11. Cross-table of win rates in per cent between programs in the distributed scalability study

Supplementary information

Zip files

  1. Supplementary Information

     This zipped file contains game records for the 5 formal match games played between AlphaGo and Fan Hui.


Reposted from: https://my.oschina.net/ZhenyuanLiu/blog/1818286
