Krzysztof Krawiec



We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interleaves two search processes operating in intra-game and inter-game mode. The intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archive-based approach is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter strategy and a simple liberty-based heuristic. This encouraging result can potentially be generalized not only to other strategy representations used for small-board Go, but also to different games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
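
To make the two interleaved learning modes concrete, here is a minimal Python sketch of both components. It is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm: the tanh-squashed WPC evaluation, the parameter values, and the Gaussian-mutation coevolution scheme are assumptions, play_game is a stub for playing a game of Go between two WPC strategies, and the archive mechanism analyzed in the paper is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    BOARD_CELLS = 5 * 5  # small-board Go, e.g. 5x5; a board is a vector of +1/0/-1 per cell

    def wpc_value(w, board):
        """Tanh-squashed weighted piece counter (WPC) evaluation of a board state."""
        return np.tanh(w @ board)

    def td_lambda_update(w, traces, board, next_board, alpha=0.01, lam=0.9):
        """Intra-game learning: one gradient-descent TD(lambda) step driven by the
        difference between the values of consecutively visited states."""
        v, v_next = wpc_value(w, board), wpc_value(w, next_board)
        delta = v_next - v                  # temporal difference
        grad = (1.0 - v * v) * board        # gradient of tanh(w . b) w.r.t. w
        traces = lam * traces + grad        # lam is the trace decay constant
        w = w + alpha * delta * traces
        return w, traces

    def coevolve_generation(population, play_game, sigma=0.1):
        """Inter-game learning: score strategies by the outcomes of games played
        between them, keep the better half, and generate new strategies from them."""
        n = len(population)
        scores = np.zeros(n)
        for i in range(n):                  # round-robin tournament
            for j in range(n):
                if i != j:
                    scores[i] += play_game(population[i], population[j])
        order = np.argsort(-scores)
        parents = [population[k] for k in order[:n // 2]]
        offspring = [p + sigma * rng.standard_normal(p.shape) for p in parents]
        return parents + offspring

In CTDL these two loops alternate: strategies improved by TD(lambda) self-play re-enter the coevolutionary sample, and the relative intensity of the two modes is one of the parameters the paper studies.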

@ARTICLE { 2011AMCSKrawiecJaskowskiSzubert,
    ABSTRACT = { We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interleaves two search processes operating in intra-game and inter-game mode. The intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL{\textquoteright}s sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archive-based approach is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter strategy and a simple liberty-based heuristic. This encouraging result can potentially be generalized not only to other strategy representations used for small-board Go, but also to different games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge. },
    AUTHOR = { Krzysztof Krawiec and Wojciech Ja{\'s}kowski and Marcin Szubert },
    DOI = { 10.2478/v10006-011-0057-3 },
    JOURNAL = { International Journal of Applied Mathematics and Computer Science },
    NUMBER = { 4 },
    OPTKEYWORDS = { go },
    PAGES = { 717--731 },
    TITLE = { Evolving Small-Board Go Players using Coevolutionary Temporal Difference Learning with Archive },
    URL = { http://www.cs.put.poznan.pl/kkrawiec/pubs/2011AMCSpreprint.pdf },
    VOLUME = { 21 },
    YEAR = { 2011 },
}

