Krzysztof Krawiec

SZ-Tetris, a restricted version ofTetris, is a difficult reinforcement learning task. Previous researchshowed that, similarly to the original Tetris, value function-basedmethods such as temporal difference learning, do not work well forSZ-Tetris. The best performance in this game was achieved by employingdirect policy search techniques, in particular the cross-entropymethod in combination with handcrafted features. Nonetheless, a simpleheuristic hand-coded player scores even higher. Here we show that itis possible to equal its performance with CMA-ES (Covariance MatrixAdaptation Evolution Strategy). We demonstrate that furtherimprovement is possible by employing systematic n-tuple network, aknowledge-free function approximator, and VD-CMA-ES, a linear variantof CMA-ES for high dimension optimization. Last but not least, we showthat a large systematic n-tuple network (involving more than 4 millionparameters) allows the classical temporal difference learningalgorithm to obtain similar average performance to VD-CMA-ES, but at20 times lower computational expense, leading to the best policy forSZ-Tetris known to date. These results enrich the currentunderstanding of difficulty of SZ-Tetris, and shed new light on thecapabilities of particular search paradigms when applied torepresentations of various characteristics and dimensionality.

@INPROCEEDINGS { Jaskowski2015sztetris, ABSTRACT = { SZ-Tetris, a restricted version ofTetris, is a difficult reinforcement learning task. Previous researchshowed that, similarly to the original Tetris, value function-basedmethods such as temporal difference learning, do not work well forSZ-Tetris. The best performance in this game was achieved by employingdirect policy search techniques, in particular the cross-entropymethod in combination with handcrafted features. Nonetheless, a simpleheuristic hand-coded player scores even higher. Here we show that itis possible to equal its performance with CMA-ES (Covariance MatrixAdaptation Evolution Strategy). We demonstrate that furtherimprovement is possible by employing systematic n-tuple network, aknowledge-free function approximator, and VD-CMA-ES, a linear variantof CMA-ES for high dimension optimization. Last but not least, we showthat a large systematic n-tuple network (involving more than 4 millionparameters) allows the classical temporal difference learningalgorithm to obtain similar average performance to VD-CMA-ES, but at20 times lower computational expense, leading to the best policy forSZ-Tetris known to date. These results enrich the currentunderstanding of difficulty of SZ-Tetris, and shed new light on thecapabilities of particular search paradigms when applied torepresentations of various characteristics and dimensionality. }, ACMID = { 2754783 }, ADDRESS = { New York, NY, USA }, AUTHOR = { Wojciech Ja\'skowski and Marcin Szubert and Pawe{\l} Liskowski and Krzysztof Krawiec }, BOOKTITLE = { Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation }, DOI = { 10.1145/2739480.2754783 }, ISBN = { 978-1-4503-3472-3 }, KEYWORDS = { Reinforcement learning, covariance matrix adaptation, CMA- ES, VD-CMA, function approximation, knowledge-free rep- resentations, video games, n-tuple system }, LOCATION = { Madrid, Spain }, NUMPAGES = { 7 }, PAGES = { 567--573 }, PUBLISHER = { ACM }, SERIES = { GECCO '15 }, TITLE = { High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: a Case Study in {SZ-Tetris} }, URL = { http://doi.acm.org/10.1145/2739480.2754783 }, YEAR = { 2015 }, 1 = { http://doi.acm.org/10.1145/2739480.2754783 }, 2 = { https://doi.org/10.1145/2739480.2754783 }, }