Krzysztof Krawiec


Home

Research:

edit SideBar

SZ-Tetris, a restricted version ofTetris, is a difficult reinforcement learning task. Previous researchshowed that, similarly to the original Tetris, value function-basedmethods such as temporal difference learning, do not work well forSZ-Tetris. The best performance in this game was achieved by employingdirect policy search techniques, in particular the cross-entropymethod in combination with handcrafted features. Nonetheless, a simpleheuristic hand-coded player scores even higher. Here we show that itis possible to equal its performance with CMA-ES (Covariance MatrixAdaptation Evolution Strategy). We demonstrate that furtherimprovement is possible by employing systematic n-tuple network, aknowledge-free function approximator, and VD-CMA-ES, a linear variantof CMA-ES for high dimension optimization. Last but not least, we showthat a large systematic n-tuple network (involving more than 4 millionparameters) allows the classical temporal difference learningalgorithm to obtain similar average performance to VD-CMA-ES, but at20 times lower computational expense, leading to the best policy forSZ-Tetris known to date. These results enrich the currentunderstanding of difficulty of SZ-Tetris, and shed new light on thecapabilities of particular search paradigms when applied torepresentations of various characteristics and dimensionality.

@INPROCEEDINGS { Jaskowski2015sztetris,
    ABSTRACT = { SZ-Tetris, a restricted version ofTetris, is a difficult reinforcement learning task. Previous researchshowed that, similarly to the original Tetris, value function-basedmethods such as temporal difference learning, do not work well forSZ-Tetris. The best performance in this game was achieved by employingdirect policy search techniques, in particular the cross-entropymethod in combination with handcrafted features. Nonetheless, a simpleheuristic hand-coded player scores even higher. Here we show that itis possible to equal its performance with CMA-ES (Covariance MatrixAdaptation Evolution Strategy). We demonstrate that furtherimprovement is possible by employing systematic n-tuple network, aknowledge-free function approximator, and VD-CMA-ES, a linear variantof CMA-ES for high dimension optimization. Last but not least, we showthat a large systematic n-tuple network (involving more than 4 millionparameters) allows the classical temporal difference learningalgorithm to obtain similar average performance to VD-CMA-ES, but at20 times lower computational expense, leading to the best policy forSZ-Tetris known to date. These results enrich the currentunderstanding of difficulty of SZ-Tetris, and shed new light on thecapabilities of particular search paradigms when applied torepresentations of various characteristics and dimensionality. },
    ACMID = { 2754783 },
    ADDRESS = { New York, NY, USA },
    AUTHOR = { Wojciech Ja\'skowski and Marcin Szubert and Pawe{\l} Liskowski and Krzysztof Krawiec },
    BOOKTITLE = { Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation },
    DOI = { 10.1145/2739480.2754783 },
    ISBN = { 978-1-4503-3472-3 },
    KEYWORDS = { Reinforcement learning, covariance matrix adaptation, CMA- ES, VD-CMA, function approximation, knowledge-free rep- resentations, video games, n-tuple system },
    LOCATION = { Madrid, Spain },
    NUMPAGES = { 7 },
    PAGES = { 567--573 },
    PUBLISHER = { ACM },
    SERIES = { GECCO '15 },
    TITLE = { High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: a Case Study in {SZ-Tetris} },
    URL = { http://doi.acm.org/10.1145/2739480.2754783 },
    YEAR = { 2015 },
    1 = { http://doi.acm.org/10.1145/2739480.2754783 },
    2 = { https://doi.org/10.1145/2739480.2754783 },
}


Powered by PmWiki