Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello

by Wojciech Jaśkowski, Marcin Szubert, Paweł Liskowski
Abstract:
We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. In addition to three popular single-criterion performance measures, namely i) generalization performance (expected utility), ii) average result against a hand-crafted heuristic, and iii) result in a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance measure characterizes a player’s performance in the context of opponents of various strengths. The multi-criteria analysis reveals that although the generalization performance of players produced by the two algorithms is similar, TDL is much better at playing against strong opponents, while CEL copes better against weak ones. We also find that TDL produces less diverse strategies than CEL. Our results confirm the usefulness of performance profiles as a tool for comparing learning algorithms for games.
Reference:
Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello (Wojciech Jaśkowski, Marcin Szubert, Paweł Liskowski), In EvoApplications 2014 (A. I. Esparcia-Alcazar, A. M. Mora, eds.), Springer, volume 8602, 2014.
Bibtex Entry:
@InProceedings{Jaskowski2014multicriteria,
  Title                    = {Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello},
  Author                   = {Wojciech Jaśkowski and Marcin Szubert and Pawe{ł} Liskowski},
  Booktitle                = {EvoApplications 2014},
  Year                     = {2014},
  Editor                   = {A. I. Esparcia-Alcazar and A. M. Mora},
  Pages                    = {301--312},
  Publisher                = {Springer},
  Series                   = {Lecture Notes in Computer Science},
  Volume                   = {8602},

  Abstract                 = {We compare Temporal Difference Learning (TDL) with Coevolutionary
Learning (CEL) on Othello. Apart from using three popular
single-criteria performance measures: i) generalization performance
or expected utility, ii) average results against a hand-crafted heuristic
and iii) result in a head to head match, we compare the algorithms using
performance profiles. This multi-criteria performance measure characterizes
player’s performance in the context of opponents of various strength.
The multi-criteria analysis reveals that although the generalization performance
of players produced by the two algorithms is similar, TDL is
much better at playing against the strong opponents, while CEL copes
better against the weak ones. We also find out that TDL produces less
diverse strategies than CEL. Our results confirm the usefulness of performance
profiles as a tool for comparison of learning algorithms for games.},
  Keywords                 = {reinforcement learning, coevolutionary algorithm, Reversi, Othello, board evaluation function, weighted piece counter, interactive domain},
  Url                      = {http://www.cs.put.poznan.pl/mszubert/pub/jaskowski2014evogames.pdf}
}
