Marcin Szubert

Temporal Difference Learning of N-Tuple Networks for the Game 2048

The highly addictive stochastic puzzle game 2048 has recently invaded the Internet and mobile devices, stealing countless hours of players’ lives. In this project we investigate the possibility of creating a game-playing agent capable of winning this game without incorporating human expertise or performing game tree search. For this purpose we employ three variants of temporal difference learning to acquire:

an action value function,
a state value function,
an afterstate value function,

which are used to evaluate moves at 1-ply. To represent these functions we adopt n-tuple networks, which have recently been successfully applied to Othello and Connect 4. The conducted experiments demonstrate that the learning algorithm using afterstate value functions is able to consistently produce players winning over 97% of games. These results show that n-tuple networks combined with an appropriate learning algorithm have large potential, which could be exploited in other board games.
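As an illustration of the approach described above, the following is a minimal sketch of an n-tuple network used as an afterstate value function, with a TD(0) update and 1-ply move selection. It is not the code from the project repository. The board encoding (a 4x4 list of tile exponents, with 0 for an empty cell), the tuple layout (the four rows and four columns), the learning rate, and the names `NTupleNetwork`, `td_afterstate_update`, and `best_move` are all assumptions chosen for brevity. The game engine that produces rewards and afterstates is assumed to exist elsewhere.

```python
from collections import defaultdict

# Hypothetical tuple layout: each tuple is a list of (row, col) cells.
# Rows and columns are an illustrative choice, not the layout from the paper.
ROWS = [[(r, c) for c in range(4)] for r in range(4)]
COLS = [[(r, c) for r in range(4)] for c in range(4)]
TUPLES = ROWS + COLS


class NTupleNetwork:
    """Value function stored as one lookup table (LUT) per tuple."""

    def __init__(self, tuples=TUPLES):
        self.tuples = tuples
        # Patterns that were never updated default to a weight of 0.
        self.luts = [defaultdict(float) for _ in tuples]

    def _keys(self, board):
        # The key of a tuple is the sequence of tile exponents on its cells.
        for cells in self.tuples:
            yield tuple(board[r][c] for r, c in cells)

    def value(self, board):
        # The board value is the sum of the weights selected in each LUT.
        return sum(lut[key] for lut, key in zip(self.luts, self._keys(board)))

    def update(self, board, delta):
        # Add the same correction to every weight active for this board.
        for lut, key in zip(self.luts, self._keys(board)):
            lut[key] += delta


def td_afterstate_update(net, afterstate, reward, next_afterstate, alpha=0.01):
    """TD(0) step: move V(afterstate) toward reward + V(next_afterstate).

    `reward` is the reward of the move that leads to `next_afterstate`.
    Pass next_afterstate=None when the game ended and no move followed.
    """
    next_value = 0.0 if next_afterstate is None else net.value(next_afterstate)
    error = reward + next_value - net.value(afterstate)
    net.update(afterstate, alpha * error)
    return error


def best_move(net, candidates):
    """1-ply selection over (action, reward, afterstate) triples.

    The triples are assumed to come from a game engine not shown here.
    """
    return max(candidates, key=lambda c: c[1] + net.value(c[2]))[0]
```

In a training loop under these assumptions, the agent would call `best_move` on the legal moves of the current state, play the chosen move, and then call `td_afterstate_update` on the previous and new afterstates. No game tree search is involved: each candidate move is scored only by its immediate reward plus the learned value of the resulting afterstate.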

Conference Paper

The paper on this study has been accepted for presentation at CIG 2014 (IEEE Conference on Computational Intelligence and Games). A preprint can be found here.

Source Code and Best Players

The source code of the game engine and some serialized players are available on GitHub. The best large network (2x3 & 1x4 symmetric network) is available here.

N-Tuple Player in Action

You can watch one of the best players found in action here: http://solver2048.appspot.com.

Acknowledgment

This work was supported by the Polish National Science Centre grant no. DEC-2012/05/N/ST6/03152.

Institute of Computing Science, Poznan University of Technology
