# Competent Algorithms for Geometric Semantic Genetic Programming

Tomasz P. Pawlak

### Abstract

Genetic Programming (GP) is a machine learning technique for automatic induction of computer programs from examples. The examples typically consist of two parts: program arguments – the input and a target program output. Both input and output may be expressed in terms of numeric or textual variables or even a conglomerate of the above. This problem formulation enables formalizing

semantics of a program as a tuple of outputs it returns in effect of execution on the sample inputs. Use of semantics in GP gains an interest in research community, since the semantic methods developed so far in GP proved capable of achieving lower error, better generalization and smaller programs and quicker convergence to an optimum than the contemporary methods.

We embrace existing notions of semantics of program, semantic distance, semantic neutrality and effectiveness of genetic operators under the umbrella of common conceptual framework for Semantic Genetic Programming (SGP).

Then, we show that if a ﬁtness function is a metric, the ﬁtness landscape spanned over a space of all programs proxied by semantics becomes a cone with the optimal semantics in the apex. This provides justification for use of the recently developed Geometric Semantic Genetic Programming (GSGP), where geometric genetic operators utilize the conic shape of the landscape. We derive

properties of progress and progress bounds of geometric operators for different combinations of ﬁtness functions and semantic distances.

We present a comprehensive literature review of existing semantic methods, discuss their advantages and disadvantages and for each show how the deﬁned properties apply to it.

Next, we propose an algorithm for backpropagating semantics trough program structure and competent algorithms for operators: population initialization, parent selection, mutation and crossover that are approximately geometric, effective and free of certain drawbacks of the existing geometric methods.

Then, we experimentally assess the proposed algorithms and compare them with the existing methods in terms of training set ﬁtness, generalization on test set, probability of performing geometric and effective application, size of produced programs and computational costs. We use a suite of nine symbolic regression and nine Boolean program synthesis benchmarks. The analysis shows that the proposed algorithms achieve performance that is consistently better than that offered by other semantic GP methods for symbolic regression domain and not worse than the best other methods for Boolean domain.

Finally, we experimentally ﬁnd the proportions of competent mutation and competent crossover that lead to the optimal results in the above range of benchmarks.

### BibTeX

@phdthesis{PawlakPhD2015,

title = {Competent Algorithms for Geometric Semantic Genetic Programming},

author = {Tomasz P. Pawlak},

school = {Poznan University of Technology},

address = {Pozna'{n}, Poland},

year = 2015

}

### Suplementary material

The Java source code of Competent Algorithms for Geometric Semantic Genetic Programming and other parts of the experimental suite presented in the thesis, together with the compiled Jar are available under conditions of Academic Free License. Java 1.7+ is required to compile the Jar from source. Instructions how to create configuration files and run the experiment are available in ECJ manual. The classes under interest of this study can be found in the following namespaces:

- ec.app.semanticGP.* (problem definitions)
- ec.gp.semantic.* (operators and basic semantic stuff)
- library.* (semantic library helper classes)