1 Dataset visualizations

Two-dimensional dataset visualizations. Left column: clusters, distinguished by shape and color. Right column: types of minority-class examples, color-coded as follows: safe (green), borderline (orange), rare (red), outlier (black).
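Minority-class example types are commonly assigned from the class mix of each example's k = 5 nearest neighbors (the taxonomy associated with Napierala): 4 or 5 minority-class neighbors make an example safe, 2 or 3 borderline, 1 rare, and 0 an outlier. The sketch below implements only that simplified rule, using Euclidean distance as an assumption. How the labels in these plots were actually obtained is not stated here, and `example_types` is a hypothetical helper.

```python
import math


def example_types(points, labels, minority):
    """Label each minority example by its k = 5 nearest neighbors (simplified rule)."""
    k = 5  # the thresholds below assume exactly five neighbors
    types = {}
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            continue
        # Euclidean distance (an assumption); the example itself is excluded.
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        same = sum(labels[j] == minority for j in nearest)
        if same >= 4:
            types[i] = "safe"
        elif same >= 2:
            types[i] = "borderline"
        elif same == 1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types
```

For a tight group of six minority points plus one minority point surrounded only by five majority points, the six grouped points come out safe and the isolated one an outlier.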

[Figure: one row per dataset, with panels "Clusters" (left) and "Example types" (right). Datasets: clover, cloverb, cloveru, cloverr, clovers, cloversu; dis, disb, disu, disr, diss, dissu; hyp, hypb, hypu, hypr, hyps, hypsu; joined, joinedb, joinedu, joinedr, joineds, joinedsu; normal, normalb, normalu, normalr, normals, normalsu; rothyp, rothypb, rothypu, rothypr, rothyps, rothypsu.]

2 Result summary

3 Comparisons and statistical tests
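AMI (adjusted mutual information) scores the agreement between two clusterings, corrected for chance: it is 1 for identical partitions (up to relabeling) and has an expected value of 0 for random labelings. The document does not state which AMI variant it uses; the pure-Python sketch below assumes the arithmetic-mean normalization (the default `average_method` of scikit-learn's `adjusted_mutual_info_score`), and the function name is hypothetical.

```python
import math
from collections import Counter


def adjusted_mutual_info(labels_a, labels_b):
    """AMI of two flat clusterings, assuming arithmetic-mean normalization."""
    n = len(labels_a)
    a = Counter(labels_a)                     # cluster sizes in clustering A
    b = Counter(labels_b)                     # cluster sizes in clustering B
    table = Counter(zip(labels_a, labels_b))  # non-zero contingency cells
    if len(a) == len(b) == 1:
        return 1.0  # both put every point in one cluster: perfect agreement

    def entropy(sizes):
        return -sum(s / n * math.log(s / n) for s in sizes.values())

    def log_fact(x):
        return math.lgamma(x + 1)

    mi = sum(c / n * math.log(n * c / (a[i] * b[j])) for (i, j), c in table.items())

    # Expected MI when labels are permuted at random (hypergeometric model).
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for k in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (log_fact(ai) + log_fact(bj) + log_fact(n - ai) + log_fact(n - bj)
                         - log_fact(n) - log_fact(k) - log_fact(ai - k)
                         - log_fact(bj - k) - log_fact(n - ai - bj + k))
                emi += k / n * math.log(n * k / (ai * bj)) * math.exp(log_p)

    # Note: 0/0 (undefined) if both labelings put every point in its own cluster.
    mean_h = (entropy(a) + entropy(b)) / 2
    return (mi - emi) / (mean_h - emi)
```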

3.1 AMI - best models


Friedman rank sum test
Friedman chi-squared = 34.039, df = 2, p-value = 4.061e-08

Average ranks:
ImScan  ImKmeans  ImGrid
1.71    1.87      2.43
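These blocks appear to be R `friedman.test` output. The statistic and the average ranks can be sketched as follows: rank the methods within each block (row), with 1 = best and ties averaged; average each method's ranks; then compute chi2 = 12n/(k(k+1)) * sum(Rbar_j^2) - 3n(k+1) for n blocks and k methods, with k - 1 degrees of freedom. This minimal sketch omits the tie correction that R's `friedman.test` applies, so results can differ slightly when ties occur. `friedman` and `average_ranks_row` are hypothetical names.

```python
def average_ranks_row(row, higher_is_better=True):
    """Rank one block's scores (1 = best); tied scores share the mean of their ranks."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=higher_is_better)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(scores, higher_is_better=True):
    """scores: list of blocks, each a list of k method scores. Returns (chi2, mean ranks, df)."""
    n, k = len(scores), len(scores[0])
    ranked = [average_ranks_row(row, higher_is_better) for row in scores]
    mean_ranks = [sum(r[j] for r in ranked) / n for j in range(k)]
    # Friedman chi-squared without tie correction.
    chi2 = 12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks) - 3 * n * (k + 1)
    return chi2, mean_ranks, k - 1
```

`higher_is_better=True` suits AMI and G-mean; for running time, pass `False` so that the fastest method ranks first.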

3.2 AMI - mean models


Friedman rank sum test
Friedman chi-squared = 2.8129, df = 2, p-value = 0.245

Average ranks:
ImGrid  ImScan  ImKmeans
1.85    2.06    2.09
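The p-values follow from the chi-squared distribution with k - 1 degrees of freedom. For integer df its upper tail has a closed form, so the reported values can be checked without a statistics library. With df = 2 it reduces to exp(-x/2): exp(-2.8129/2) ≈ 0.245 here (not significant at the usual 0.05 level), and exp(-34.039/2) ≈ 4.06e-08 for the best-model AMI comparison. A sketch, with `chi2_sf` as a hypothetical name:

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: the df = 1 tail plus terms sqrt(2/pi) e^{-x/2} x^{(2r-1)/2} / (1*3*...*(2r-1)).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # r = 1 term
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total
```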

3.3 G-mean - best models


Friedman rank sum test
Friedman chi-squared = 95.191, df = 3, p-value < 2.2e-16

Average ranks:
Napierala  ImScan  ImGrid  ImKmeans
1.75       2.17    2.45    3.63
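G-mean is usually defined as the geometric mean of the per-class recalls (for two classes, sqrt(sensitivity * specificity)), so it drops to 0 whenever any class is missed entirely. The document does not spell out which labels it is computed over; the sketch below assumes that standard definition, and `g_mean` is a hypothetical name.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed standard definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(y_pred[i] == c for i in idx)  # correctly predicted members of class c
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1 / len(classes))
```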

3.4 G-mean - mean models


Friedman rank sum test
Friedman chi-squared = 139.69, df = 3, p-value < 2.2e-16

Average ranks:
Napierala  ImGrid  ImScan  ImKmeans
1.56       1.85    2.88    3.71

3.5 Time (mean models)


Friedman rank sum test
Friedman chi-squared = 227.35, df = 3, p-value < 2.2e-16

Average ranks:
Napierala  ImGrid  ImScan  ImKmeans
1.00       2.08    2.92    4.00

4 Balance between AMI and G-mean

4.1 beta*AMI + (1-beta)*G-mean for best models

4.2 beta*AMI + (1-beta)*G-mean for mean models
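Sections 4.1 and 4.2 weight the two metrics as beta*AMI + (1-beta)*G-mean, so beta = 1 scores methods on AMI alone and beta = 0 on G-mean alone. A minimal sketch of sweeping beta and picking the top-scoring method at each value; the function names and the (AMI, G-mean) pairs in the usage line are hypothetical, not results from this document.

```python
def combined_score(ami, gmean, beta):
    """beta*AMI + (1-beta)*G-mean; beta = 1 is AMI only, beta = 0 is G-mean only."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * ami + (1 - beta) * gmean


def best_method_per_beta(results, betas):
    """results: hypothetical mapping of method name -> (AMI, G-mean)."""
    return {
        beta: max(results, key=lambda m: combined_score(*results[m], beta))
        for beta in betas
    }


# Illustrative values only, not results from this document.
print(best_method_per_beta({"A": (0.9, 0.5), "B": (0.4, 0.8)}, [0.0, 0.5, 1.0]))
# {0.0: 'B', 0.5: 'A', 1.0: 'A'}
```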