{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wczytujemy przykładowe zbiory danych"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "iris = datasets.load_iris()\n",
    "digits = datasets.load_digits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(iris.data.shape)\n",
    "print(iris.target)\n",
    "print(iris.target_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Podziel dane na zbiór uczący i testujący:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.random.seed(0) # aby wyniki były powtarzalne\n",
    "indices = np.random.permutation(len(iris.data))\n",
    "print(indices)\n",
    "train_X = iris.data[indices[:-10]]\n",
    "train_Y = iris.target[indices[:-10]]\n",
    "test_X = iris.data[indices[-10:]]\n",
    "test_Y = iris.target[indices[-10:]]\n",
    "print(len(train_X), len(test_X))"
   ]
  },
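  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The manual permutation above can also be expressed with scikit-learn's `train_test_split`. This is only a minimal sketch, not necessarily the split this notebook intends: `test_size=10` and `random_state=0` are assumptions chosen to mirror the 140/10 split, and the rows selected will differ from those picked by `np.random.permutation`.\n",
    "\n",
    "```python\n",
    "from sklearn import datasets\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "iris = datasets.load_iris()\n",
    "\n",
    "# Hold out 10 samples for testing; random_state fixes the shuffle so the\n",
    "# split is reproducible (both values are illustrative assumptions).\n",
    "train_X, test_X, train_Y, test_Y = train_test_split(\n",
    "    iris.data, iris.target, test_size=10, random_state=0)\n",
    "\n",
    "print(len(train_X), len(test_X))\n",
    "```\n",
    "\n",
    "An integer `test_size` is read as an absolute number of samples, while a float between 0 and 1 is read as a fraction of the data."
   ]
  },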
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uczymy klasyfikator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# tworzymy klasyfikator\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "knn = KNeighborsClassifier()\n",
    "# uczymy na naszych danych\n",
    "knn.fit(train_X, train_Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Używamy nauczony klasyfikator do predykcji na nowych danych"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print('klasyfikacja:', knn.predict(test_X))\n",
    "print('powinno byc :', test_Y)"
   ]
  },
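  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how confident each prediction is, `predict_proba` can be inspected. This is a minimal sketch assuming the default `KNeighborsClassifier` settings (5 neighbours, uniform weights), in which case each probability is the fraction of the 5 nearest training points that belong to the class. The split is rebuilt here so that the snippet runs on its own.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "iris = datasets.load_iris()\n",
    "np.random.seed(0)\n",
    "indices = np.random.permutation(len(iris.data))\n",
    "train_X, train_Y = iris.data[indices[:-10]], iris.target[indices[:-10]]\n",
    "test_X = iris.data[indices[-10:]]\n",
    "\n",
    "knn = KNeighborsClassifier()  # defaults: n_neighbors=5, weights='uniform'\n",
    "knn.fit(train_X, train_Y)\n",
    "\n",
    "# One row per test sample, one column per class (order given by knn.classes_).\n",
    "proba = knn.predict_proba(test_X)\n",
    "print(proba)\n",
    "```\n",
    "\n",
    "A row such as `[0, 0.8, 0.2]` would mean that four of the five nearest neighbours belong to the second class and one to the third."
   ]
  },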
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dane z cyframi:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# każemy rysować obrazki bezpośrednio w notebooku\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "plt.imshow(digits.images[0], cmap='gray', interpolation='nearest')"
   ]
  },
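  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each entry of `digits.images` is an 8x8 array of pixel intensities, while `digits.data` holds the same pixels flattened into a row of 64 features, which is the form the classifiers below are trained on. A short sketch to check this relationship (the name `flat` is introduced here only for illustration):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "\n",
    "digits = datasets.load_digits()\n",
    "\n",
    "print(digits.images.shape)  # (n_samples, 8, 8)\n",
    "print(digits.data.shape)    # (n_samples, 64)\n",
    "\n",
    "# Flattening every image row by row reproduces digits.data.\n",
    "flat = digits.images.reshape(len(digits.images), -1)\n",
    "print(np.array_equal(flat, digits.data))\n",
    "```"
   ]
  },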
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tworzymy klasyfikator SVM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from sklearn import svm\n",
    "s = svm.SVC(gamma=0.001, C=100.)\n",
    "s.fit(digits.data[:-20], digits.target[:-20])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print('klasyfikacja:', s.predict(digits.data[-20:]))\n",
    "print('powinno byc :', digits.target[-20:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Zadanie 1\n",
    "#### Jak automatycznie policzyć liczbę pomyłek klasyfikatora? Napisz kod."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def liczba_pomylek(jest, powinno_byc):\n",
    "    return # TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Zadanie 2\n",
    "#### Sprawdź przy jakim rozmiarze zbioru uczącego (ilu początkowych rekordów z danych \"digits\") zaczną pojawiać się błędy na pozostałych danych ze zbioru. Napisz kod który automatycznie to sprawdzi."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Zadanie 3\n",
    "#### Spróbuj nauczyć klasyfikator kNN na części z dostępnych atrybutów zbioru iris. Podziel losowo (jak wyżej) dane iris na zbiór uczący i testujący o tym samym rozmiarze. Następnie sprawdź dla której pary atrybutów uzyskasz najlepszy wynik klasyfikacji zbioru testowego. Sprawdź czy wniosek jest taki sam dla różnych podziałów na zbiory uczące i testujące (sprawdź np.random.seed(s) dla s = 0..9). Napisz kod który automatycznie to sprawdzi."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# wizualizacja danych - atrybut 0 na osi X, atrybut 1 na osi Y, kolor w zależności od klasy\n",
    "plt.scatter(iris.data[:,0], iris.data[:,1], c=iris.target)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Zadanie 4\n",
    "#### Zaimplementuj własny klasyfikator 1NN (kNN dla k=1) używający odległości euclidesowej. Sprawdź czy Twoja implementacja klasyfikatora daje takie same wyniki jak sklearn.neighbors.NearestNeighbors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def nn(data, x):\n",
    "    \"\"\"Zwraca indeks wektora w data który jest najbliższy (odległość euklidesowa) wektorowi x.\n",
    "    data - 2D tablica numpy o n wierszach, w każdym wierszu wektor o długości m\n",
    "    x - wektor (1D tablica numpy) o długości m\"\"\"\n",
    "    \n",
    "    # TODO\n",
    "    \n",
    "\n",
    "def check(data, x):\n",
    "    def referencyjna():\n",
    "        neigh = NearestNeighbors(1, algorithm='brute', metric='euclidean')\n",
    "        neigh.fit(data)\n",
    "        return neigh.kneighbors([x], 1, return_distance=False)[0,0]\n",
    "    #print(referencyjna, x)   \n",
    "    assert nn(data, x) == referencyjna()\n",
    "\n",
    "data = np.array([[0, 0, 2], [1, 0, 0], [0, 0, 1]])\n",
    "x = np.array([0, 0, 1.3])\n",
    "check(data, x)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}