{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. K-means algorithm\n",
    "\n",
    "**Question** What are the following steps of the k-means algorithm?\n",
    "\n",
    "**Question** How can we choose the initial clusters?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**\n",
    "Given the following examples of grades od 5 students we want to divide them into 2 groups:\n",
    "\n",
    "| Subject | A   | B   |\n",
    "|---------|-----|-----|\n",
    "| 1       | 1.0 | 1.0 |\n",
    "| 2       | 1.5 | 2.0 |\n",
    "| 3       | 3.0 | 3.0 |\n",
    "| 4       | 5.0 | 7.0 |\n",
    "| 5       | 3.5 | 5.0 |\n",
    "\n",
    "We have chosen the two furthest students (using euclidean distance) as the initial clusters' centroids:\n",
    "\n",
    "|Cluster|Centroid|A  |B  |\n",
    "|-------|--------|---|---|\n",
    "|C1     |k1      |1.0|1.0|\n",
    "|C2     |k2      |5.0|7.0|\n",
    "\n",
    "Perform the first iteration of k-means: divide all students into clusters and find the centroids of these clusters.\n"
   ]
  },
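  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two steps of one iteration (assign every example to its nearest centroid, then recompute each centroid as the mean of its cluster) can be sketched with NumPy. This is a minimal sketch rather than a reference implementation: the four points and the two starting centroids below are made up for illustration and are not the students' grades from the exercise, and the names `points`, `centroids`, `labels` and `new_centroids` are hypothetical.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy data (NOT the grades from the exercise above)\n",
    "points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0]])\n",
    "# Assumed initial centroids: the two points that are furthest apart\n",
    "centroids = np.array([[0.0, 0.0], [5.0, 4.0]])\n",
    "\n",
    "# Assignment step: Euclidean distance from every point to every centroid,\n",
    "# then pick the index of the closest centroid for each point\n",
    "distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)\n",
    "labels = distances.argmin(axis=1)\n",
    "\n",
    "# Update step: each centroid becomes the mean of the points assigned to it\n",
    "new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(len(centroids))])\n",
    "\n",
    "print(labels)\n",
    "print(new_centroids)"
   ]
  },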
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question** When the algorithm should stop?\n",
    "\n",
    "**Question** What advantages and disadvantages of k-means clustering can you find?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. K-means with scikit-learn\n",
    "\n",
    "### 2.1. Download files mouse.csv and lines.csv. They have multiple examples described with 2 attributes.  You are given the functions to read files and plot the data. Use these functions to plot data from both files. Can you manually determine 3 clusters in each of the files?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "def read_file(path):\n",
    "    with open(path, newline='') as csvfile:\n",
    "        reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)\n",
    "        data = [row for row in reader]\n",
    "        data = StandardScaler().fit_transform(data)\n",
    "    return np.array(data)\n",
    "\n",
    "def plot_data(data):\n",
    "    plt.scatter(data[:,0], data[:, 1])\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO call functions above and try to find clusters in obtained datasets\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Now, let's try to use k-means on the obtained dataset. Again, you are given a function to visualize the obtained plot. Your task is to use KMeans with propoer parameters on \"mouse\" and \"lines\" datasets and see if the clusters generated by k-means are the same that you suggested in the previous exercise.\n",
    "\n",
    "See documentation and examples: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def visualize_clusters(clusters, centroids): \n",
    "    #clusters: list of numpy arrays (each array with examples in one cluster)\n",
    "    #centroids: numpy array\n",
    "    for c in clusters:\n",
    "        plt.scatter(c[:,0], c[:,1])\n",
    "    plt.scatter(centroids[:,0], centroids[:,1], marker='+', color='black', s=100)\n",
    "    plt.show()"
   ]
  },
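  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`visualize_clusters` expects a list with one array per cluster, while a fitted `KMeans` object exposes a flat `labels_` array and `cluster_centers_`. The sketch below shows one possible way to bridge the two on made-up data. The synthetic blobs, the variable names (`toy`, `model`, `clusters`, `centroids`) and the parameter values `n_clusters=2`, `n_init=10`, `random_state=0` are illustrative assumptions, not the settings required for \"mouse\" or \"lines\".\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "\n",
    "# Hypothetical synthetic data: two well-separated blobs of 20 points each\n",
    "rng = np.random.default_rng(0)\n",
    "toy = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)), rng.normal(5.0, 0.3, size=(20, 2))])\n",
    "\n",
    "# Parameter values are illustrative choices, not a recommendation\n",
    "model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)\n",
    "\n",
    "# Split the rows by their assigned label: one array per cluster\n",
    "clusters = [toy[model.labels_ == k] for k in range(model.n_clusters)]\n",
    "centroids = model.cluster_centers_\n",
    "\n",
    "# clusters and centroids now have the shapes that visualize_clusters expects\n",
    "print([len(c) for c in clusters], centroids.shape)"
   ]
  },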
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.cluster import KMeans\n",
    "# TODO use KMeans to cluster mouse and lines. Visualize and analyze the obtained clusters.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}