{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from ipywidgets import interact, fixed\n", "import matplotlib.pyplot as plt\n", "from IPython.display import set_matplotlib_formats\n", "set_matplotlib_formats('svg')\n", "import numpy as np\n", "import scipy.stats as stats" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def regression(slope=5, sd=5, show=False):\n", "    # Plot the true line y = 1 + slope*x together with a noisy sample drawn around it\n", "    fig, axes = plt.subplots(figsize=(7,7))\n", "    x = np.linspace(0, 1, 100)\n", "    yt = 1 + slope * x\n", "    plt.plot(x, yt)\n", "    y = yt + np.random.normal(0, sd, 100)\n", "    plt.scatter(x, y)\n", "    my = np.mean(y)\n", "    # The true line stands in for the fitted values, so SST is taken as SSR + SSE\n", "    ssr = np.sum((yt-my)**2)\n", "    sse = np.sum((y-yt)**2)\n", "    sst = ssr+sse\n", "    plt.ylim(-30,30)\n", "    \n", "    if show:\n", "        plt.title(\"SST=\"+str(round(sst,2))+\" SSR=\"+str(round(ssr, 2))+\" SSE=\"+str(round(sse,2)))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regression analysis" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f4b5b1ece0f944459554ec8df7b577c0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=5, description='sd', max=10), Checkbox(value=False, description='show'),…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interact(regression,slope=fixed(5), sd=(0,10,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Decomposition of the variability of Y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![SST](https://www.cs.put.poznan.pl/amensfelt/pub/SST.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\underbrace{Y-\\bar{Y}}_{\\textrm{total deviation}} = \\underbrace{\\hat{Y} - \\bar{Y}}_{\\substack{\\textrm{deviation explained} \\\\ 
\\text{by regression}}} + \\underbrace{Y-\\hat{Y}}_{\\substack{\\text{deviation unexplained} \\\\ \\text{by regression}}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\underbrace{\\sum\\limits_{i=1}^n (y_i - \\bar{y})^2}_{\\substack{\\text{total sum} \\\\ \\text{of squares} \\\\ \\text{SST}}} = \\underbrace{\\sum\\limits_{i=1}^n (\\hat{y}_i-\\bar{y})^2}_{\\substack{\\text{regression sum} \\\\ \\text{of squares}\\\\ \\text{SSR}}}+\\underbrace{\\sum\\limits_{i=1}^n (y_i-\\hat{y}_i)^2}_{\\substack{\\text{residual sum} \\\\ \\text{of squares}\\\\ \\text{SSE}}}$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "886b21005e2641b78276a36b5c5e83ea", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=5, description='sd', max=10), Checkbox(value=False, description='show'),…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interact(regression,slope=fixed(0), sd=(0,10,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- for $\\hat{y}_i=y_i$:\n", "\n", "    $$ SSE = \\sum\\limits_{i=1}^n (y_i-\\hat{y}_i)^2 = 0 $$\n", "    \n", "- for $b_1=0$:\n", "\n", "    $$SSR = \\sum\\limits_{i=1}^n (\\hat{y}_i-\\bar{y})^2 = \\sum\\limits_{i=1}^n (b_0+b_1x_i-\\bar{y})^2=n(b_0-\\bar{y})^2=n(\\bar{y}-b_1\\bar{x}-\\bar{y})^2=0$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coefficient of determination" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Total sum of squares:\n", "\n", "$$SST = SSR + SSE$$\n", "\n", "- Coefficient of determination:\n", "\n", "$$R^2 = \\frac{SSR}{SST} = 1 - \\frac{SSE}{SST}$$\n", "\n", "- In simple linear regression, $R^2$ is the square of the correlation coefficient:\n", "\n", "$$R^2 = \\frac{SSR}{SST} = \\frac{\\sum\\limits_{i=1}^n 
(\\hat{y}_i-\\bar{y})^2}{\\sum\\limits_{i=1}^n (y_i-\\bar{y})^2} = \\frac{\\sum\\limits_{i=1}^n (b_1x_i+b_0-\\bar{y})^2}{\\sum\\limits_{i=1}^n (y_i-\\bar{y})^2} = \\frac{\\sum\\limits_{i=1}^n (b_1x_i+\\bar{y}-b_1\\bar{x}-\\bar{y})^2}{\\sum\\limits_{i=1}^n (y_i-\\bar{y})^2} = \\frac{\\sum\\limits_{i=1}^n b_1^2(x_i-\\bar{x})^2}{\\sum\\limits_{i=1}^n (y_i-\\bar{y})^2} = b_1^2\\frac{S_X^2}{S_Y^2}=r^2\\frac{S_Y^2}{S_X^2}\\frac{S_X^2}{S_Y^2}=r^2$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiple regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$Y=\\beta_0+\\beta_1X_1+\\beta_2X_2+\\epsilon$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![regression](https://www.cs.put.poznan.pl/amensfelt/pub/reg3d.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Adjusted coefficient of determination:\n", "\n", "$$R^2_{adj} = 1 - (1 - R^2) \\frac{n-1}{n-k-1}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Global significance test (F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Hypotheses:\n", "\n", "$\\;\\;\\;\\;\\;H_0: \\beta_1=\\beta_2=\\ldots=\\beta_k=0$\n", "
$\\;\\;\\;\\;\\;H_1:$ Not all $\\beta_i$ ($i=1, 2, \\ldots, k$) are equal to $0$\n", "\n", "- Test statistic:\n", "    - $n$ - number of observations\n", "    - $k$ - number of explanatory variables\n", " \n", "|Sum of squares | df | Mean square |\n", "|-|-|-|\n", "|SSR | k | $MSR=\\frac{SSR}{k}$ | \n", "|SSE | n-(k+1) | $MSE = \\frac{SSE}{n-(k+1)}$ |\n", " \n", "$$F=\\frac{MSR}{MSE} \\sim F(k, n-(k+1))$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A multiple regression model with 3 explanatory variables was built from a sample of size $n=24$. The result was $SSR=36$ and $SSE=20$. Test the statistical significance of the model at the level $\\alpha=0.05$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$H_0:$\n", "
$H_1:$\n", "\n", "$C_{kr} = (3.098, \\infty)$\n", "\n", "MSR=\n", "\n", "MSE=\n", "\n", "F=" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Significance test for a model parameter (t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Hypotheses:\n", "\n", "$\\;\\;\\;\\;\\;H_0: \\beta_i=0$\n", "
$\\;\\;\\;\\;\\;H_1: \\beta_i\\neq0$\n", "\n", "\n", "- Test statistic:\n", "\n", "\t$$t = \\frac{b_i}{S(b_i)} \\sim t(n-(k+1))$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standard error of the estimate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Sum of squared errors (residuals):\n", "\n", "$$SSE = \\sum(Y-\\hat{Y})^2$$\n", "\n", "- Mean squared error (MSE):\n", "\n", "$$MSE=\\frac{SSE}{n-(k+1)}$$\n", "\n", "- Standard error of the estimate:\n", "\n", "$$S=\\sqrt{MSE}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standard errors of the parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$S(b_0) = S \\cdot \\sqrt{\\frac{\\sum\\limits_{i=1}^nx_i^2}{n\\sum\\limits_{i=1}^n(x_i-\\bar{x})^2}}$$\n", "\n", "$$S(b_1) = S \\cdot \\frac{1}{\\sqrt{\\sum\\limits_{i=1}^n(x_i-\\bar{x})^2}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Diagnostic plots](https://gallery.shinyapps.io/slr_diag/)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false } }, "nbformat": 4, "nbformat_minor": 4 }