Compare revisions

manimeun · 0cac7793
--- a/Assignment02_nb.ipynb
+++ b/Assignment02_nb.ipynb
+%% Cell type:markdown id: tags:
+
+# ML4DS - Notebook for Assignment 2
+
+%% Cell type:markdown id: tags:
+
+Author: Manuel Heurich
+Credit: Maximilian Idahl
+
+%% Cell type:markdown id: tags:
+
+This notebook is part of the second assignment. We will train a decision tree classifier on the Iris dataset using the sklearn library.
+
+Your task is to complete the missing code wherever it is marked with a **TODO**.
+
+%% Cell type:markdown id: tags:
+
+## Imports
+
+%% Cell type:code id: tags:
+
+``` python
+import seaborn as sns
+import pandas as pd
+import matplotlib.pyplot as plt
+import graphviz
+
+from sklearn import datasets, tree
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import classification_report
+from os import system
+from IPython.display import Image
+
+%matplotlib inline
+```
+
+%% Cell type:markdown id: tags:
+
+## Load dataset
+
+%% Cell type:code id: tags:
+
+``` python
+# For further info, see https://archive.ics.uci.edu/ml/datasets/iris
+iris = datasets.load_iris()
+X = iris.data
+y = iris.target
+```
+
+%% Cell type:markdown id: tags:
+
+## Exploratory data analysis
+
+%% Cell type:markdown id: tags:
+
+#### Quick look at the data structure
+
+%% Cell type:code id: tags:
+
+``` python
+print(X.shape)
+print(y.shape)
+```
+
+%% Output
+
+    (150, 4)
+    (150,)
+
+%% Cell type:code id: tags:
+
+``` python
+print(X[:5,:])
+```
+
+%% Output
+
+    [[5.1 3.5 1.4 0.2]
+     [4.9 3.  1.4 0.2]
+     [4.7 3.2 1.3 0.2]
+     [4.6 3.1 1.5 0.2]
+     [5.  3.6 1.4 0.2]]
+
+%% Cell type:code id: tags:
+
+``` python
+# Using pandas
+data = pd.concat([pd.DataFrame(X),pd.DataFrame(y)], axis=1)
+data.columns=['a','b','c','d','target']
+data.head(5)
+```
+
+%% Output
+
+         a    b    c    d  target
+    0  5.1  3.5  1.4  0.2       0
+    1  4.9  3.0  1.4  0.2       0
+    2  4.7  3.2  1.3  0.2       0
+    3  4.6  3.1  1.5  0.2       0
+    4  5.0  3.6  1.4  0.2       0
+
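The placeholder column names `a` to `d` above can be replaced by the descriptive names that ship with the dataset. The following is a minimal sketch, assuming the `feature_names` and `target_names` attributes of the object returned by `load_iris()`; the names `df` and `species` are illustrative and not part of the assignment.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Build the frame with the feature names provided by the dataset itself.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Keep the integer label and add a readable species column next to it.
df["target"] = iris.target
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.head())
```

With named columns, plotting calls can refer to features by name instead of by positional index such as `X[:,0]`.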
+%% Cell type:markdown id: tags:
+
+#### Example plots
+
+%% Cell type:code id: tags:
+
+``` python
+plt.figure(figsize=(8,6))
+sns.scatterplot(x=X[:,0], y=X[:,1], hue=y)
+plt.xlabel('Sepal length')
+plt.ylabel('Sepal width')
+```
+
+%% Output
+
+    Text(0, 0.5, 'Sepal width')
+
+
+
+%% Cell type:code id: tags:
+
+``` python
+# Univariate histogram of sepal length, colored by class
+class0_index = [i for i, j in enumerate(y) if j==0]
+class1_index = [i for i, j in enumerate(y) if j==1]
+class2_index = [i for i, j in enumerate(y) if j==2]
+
+sns.histplot(data=X, x=X[:,0], hue=y, element='step')
+plt.xlabel('Sepal length')
+plt.legend(('class1', 'class2','class3'))
+```
+
+%% Output
+
+    <matplotlib.legend.Legend at 0x139cc7f10>
+
+
+
+%% Cell type:code id: tags:
+
+``` python
+# TODO: Barplot over 'sepal-width'
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# TODO: Boxplot of all features
+```
+
+%% Cell type:markdown id: tags:
+
+## Classification using decision trees
+
+%% Cell type:markdown id: tags:
+
+#### Data preparation
+
+%% Cell type:code id: tags:
+
+``` python
+# Split data
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
+X_train.shape
+```
+
+%% Output
+
+    (120, 4)
+
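Because `train_test_split` shuffles randomly, the rows that end up in each split differ from run to run, and so will any scores computed later. Below is a minimal sketch of a reproducible, class-balanced split. The seed value 42 and the use of `stratify` are assumptions made for illustration, not requirements of the assignment, and the variable names `X_tr`, `X_te`, `y_tr`, `y_te` are chosen so that the notebook's own variables are not overwritten.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state fixes the shuffle; stratify=y keeps class proportions equal
# in the training and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))  # number of test samples per class
```

Since Iris contains 50 samples per class, a stratified 20% test set holds 10 samples of each class.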
+%% Cell type:markdown id: tags:
+
+#### Train a DT classifier using sklearn, then visualize and evaluate it
+
+%% Cell type:code id: tags:
+
+``` python
+# TODO: Train a DT classifier
+clf = None
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Visualize clf: export to a .png image file (requires the Graphviz `dot` executable on the PATH)
+
+# tree.export_graphviz(clf, out_file='tree.dot')
+# system("dot -Tpng tree.dot -o tree1.png")
+# Image("tree1.png")
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# TODO: Evaluation of the classifier's performance
+```
+
+%% Cell type:markdown id: tags:
+
+#### Train a second DT classifier using entropy instead of the Gini index (default) as the split criterion
+
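To see how the two impurity measures differ, it can help to compute them by hand for a node's class counts. The sketch below assumes the usual definitions, Gini = 1 - Σ p_k² and entropy = -Σ p_k log2(p_k), where p_k is the fraction of samples in class k; the helper names `gini` and `entropy` are hypothetical and not part of sklearn. In sklearn, the measure is selected through the `criterion` parameter of `DecisionTreeClassifier`.

```python
import numpy as np


def gini(counts):
    """Gini impurity of a node, given its per-class sample counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)


def entropy(counts):
    """Entropy in bits of a node, given its per-class sample counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]  # skip empty classes, since log2(0) is undefined
    return -np.sum(p * np.log2(p))


# Root node of the full Iris dataset: 50 samples in each of 3 classes.
print(gini([50, 50, 50]), entropy([50, 50, 50]))

# A pure node has zero impurity under both measures.
print(gini([50, 0, 0]), entropy([50, 0, 0]))
```

Both measures are zero for a pure node and largest for a uniform class distribution, so the trees they produce are often similar, though not necessarily identical.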
+%% Cell type:code id: tags:
+
+``` python
+# TODO: Train the second classifier
+clf2 = None
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Visualize clf #2
+
+# tree.export_graphviz(clf2, out_file='tree2.dot')
+# system("dot -Tpng tree2.dot -o tree2.png")
+# Image("tree2.png")
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# TODO: Evaluation of the classifier's performance
+```