Commit 94b1d233 authored by Heurich

Add Assignment02

parent 2f19575e
%% Cell type:markdown id: tags:
# Assignment Sheet 2
%% Cell type:markdown id: tags:
This notebook is part of the second assignment. We will train a decision tree classifier on the Iris dataset using the sklearn library.
Your task is to complete the missing code wherever a cell is marked with a **TODO**.
%% Cell type:markdown id: tags:
## Imports
%% Cell type:code id: tags:
``` python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import graphviz
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from os import system
from IPython.display import Image
%matplotlib inline
```
%% Cell type:markdown id: tags:
## Load dataset
%% Cell type:code id: tags:
``` python
# For further info https://archive.ics.uci.edu/ml/datasets/iris
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
%% Cell type:markdown id: tags:
## Exploratory data analysis
%% Cell type:markdown id: tags:
#### Quick look into the data structure
%% Cell type:code id: tags:
``` python
print(X.shape)
print(y.shape)
```
%% Cell type:code id: tags:
``` python
print(X[:5,:])
```
%% Cell type:code id: tags:
``` python
# Using pandas
data = pd.concat([pd.DataFrame(X),pd.DataFrame(y)], axis=1)
data.columns=['a','b','c','d','target']
data.head(5)
```
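%% Cell type:markdown id: tags:
The placeholder column names `'a'` to `'d'` hide which measurement is which. As a minimal sketch, the frame can instead be labelled with the names that `load_iris()` already ships (`iris.feature_names` and `iris.target_names`). The variable name `df` and the `species` column are assumptions made for this illustration and are not part of the assignment.
%% Cell type:code id: tags:
```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Hypothetical frame 'df': use the dataset's own feature names as column headers.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer targets 0/1/2 to their species names (assumed column 'species').
df['species'] = iris.target_names[iris.target]

print(df.head(5))
# Number of samples per class
print(df['species'].value_counts())
```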
%% Cell type:markdown id: tags:
#### Exemplary plots
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(8,6))
sns.scatterplot(x=X[:,0], y=X[:,1], hue=y)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
```
%% Cell type:code id: tags:
``` python
# Univariate hist_plot 'sepal_length'
class0_index = [i for i, j in enumerate(y) if j==0]
class1_index = [i for i, j in enumerate(y) if j==1]
class2_index = [i for i, j in enumerate(y) if j==2]
# Use the species names as hue so that seaborn labels the legend itself
sns.histplot(x=X[:,0], hue=iris.target_names[y], element='step')
plt.xlabel('Sepal length')
```
%% Cell type:code id: tags:
``` python
# TODO: Barplot over 'sepal-width'
```
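%% Cell type:markdown id: tags:
One possible reading of this TODO is a bar per class showing the mean sepal width. The sketch below assumes that interpretation; `sns.barplot` aggregates with the mean by default, and sepal width is column 1 of `X`. Other readings (for example, counts per binned width) are equally valid.
%% Cell type:code id: tags:
```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

plt.figure(figsize=(8, 6))
# One bar per class; barplot's default estimator is the mean,
# so each bar height is the mean sepal width (column 1) of that class.
ax = sns.barplot(x=iris.target_names[y], y=X[:, 1])
ax.set_xlabel('Class')
ax.set_ylabel('Mean sepal width (cm)')
```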
%% Cell type:code id: tags:
``` python
# TODO: Boxplot of all features
```
%% Cell type:markdown id: tags:
## Classification using decision trees
%% Cell type:markdown id: tags:
#### Data preparation
%% Cell type:code id: tags:
``` python
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape
```
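%% Cell type:markdown id: tags:
The split above is random, so the results change on every run and the class balance of the test set can vary. A hedged variant, assuming reproducibility is wanted, fixes `random_state` (the value 0 is arbitrary) and passes `stratify=y` so that each class keeps its share in both parts.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state=0 is an arbitrary choice that makes the split repeatable;
# stratify=y keeps the class proportions equal in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)
# Samples per class in the test set
print(np.bincount(y_test))
```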
%% Cell type:markdown id: tags:
#### Train DT classifier using sklearn; Visualization; Evaluation
%% Cell type:code id: tags:
``` python
# TODO: Train a DT classifier
```
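%% Cell type:markdown id: tags:
As a rough sketch of what this TODO asks for, a `DecisionTreeClassifier` can be fitted on the training split. The name `clf` matches the visualization cell that follows; `random_state=0` is an assumption added only to make the example repeatable.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# criterion defaults to 'gini'; random_state is fixed only for repeatability.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Size of the fitted tree
print(clf.get_depth(), clf.get_n_leaves())
```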
%% Cell type:code id: tags:
``` python
# Visualize: Export to .png image file
tree.export_graphviz(clf, out_file='tree.dot')
system("dot -Tpng tree.dot -o tree1.png")
Image("tree1.png")
```
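%% Cell type:markdown id: tags:
The exported tree is easier to read when nodes carry feature and class names. The sketch below, which assumes a freshly fitted `clf` for self-containment, asks `export_graphviz` to return the DOT source as a string (`out_file=None`) instead of writing a file. The name `dot_source` is hypothetical.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_source = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
    rounded=True)

print(dot_source[:200])
# In a notebook, graphviz.Source(dot_source) displays the tree inline;
# this needs the graphviz Python package and the Graphviz binaries.
```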
%% Cell type:code id: tags:
``` python
# TODO: Evaluate the classifier's performance
```
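%% Cell type:markdown id: tags:
One way to approach the evaluation, sketched here under the assumption that per-class precision, recall and F1 on the held-out test set are what is wanted, is the already imported `classification_report`. The names `y_pred` and `report` are illustrative.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)
clf = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Predict on data the tree has not seen during training.
y_pred = clf.predict(X_test)
# target_names replaces the labels 0/1/2 with the species names in the report.
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```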
%% Cell type:markdown id: tags:
#### Train a second DT classifier using the Entropy instead of the Gini-Index (default)
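%% Cell type:markdown id: tags:
To make the difference between the two criteria concrete, the sketch below computes both impurity measures from the class counts of a single node: the Gini index as 1 - sum(p_k^2) and the entropy as -sum(p_k * log2(p_k)). The helper names `gini` and `entropy` are hypothetical and only illustrate the formulas; they are not sklearn's internal implementation.
%% Cell type:code id: tags:
```python
import numpy as np

def gini(counts):
    """Gini index 1 - sum(p_k^2) for the class counts of one node."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """Entropy -sum(p_k * log2(p_k)) in bits; empty classes contribute 0."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]  # skip empty classes to avoid log2(0)
    return -np.sum(p * np.log2(p))

balanced = [50, 50, 50]  # e.g. the root node of the full Iris dataset
pure = [50, 0, 0]        # a leaf containing a single class

print(gini(balanced), entropy(balanced))
print(gini(pure), entropy(pure))
```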
%% Cell type:code id: tags:
``` python
# TODO: Train the second classifier
```
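%% Cell type:markdown id: tags:
A minimal sketch of the second classifier only needs `criterion='entropy'`. The name `clf2` matches the visualization cell that follows; `random_state=0` is again an assumption for repeatability.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# Same model class as before, but splits are chosen by entropy instead of Gini.
clf2 = tree.DecisionTreeClassifier(criterion='entropy', random_state=0)
clf2.fit(X_train, y_train)

print(clf2.get_depth(), clf2.get_n_leaves())
```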
%% Cell type:code id: tags:
``` python
# Visualize #2
tree.export_graphviz(clf2, out_file='tree2.dot')
system("dot -Tpng tree2.dot -o tree2.png")
Image("tree2.png")
```
%% Cell type:code id: tags:
``` python
# TODO: Evaluation
```
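%% Cell type:markdown id: tags:
For a side-by-side view of the two criteria, one option is to score both trees on the same test split. This sketch assumes plain test accuracy is an acceptable summary and uses `accuracy_score`, which the notebook does not import above; the dictionary `scores` is an illustrative name. On a dataset as small as Iris the two criteria often produce very similar trees.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

clf = tree.DecisionTreeClassifier(criterion='gini', random_state=0).fit(X_train, y_train)
clf2 = tree.DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X_train, y_train)

# Evaluate both trees on the identical held-out split.
scores = {}
for name, model in [('gini', clf), ('entropy', clf2)]:
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f'{name}: test accuracy = {scores[name]:.3f}, '
          f'depth = {model.get_depth()}, leaves = {model.get_n_leaves()}')
```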