Commit 9a60c190 authored by manueh51

Delete Assignment02_nb.ipynb

parent 94b1d233
%% Cell type:markdown id: tags:
# Assignment Sheet 2
%% Cell type:markdown id: tags:
This notebook is part of the second assignment. We will train a decision tree classifier on the Iris dataset using the scikit-learn library.
Your task is to complete the missing code wherever it is marked with a **TODO**.
%% Cell type:markdown id: tags:
## Imports
%% Cell type:code id: tags:
``` python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import graphviz
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from os import system
from IPython.display import Image
%matplotlib inline
```
%% Cell type:markdown id: tags:
## Load dataset
%% Cell type:code id: tags:
``` python
# For further information, see https://archive.ics.uci.edu/ml/datasets/iris
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
%% Cell type:markdown id: tags:
## Exploratory data analysis
%% Cell type:markdown id: tags:
#### Quick look into the data structure
%% Cell type:code id: tags:
``` python
print(X.shape)
print(y.shape)
```
%% Cell type:code id: tags:
``` python
print(X[:5,:])
```
%% Cell type:code id: tags:
``` python
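# Using pandas: combine the features and the labels into one DataFrame
data = pd.concat([pd.DataFrame(X), pd.DataFrame(y)], axis=1)
# Short aliases for sepal length, sepal width, petal length, petal width
data.columns = ['a', 'b', 'c', 'd', 'target']
data.head(5)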
```
%% Cell type:markdown id: tags:
#### Example plots
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(8,6))
sns.scatterplot(x=X[:,0], y=X[:,1], hue=y)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
```
%% Cell type:code id: tags:
``` python
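# Univariate histogram of sepal length, one step outline per class
# Row indices per class, available for per-class slicing such as X[class0_index, 0]
class0_index = [i for i, j in enumerate(y) if j==0]
class1_index = [i for i, j in enumerate(y) if j==1]
class2_index = [i for i, j in enumerate(y) if j==2]
# Pass the class names as hue so that seaborn builds a correctly labelled legend;
# the NumPy array X is not passed as `data` because x is already given as a vector
sns.histplot(x=X[:,0], hue=iris.target_names[y], element='step')
plt.xlabel('Sepal length')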
```
%% Cell type:code id: tags:
``` python
# TODO: Barplot over 'sepal-width'
```
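%% Cell type:markdown id: tags:
One possible reading of the bar plot TODO is a bar per class showing the mean sepal width. The sketch below assumes that reading; the DataFrame `df` and its column names are illustrative choices, not part of the assignment.
%% Cell type:code id: tags:
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()

# Tidy frame (illustrative names): sepal width plus the class name of each sample
df = pd.DataFrame({
    'sepal width (cm)': iris.data[:, 1],
    'species': iris.target_names[iris.target],
})

plt.figure(figsize=(8, 6))
# seaborn aggregates with the mean by default, so each bar is a per-class mean
ax = sns.barplot(data=df, x='species', y='sepal width (cm)')
ax.set_xlabel('Class')
ax.set_ylabel('Mean sepal width (cm)')
```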
%% Cell type:code id: tags:
``` python
# TODO: Boxplot of all features
```
%% Cell type:markdown id: tags:
## Classification using decision trees
%% Cell type:markdown id: tags:
#### Data preparation
%% Cell type:code id: tags:
``` python
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape
```
%% Cell type:markdown id: tags:
#### Train DT classifier using sklearn; Visualization; Evaluation
%% Cell type:code id: tags:
``` python
# TODO: Train a DT classifier
```
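%% Cell type:markdown id: tags:
Training could look roughly like the sketch below, which uses `DecisionTreeClassifier` with its default Gini criterion. Fixing `random_state=0` for both the split and the tree is an assumption added here for reproducibility; the notebook's own split does not set a seed.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# random_state=0 is an assumption for reproducibility, not part of the assignment
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# Default criterion is 'gini'; the tree is grown until its leaves are pure
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

print('depth:', clf.get_depth(), 'leaves:', clf.get_n_leaves())
```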
%% Cell type:code id: tags:
``` python
# Visualize: export the tree to a .dot file and render it as a .png image
# (requires the Graphviz `dot` executable on the PATH)
tree.export_graphviz(clf, out_file='tree.dot')
system("dot -Tpng tree.dot -o tree1.png")
Image("tree1.png")
```
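%% Cell type:markdown id: tags:
If the Graphviz `dot` executable is not installed, the tree can also be inspected without it. The sketch below uses `tree.export_text` for a plain-text view and `tree.plot_tree` for a matplotlib rendering; it retrains its own classifier with an assumed `random_state=0` so that it runs on its own.
%% Cell type:code id: tags:
```python
import matplotlib.pyplot as plt
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# random_state=0 is an assumption for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)
clf = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Plain-text rendering of the split rules
text_view = tree.export_text(clf, feature_names=list(iris.feature_names))
print(text_view)

# Matplotlib rendering; no external Graphviz binary is needed
plt.figure(figsize=(12, 8))
nodes = tree.plot_tree(clf,
                       feature_names=list(iris.feature_names),
                       class_names=list(iris.target_names),
                       filled=True)
```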
%% Cell type:code id: tags:
``` python
# TODO: Evaluate the classifier's performance
```
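%% Cell type:markdown id: tags:
One way to evaluate the classifier is `classification_report`, which the notebook already imports. The sketch below prints per-class precision, recall and F1 on the held-out test set; the seed and the explicit `labels` argument are assumptions added to keep the example reproducible and robust.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# random_state=0 is an assumption for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)
clf = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Predict on data the tree has not seen during training
y_pred = clf.predict(X_test)

# labels fixes the row order so it matches target_names
report = classification_report(y_test, y_pred,
                               labels=[0, 1, 2],
                               target_names=iris.target_names)
print(report)
```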
%% Cell type:markdown id: tags:
#### Train a second DT classifier using entropy instead of the Gini index (default)
%% Cell type:code id: tags:
``` python
# TODO: Train the second classifier
```
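%% Cell type:markdown id: tags:
Switching the split criterion only requires `criterion='entropy'`. As a sanity check, the sketch below also computes the base-2 entropy of the training labels by hand and compares it with the impurity that scikit-learn stores for the root node. The seed and the helper names `p` and `root_entropy` are assumptions for illustration.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# random_state=0 is an assumption for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# Same model class as before, but splits are chosen by information gain
clf2 = tree.DecisionTreeClassifier(criterion='entropy', random_state=0)
clf2.fit(X_train, y_train)

# Entropy of the training labels: H = -sum_k p_k * log2(p_k)
p = np.bincount(y_train) / len(y_train)
root_entropy = -np.sum(p * np.log2(p))

# The root node holds all training samples, so the two values should agree
print(root_entropy, clf2.tree_.impurity[0])
```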
%% Cell type:code id: tags:
``` python
# Visualize #2
tree.export_graphviz(clf2, out_file='tree2.dot')
system("dot -Tpng tree2.dot -o tree2.png")
Image("tree2.png")
```
%% Cell type:code id: tags:
``` python
# TODO: Evaluate the second classifier
```
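%% Cell type:markdown id: tags:
To compare the two criteria, both trees can be fitted on the same split and scored on the same test set. This is a sketch under the assumption of a fixed seed; on a dataset as small as Iris, differences between the two criteria depend heavily on the particular split. The dictionaries `models` and `scores` are illustrative names.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# random_state=0 is an assumption so that both trees see the same split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

models = {
    'gini': tree.DecisionTreeClassifier(criterion='gini', random_state=0),
    'entropy': tree.DecisionTreeClassifier(criterion='entropy', random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = accuracy_score(y_test, y_pred)
    print(f'--- {name} (accuracy {scores[name]:.3f}) ---')
    print(classification_report(y_test, y_pred, labels=[0, 1, 2],
                                target_names=iris.target_names))
```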