%% Cell type:markdown id: tags:
# ML4DS - Notebook for Assignment 2
%% Cell type:markdown id: tags:
Author: Manuel Heurich
Credit: Maximilian Idahl
%% Cell type:markdown id: tags:
This notebook is part of the 2nd assignment. We will train a decision tree classifier on the Iris dataset using the sklearn library.
Your task is to complete the missing code wherever it is marked with a **TODO**.
%% Cell type:markdown id: tags:
## Imports
%% Cell type:code id: tags:
``` python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import graphviz
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from os import system
from IPython.display import Image
%matplotlib inline
```
%% Cell type:markdown id: tags:
## Load dataset
%% Cell type:code id: tags:
``` python
# For further info https://archive.ics.uci.edu/ml/datasets/iris
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
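%% Cell type:markdown id: tags:
The object returned by `load_iris()` also carries the feature and class names, which helps when reading column indices such as `X[:,0]` later on. A minimal sketch for inspecting them:
%% Cell type:code id: tags:
```python
from sklearn import datasets

iris = datasets.load_iris()

# Column order of iris.data (and therefore of X)
print(iris.feature_names)

# Species names; the name at index i belongs to target value i
print(list(iris.target_names))
```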
%% Cell type:markdown id: tags:
## Exploratory data analysis
%% Cell type:markdown id: tags:
#### Quick look at the data structure
%% Cell type:code id: tags:
``` python
print(X.shape)
print(y.shape)
```
%% Output
(150, 4)
(150,)
%% Cell type:code id: tags:
``` python
print(X[:5,:])
```
%% Output
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
%% Cell type:code id: tags:
``` python
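# Using pandas: combine features and labels in one DataFrame.
# Columns a-d are sepal length, sepal width, petal length, petal width (cm).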
data = pd.concat([pd.DataFrame(X),pd.DataFrame(y)], axis=1)
data.columns=['a','b','c','d','target']
data.head(5)
```
%% Output
     a    b    c    d  target
0  5.1  3.5  1.4  0.2       0
1  4.9  3.0  1.4  0.2       0
2  4.7  3.2  1.3  0.2       0
3  4.6  3.1  1.5  0.2       0
4  5.0  3.6  1.4  0.2       0
%% Cell type:markdown id: tags:
#### Example plots
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(8,6))
sns.scatterplot(x=X[:,0], y=X[:,1], hue=y)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
```
%% Output
Text(0, 0.5, 'Sepal width')
%% Cell type:code id: tags:
``` python
# Univariate histogram of sepal length, one step outline per class
species = iris.target_names[y]  # map integer labels 0-2 to species names
sns.histplot(x=X[:,0], hue=species, element='step')  # seaborn builds the legend from hue
plt.xlabel('Sepal length')
```
%% Output
Text(0.5, 0, 'Sepal length')
%% Cell type:code id: tags:
``` python
# TODO: Barplot over 'sepal-width'
```
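%% Cell type:markdown id: tags:
One possible reading of this TODO is a bar per species showing the mean sepal width. The sketch below takes that reading; the DataFrame `df` and its column names are chosen here for illustration and are not part of the assignment template.
%% Cell type:code id: tags:
```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()

# Hypothetical helper frame: sepal width (column 1) plus readable species names
df = pd.DataFrame({
    "sepal_width": iris.data[:, 1],
    "species": iris.target_names[iris.target],
})

plt.figure(figsize=(8, 6))
# sns.barplot aggregates with the mean by default and draws an error bar per group
ax = sns.barplot(data=df, x="species", y="sepal_width")
ax.set_xlabel("Species")
ax.set_ylabel("Mean sepal width (cm)")
```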
%% Cell type:code id: tags:
``` python
# TODO: Boxplot of all features
```
%% Cell type:markdown id: tags:
## Classification using decision trees
%% Cell type:markdown id: tags:
#### Data preparation
%% Cell type:code id: tags:
``` python
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape
```
%% Output
(120, 4)
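%% Cell type:markdown id: tags:
The split above is random, so the resulting tree and scores can change between runs. If reproducible results are wanted, a sketch like the following fixes the seed and keeps the class proportions equal in both parts. The values `random_state=42` and `stratify=y` are assumptions made here, not part of the original cell.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify=y preserves the 50/50/50 class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```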
%% Cell type:markdown id: tags:
#### Train a DT classifier with sklearn, then visualize and evaluate it
%% Cell type:code id: tags:
``` python
# TODO: Train a DT classifier
clf = None
```
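%% Cell type:markdown id: tags:
As a rough sketch of what this step could look like, `tree.DecisionTreeClassifier` can be fitted on the training split with its default settings, which use the Gini index. The seeds below are assumptions added for reproducibility, and the cell rebuilds the split so that it runs on its own.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y  # assumed seed and stratification
)

# Default criterion is "gini"; random_state only affects tie-breaking between splits
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

print(clf.criterion)
print(clf.get_depth(), clf.get_n_leaves())
```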
%% Cell type:code id: tags:
``` python
# Visualize clf: Export to .png image file
# tree.export_graphviz(clf, out_file='tree.dot')
# system("dot -Tpng tree.dot -o tree1.png")
# Image("tree1.png")
```
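%% Cell type:markdown id: tags:
The commented-out cell above writes a `.dot` file and shells out to the `dot` binary. An alternative sketch: with `out_file=None`, `tree.export_graphviz` returns the DOT source as a string, which can be handed to the already imported `graphviz` package. For brevity the tree here is fitted on the full dataset, which is a simplification made for this example only.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree

iris = datasets.load_iris()
# Simplification for this sketch: fit on all samples instead of the training split
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# out_file=None returns the DOT description as a string instead of writing a file
dot_source = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)
print(dot_source[:200])

# In a notebook with the graphviz package and the dot binary installed:
# import graphviz
# graphviz.Source(dot_source)
```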
%% Cell type:code id: tags:
``` python
# TODO: Evaluation of the classifier's performance
```
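%% Cell type:markdown id: tags:
The notebook already imports `classification_report`, so one way to approach the evaluation is to predict on the held-out split and print per-class precision, recall, and F1. The sketch below assumes a stratified split with a fixed seed so that every class appears in the test set.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

clf = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Evaluate on samples the tree has not seen during training
y_pred = clf.predict(X_test)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```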
%% Cell type:markdown id: tags:
#### Train a second DT classifier using entropy instead of the Gini index (the default)
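The two criteria measure node impurity differently: the Gini index is $1 - \sum_k p_k^2$ and entropy is $-\sum_k p_k \log_2 p_k$, where $p_k$ is the share of class $k$ in a node. A small pure-Python sketch, with helper names chosen here for illustration, shows both on the root node of the Iris data (50 samples per class):
%% Cell type:code id: tags:
```python
from math import log2

def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits; empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

root = [50, 50, 50]  # class counts of the full Iris dataset
print(round(gini(root), 3))     # 0.667
print(round(entropy(root), 3))  # 1.585
```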
%% Cell type:code id: tags:
``` python
# TODO: Train the second classifier
clf2 = None
```
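%% Cell type:markdown id: tags:
Switching the split criterion is a single constructor argument. The following sketch passes `criterion="entropy"` to `tree.DecisionTreeClassifier`; the split and seeds are again assumptions so that the cell runs on its own.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y  # assumed seed and stratification
)

# Same model class as before, but splits are chosen by information gain (entropy)
clf2 = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf2.fit(X_train, y_train)

print(clf2.criterion)
print(clf2.get_depth(), clf2.get_n_leaves())
```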
%% Cell type:code id: tags:
``` python
# Visualize clf #2
# tree.export_graphviz(clf2, out_file='tree2.dot')
# system("dot -Tpng tree2.dot -o tree2.png")
# Image("tree2.png")
```
%% Cell type:code id: tags:
``` python
# TODO: Evaluation of the classifier's performance
```
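%% Cell type:markdown id: tags:
To compare the two classifiers, it can help to evaluate both on the same held-out split. The sketch below uses `accuracy_score` and `confusion_matrix` from `sklearn.metrics`, which the notebook does not import itself; the dictionary `results` is a name introduced here for illustration.
%% Cell type:code id: tags:
```python
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y  # assumed seed and stratification
)

results = {}  # criterion -> confusion matrix on the test split
for criterion in ("gini", "entropy"):
    model = tree.DecisionTreeClassifier(criterion=criterion, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Rows are true classes, columns are predicted classes
    results[criterion] = confusion_matrix(y_test, y_pred)
    print(criterion, accuracy_score(y_test, y_pred))
    print(results[criterion])
```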