Moving on to the application part of the article, we work through a practical practice dataset. The dataset contains the marks obtained by 100 students in 2 exams, along with a label (0/1) indicating whether the student will be admitted to a university (1, treated here as the negative class) or not (0, treated here as the positive class). The dataset is available at
Problem Statement: “Given the marks obtained in 2 exams, predict whether the student will be admitted to the university or not using Logistic Regression”
Data Reading into NumPy Arrays:
import numpy as np

data = np.loadtxt('dataset.txt', delimiter=',')
X_train = data[:, [0, 1]]  # feature set: marks in the 2 exams
y_train = data[:, 2]       # label set: 0 = not admitted, 1 = admitted
Scatter Plot Visualization of the Dataset:
x0 = np.ones((np.array([x for x in y_train if x == 0]).shape[0],
              X_train.shape[1]))
x1 = np.ones((np.array([x for x in y_train if x == 1]).shape[0],
              X_train.shape[1]))
# x0 and x1 are matrices that will hold the class-0 (not admitted,
# positive) and class-1 (admitted, negative) examples, initialized to 1
k0 = k1 = 0
for i in range(0, y_train.shape[0]):
    if y_train[i] == 0:
        x0[k0] = X_train[i]
        k0 = k0 + 1
    else:
        x1[k1] = X_train[i]
        k1 = k1 + 1
X = [x0, x1]
colors = ["green", "blue"] # 2 distinct colours for 2 classes
import matplotlib.pyplot as plt

for x, c in zip(X, colors):
    if c == "green":
        plt.scatter(x[:, 0], x[:, 1], color=c, label="Not Admitted")
    else:
        plt.scatter(x[:, 0], x[:, 1], color=c, label="Admitted")
plt.xlabel("Marks obtained in 1st Exam")
plt.ylabel("Marks obtained in 2nd Exam")
plt.legend()
plt.show()
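As an aside, the same class split can be done in two lines with NumPy boolean indexing, avoiding the explicit loop; a minimal equivalent sketch:

# Equivalent split using boolean masks (same result as the loop above)
x0 = X_train[y_train == 0]  # not-admitted (class 0) examples
x1 = X_train[y_train == 1]  # admitted (class 1) examples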
Running the 3-module Logistic Regression:
# calling the principal function with learning_rate = 0.001;
# num_iters = 100000 iterations of gradient descent run inside it
theta, theta_history, cost = logistic_regression(X_train, y_train, 0.001)
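The functions hypothesis and logistic_regression were defined in the earlier modules of the article. For readers landing here directly, the following is a minimal vectorized sketch that is consistent with how they are called in this section; the exact bodies (and the helper name cost_function) are assumptions, and the original modules may differ:

import numpy as np

def hypothesis(theta, X, n):
    # Sigmoid of the linear combination X @ theta; `n` (number of
    # features) is kept only to match the call signature used in
    # this article
    return 1 / (1 + np.exp(-X @ theta))

def cost_function(theta, X, y):
    # Binary cross-entropy cost averaged over all samples
    h = hypothesis(theta, X, X.shape[1] - 1)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def logistic_regression(X, y, learning_rate, num_iters=100000):
    # Prepend a column of ones for the bias term
    X = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
    theta = np.zeros(X.shape[1])
    theta_history = np.zeros((num_iters, X.shape[1]))
    cost = np.zeros(num_iters)
    for it in range(num_iters):
        h = hypothesis(theta, X, X.shape[1] - 1)
        # Batch gradient descent update
        theta = theta - (learning_rate / X.shape[0]) * (X.T @ (h - y))
        theta_history[it] = theta
        cost[it] = cost_function(theta, X, y)
    return theta, theta_history, cost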
The theta output comes out to be:
The gradual reduction in the cost function is also visualized using a line plot:
import matplotlib.pyplot as plt

cost = list(cost)
n_iterations = list(range(1, 100001))  # one entry per iteration
plt.plot(n_iterations, cost)
plt.xlabel('No. of iterations')
plt.ylabel('Cost')
plt.show()
The Line-Curve turns out to be:
Model Performance Analysis:
In classification, model performance analysis is done on the following metrics:
Getting the predictions…
X_train = np.concatenate((np.ones((X_train.shape[0], 1)), X_train),
                         axis=1)
h = hypothesis(theta, X_train, X_train.shape[1] - 1)
# Taking 0.5 as threshold:
for i in range(0, h.shape[0]):
    if h[i] > 0.5:
        h[i] = 1
    else:
        h[i] = 0
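The thresholding loop can equivalently be written as a single vectorized expression:

# Vectorized equivalent of the 0.5-threshold loop above
h = (h > 0.5).astype(float)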
=>Accuracy: Ratio of no. of correctly predicted samples to total no. of samples.
Finding Accuracy:
k = 0
for i in range(0, h.shape[0]):
    if h[i] == y_train[i]:
        k = k + 1
accuracy = k / y_train.shape[0]
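Equivalently, with NumPy:

# Vectorized accuracy: mean of element-wise agreement
accuracy = np.mean(h == y_train)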
The output of accuracy comes out to be:
=>Precision: Ratio of no. of correctly predicted positive observations to the total no. of predicted positive observations.
Finding Precision:
tp = fp = 0
# tp -> True Positive, fp -> False Positive
# (class 0, i.e. "not admitted", is treated as the positive class)
for i in range(0, h.shape[0]):
    if h[i] == y_train[i] == 0:
        tp = tp + 1
    elif h[i] == 0 and y_train[i] == 1:
        fp = fp + 1
precision = tp / (tp + fp)

=>Recall: Proportion of correctly identified positives.
Finding Recall:
fn = 0
# fn -> False Negative: predicted 1 but actually 0
for i in range(0, h.shape[0]):
    if h[i] == 1 and y_train[i] == 0:
        fn = fn + 1
recall = tp/(tp + fn)

=>F1-Score: Harmonic Mean of Precision and Recall
Finding F1-Score:
f1_score = (2 * precision * recall)/(precision + recall)
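As a sanity check, the hand-computed metrics can be compared with scikit-learn's implementations (this assumes scikit-learn is installed; pos_label=0 mirrors this article's convention of treating class 0, not admitted, as the positive class):

# Optional cross-check against scikit-learn
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

print(accuracy_score(y_train, h))
print(precision_score(y_train, h, pos_label=0))
print(recall_score(y_train, h, pos_label=0))
print(f1_score(y_train, h, pos_label=0))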

Confusion Matrix:
tn = 0
# tn -> True Negative: predicted 1 and actually 1
for i in range(0, h.shape[0]):
    if h[i] == y_train[i] == 1:
        tn = tn + 1
cm = np.array([[tp, fn], [fp, tn]])
# rows: true class (0, 1); columns: predicted class (0, 1)
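scikit-learn can build the same matrix directly; with labels=[0, 1] its convention (rows = true labels, columns = predicted labels) matches the layout above:

# Optional cross-check: same confusion matrix via scikit-learn
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_train, h, labels=[0, 1])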
# MODULE FOR CONFUSION MATRIX
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
# Un-Normalized Confusion Matrix...
plt.figure()
plot_confusion_matrix(cm, classes=[0, 1], normalize=False,
                      title='Unnormalized Confusion Matrix')
plt.show()

# Normalized Confusion Matrix...
plt.figure()
plot_confusion_matrix(cm, classes=[0, 1], normalize=True,
                      title='Normalized Confusion Matrix')
plt.show()
Credit: BecomingHuman. By: Navoneel Chakrabarty