Credit: BecomingHuman

We now turn to the application part of the article, using a practice dataset. The dataset contains the marks obtained by 100 students in 2 exams, along with a label (0/1) that indicates whether the student will be admitted to a university (1, treated here as the negative class) or not (0, treated here as the positive class). The Dataset is available at

**Problem Statement**: “*Given the marks obtained in 2 exams, predict whether the student will be admitted to the university or not using Logistic Regression*”

Reading the Data into NumPy Arrays:

import numpy as np

data = np.loadtxt('dataset.txt', delimiter=',')
X_train = data[:, [0, 1]]  # feature-set
y_train = data[:, 2]  # label-set
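To make the expected file layout concrete, the sketch below loads three rows from an in-memory string instead of `dataset.txt`. The rows are invented for illustration and are not taken from the actual dataset; only the column layout (two exam marks followed by a 0/1 label) comes from the description above. The names `sample_text`, `X_sample` and `y_sample` are hypothetical.

```python
import io
import numpy as np

# Three invented rows in the assumed layout: exam-1 mark, exam-2 mark, label.
sample_text = "34.6,78.0,0\n60.2,86.3,1\n79.0,75.3,1\n"

# np.loadtxt accepts a file-like object as well as a filename.
sample = np.loadtxt(io.StringIO(sample_text), delimiter=',')

X_sample = sample[:, [0, 1]]  # feature-set: one row per student, two columns
y_sample = sample[:, 2]       # label-set: one 0/1 value per student

print(X_sample.shape)  # (3, 2)
print(y_sample.shape)  # (3,)
```

With the real file, the same slicing should give a feature matrix with one row per student and a one-dimensional label vector of the same length.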

Scatter Plot Visualization of the Dataset:

import matplotlib.pyplot as plt

# x0 and x1 are matrices containing +ve and -ve examples from the
# dataset, initialized to 1
x0 = np.ones((np.array([x for x in y_train if x == 0]).shape[0],
              X_train.shape[1]))
x1 = np.ones((np.array([x for x in y_train if x == 1]).shape[0],
              X_train.shape[1]))

k0 = k1 = 0
for i in range(0, y_train.shape[0]):
    if y_train[i] == 0:
        x0[k0] = X_train[i]
        k0 = k0 + 1
    else:
        x1[k1] = X_train[i]
        k1 = k1 + 1

X = [x0, x1]
colors = ["green", "blue"]  # 2 distinct colours for 2 classes

for x, c in zip(X, colors):
    if c == "green":
        plt.scatter(x[:, 0], x[:, 1], color=c, label="Not Admitted")
    else:
        plt.scatter(x[:, 0], x[:, 1], color=c, label="Admitted")
plt.xlabel("Marks obtained in 1st Exam")
plt.ylabel("Marks obtained in 2nd Exam")
plt.legend()
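The class split done by the loop can also be written with NumPy boolean masks. The following is a minimal sketch on invented toy data, not the article's dataset; `X_demo`, `y_demo`, `x0_demo` and `x1_demo` are hypothetical names used only here.

```python
import numpy as np

# Toy stand-ins for X_train and y_train (values invented for illustration).
X_demo = np.array([[30.0, 40.0], [80.0, 90.0], [45.0, 50.0], [70.0, 85.0]])
y_demo = np.array([0, 1, 0, 1])

# A boolean mask selects all rows of one class in a single step.
x0_demo = X_demo[y_demo == 0]  # rows labelled 0 (not admitted)
x1_demo = X_demo[y_demo == 1]  # rows labelled 1 (admitted)

print(x0_demo)
print(x1_demo)
```

The masked arrays keep the rows in their original order, so they can be passed to `plt.scatter` in the same way as `x0` and `x1`.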

Running the 3-module-Logistic Regression:

# calling the principal function with learning_rate = 0.001 and
# num_iters = 100000
theta, theta_history, cost = logistic_regression(X_train, y_train, 0.001)

The theta output comes out to be:

The gradual reduction in the cost function can also be visualized using a line plot:

import matplotlib.pyplot as plt

cost = list(cost)
n_iterations = list(range(1, len(cost) + 1))  # 1 to 100000 for this run
plt.plot(n_iterations, cost)
plt.xlabel('No. of iterations')
plt.ylabel('Cost')

The Line-Curve turns out to be:

**Model Performance Analysis**:

In classification, model performance analysis is done on the following metrics:

Getting the predictions…

# Prepend a column of ones (the bias term) to the feature matrix
X_train = np.concatenate((np.ones((X_train.shape[0], 1)), X_train),
                         axis=1)
h = hypothesis(theta, X_train, X_train.shape[1] - 1)

# Taking 0.5 as threshold:
for i in range(0, h.shape[0]):
    if h[i] > 0.5:
        h[i] = 1
    else:
        h[i] = 0
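The thresholding loop can be expressed as a single vectorized comparison. This is a small sketch on invented probabilities, with `h_demo` and `pred_demo` as hypothetical names standing in for the hypothesis output and the resulting class predictions.

```python
import numpy as np

# Invented hypothesis outputs (probabilities), standing in for h.
h_demo = np.array([0.12, 0.50, 0.73, 0.99, 0.49])

# Values strictly greater than 0.5 become 1; everything else becomes 0.
# As in the loop, a value of exactly 0.5 maps to 0.
pred_demo = (h_demo > 0.5).astype(int)

print(pred_demo)  # [0 0 1 1 0]
```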

=>Accuracy: Ratio of no. of correctly predicted samples to total no. of samples.

Finding Accuracy:

k = 0
for i in range(0, h.shape[0]):
    if h[i] == y_train[i]:
        k = k + 1
accuracy = k / y_train.shape[0]

The output of accuracy comes out to be:

=>Precision: Ratio of no. of correctly predicted positive observations to the total no. of predicted positive observations.

Finding Precision:

tp = fp = 0
# tp -> True Positive, fp -> False Positive
# (label 0 is the positive class in this article)
for i in range(0, h.shape[0]):
    if h[i] == y_train[i] == 0:
        tp = tp + 1
    elif h[i] == 0 and y_train[i] == 1:
        fp = fp + 1
precision = tp / (tp + fp)

=>Recall: Ratio of no. of correctly predicted positive observations to the total no. of actual positive observations.

Finding Recall:

fn = 0
# fn -> False Negatives
for i in range(0, h.shape[0]):
    if h[i] == 1 and y_train[i] == 0:
        fn = fn + 1
recall = tp / (tp + fn)

=>F1-Score: Harmonic Mean of Precision and Recall

Finding F1-Score:

f1_score = (2 * precision * recall)/(precision + recall)
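As a worked example of the four metrics, here is a minimal self-contained sketch on eight invented predictions. It follows this article's convention that label 0 is the positive class. The arrays and the `_demo` names are hypothetical and unrelated to the real results.

```python
import numpy as np

# Invented predictions and true labels for eight students.
pred_demo = np.array([0, 0, 1, 1, 0, 1, 0, 1])
true_demo = np.array([0, 1, 1, 1, 0, 0, 0, 1])

POSITIVE = 0  # this article treats label 0 as the positive class

# Count each outcome with boolean masks.
tp_demo = int(np.sum((pred_demo == POSITIVE) & (true_demo == POSITIVE)))
fp_demo = int(np.sum((pred_demo == POSITIVE) & (true_demo != POSITIVE)))
fn_demo = int(np.sum((pred_demo != POSITIVE) & (true_demo == POSITIVE)))
tn_demo = int(np.sum((pred_demo != POSITIVE) & (true_demo != POSITIVE)))

accuracy_demo = (tp_demo + tn_demo) / len(true_demo)
precision_demo = tp_demo / (tp_demo + fp_demo)
recall_demo = tp_demo / (tp_demo + fn_demo)
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(tp_demo, fp_demo, fn_demo, tn_demo)  # 3 1 1 3
print(accuracy_demo, precision_demo, recall_demo, f1_demo)  # 0.75 0.75 0.75 0.75
```

In this toy case all four metrics coincide at 0.75 because the errors are symmetric (one false positive and one false negative); on real data they generally differ.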

Confusion Matrix:

tn = 0
# tn -> True Negative
for i in range(0, h.shape[0]):
    if h[i] == y_train[i] == 1:
        tn = tn + 1
cm = np.array([[tp, fn], [fp, tn]])
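The layout `[[tp, fn], [fp, tn]]` puts true labels on the rows and predicted labels on the columns, with class 0 first. The sketch below builds the same layout directly from invented label arrays, as one possible way to check the counts; `pred_demo`, `true_demo` and `cm_demo` are hypothetical names.

```python
import numpy as np

# Invented predictions and true labels (integer class ids 0 and 1).
pred_demo = np.array([0, 0, 1, 1, 0, 1, 0, 1])
true_demo = np.array([0, 1, 1, 1, 0, 0, 0, 1])

# cm_demo[i, j] counts samples whose true label is i and predicted label is j.
cm_demo = np.zeros((2, 2), dtype=int)
np.add.at(cm_demo, (true_demo, pred_demo), 1)

print(cm_demo)
# [[3 1]
#  [1 3]]
```

Reading the result with label 0 as the positive class, the first row is `[tp, fn]` and the second row is `[fp, tn]`.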

# MODULE FOR CONFUSION MATRIX
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        # Divide each row by its total so that every row sums to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each cell value, in white on dark cells and black on light cells
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

plt.figure()
# Un-Normalized Confusion Matrix...
plot_confusion_matrix(cm, classes=[0, 1], normalize=False,
                      title='Unnormalized Confusion Matrix')

# Start a new figure so the second plot is not drawn over the first
plt.figure()
# Normalized Confusion Matrix...
plot_confusion_matrix(cm, classes=[0, 1], normalize=True,
                      title='Normalized Confusion Matrix')
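The normalization step inside `plot_confusion_matrix` divides every row by its own total, so each row of the normalized matrix shows the fraction of one true class assigned to each predicted class. A small sketch with invented counts follows; `cm_demo`, `row_totals` and `cm_norm` are hypothetical names.

```python
import numpy as np

# Invented confusion matrix counts (rows: true label, columns: predicted label).
cm_demo = np.array([[30, 10],
                    [15, 45]])

# Row sums reshaped to a column vector so the division broadcasts per row.
row_totals = cm_demo.sum(axis=1)[:, np.newaxis]
cm_norm = cm_demo.astype(float) / row_totals

print(cm_norm)
# [[0.75 0.25]
#  [0.25 0.75]]
```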

Credit: BecomingHuman By: Navoneel Chakrabarty