Shengbin's Studio.

Support Vector Machine Implementation With Python

2018/03/01

In this post, an SVM classifier is implemented with both a "Hard Margin" and a "Soft Margin". The hard margin is used to classify separable data, while the soft margin is used to classify inseparable data. In addition, kernels can be used in this SVM classifier: it is implemented with a linear kernel and a Gaussian kernel (RBF kernel).

SVM Classifier

We will use the quadratic program solver CVXOPT to solve the Lagrangian dual of the SVM. A tutorial on CVXOPT can be found here (CVXOPT).

import numpy as np
import matplotlib.pyplot as plt
import cvxopt
from cvxopt import matrix
from cvxopt import solvers
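
As a quick, self-contained sketch of the solvers.qp interface (the numbers here are purely illustrative and not part of the SVM itself), a tiny quadratic program in CVXOPT's standard form looks like this:

## Illustrative only: minimize (1/2) x^T P x + q^T x  subject to  Gx <= h, Ax = b
P = matrix(np.array([[2.0, 0.0], [0.0, 2.0]]))
q = matrix(np.array([-2.0, -4.0]))
G = matrix(-np.identity(2))            ## encodes x >= 0 as -x <= 0
h = matrix(np.zeros((2, 1)))
A = matrix(np.array([[1.0, 1.0]]))     ## encodes x1 + x2 = 1
b = matrix(1.0)
sol = solvers.qp(P, q, G, h, A, b)
print(np.ravel(sol['x']))              ## the optimal x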

Hard Margin SVM

The primal problem of the hard-margin SVM is

$$\min_{\lambda, \lambda_0} \ \frac{1}{2}\|\lambda\|^2 \quad \text{s.t.} \quad y_i(\lambda^{\top}\mathbf{x_i} + \lambda_0) \geq 1, \ \forall i$$

The Lagrangian of the SVM is

$$L(\lambda, \lambda_0, \alpha) = \frac{1}{2}\|\lambda\|^2 - \sum_{i=1}^n \alpha_i \left[ y_i(\lambda^{\top}\mathbf{x_i} + \lambda_0) - 1 \right], \quad \alpha_i \geq 0$$

The KKT conditions are

$$\frac{\partial L}{\partial \lambda} = 0 \Rightarrow \lambda = \sum_{i=1}^n \alpha_i y_i \mathbf{x_i}, \qquad \frac{\partial L}{\partial \lambda_0} = 0 \Rightarrow \sum_{i=1}^n \alpha_i y_i = 0, \qquad \alpha_i \left[ y_i(\lambda^{\top}\mathbf{x_i} + \lambda_0) - 1 \right] = 0$$

The dual problem is

$$\max_{\alpha} \ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j \langle \mathbf{x_i}, \mathbf{x_j} \rangle \quad \text{s.t.} \quad \alpha_i \geq 0 \ \forall i, \quad \sum_{i=1}^n \alpha_i y_i = 0$$

This dual problem can be solved by a quadratic program solver. In CVXOPT's standard form it is

$$\min_{\alpha} \ \frac{1}{2}\alpha^{\top}\mathbf{P}\alpha + \mathbf{q}^{\top}\alpha \quad \text{s.t.} \quad \mathbf{G}\alpha \preceq \mathbf{h}, \quad \mathbf{A}\alpha = b$$

where $P_{ij} = y_i y_j \langle \mathbf{x_i}, \mathbf{x_j} \rangle$, $\mathbf{q} = -\mathbf{1}$, $\mathbf{G} = -\mathbf{I}$, $\mathbf{h} = \mathbf{0}$, $\mathbf{A} = \mathbf{y}^{\top}$, and $b = 0$.

The optimal values $\alpha_i^{\star}$ can be obtained by solving the above quadratic program using CVXOPT.
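
As a sketch (assuming X is an (n, f) array of training points and y an array of labels in {+1, -1}; the helper name is only for illustration, and the full class below does the same work), the dual maps onto CVXOPT's standard form like this:

def solve_hard_margin_dual(X, y):
    ## Build the QP matrices for the hard-margin dual and solve with CVXOPT.
    n = X.shape[0]
    K = X @ X.T                                      ## Gram matrix of the linear kernel
    P = matrix((np.outer(y, y) * K).astype(np.double))  ## P_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones((n, 1)))
    G = matrix(-np.identity(n))                      ## -alpha_i <= 0
    h = matrix(np.zeros((n, 1)))
    A = matrix(y.reshape(1, n).astype(np.double))    ## sum_i alpha_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])                        ## the optimal alphas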

Making Predictions Using the Linear Kernel

For the linear kernel, we make predictions by

$$\hat{y} = \mathrm{sign}\left( \langle \lambda^{\star}, \mathbf{x} \rangle + \lambda_0^{\star} \right)$$

We can compute $\lambda^{\star}$ by

$$\lambda^{\star} = \sum_{i=1}^n \alpha_i^{\star} y_i \mathbf{x_i}$$

Then we can use $\lambda^{\star}$ and the support vectors to compute $\lambda_0^{\star}$ by

$$\lambda_0^{\star} = y_s - \langle \lambda^{\star}, \mathbf{x_s} \rangle \quad \text{for a support vector } (\mathbf{x_s}, y_s)$$

In this implementation, $\lambda_0^{\star}$ is computed from every support vector, and the average is taken as the final $\lambda_0^{\star}$, as in the sketch below.
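
A minimal sketch of this recovery step (the variable and function names are illustrative; alphas are the solved dual values over all training points):

def linear_weights(alphas, X, y, tol=1e-5):
    ## Keep only the support vectors (alpha_i noticeably greater than zero).
    sv = alphas > tol
    a_sv, x_sv, y_sv = alphas[sv], X[sv], y[sv]
    w = (a_sv * y_sv) @ x_sv               ## lambda* = sum_i alpha_i* y_i x_i
    w_0 = np.mean(y_sv - x_sv @ w)         ## average lambda_0* over all support vectors
    return w, w_0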

Gaussian Kernel (RBF)

For a non-linear kernel, if we do not know how the kernel maps features into the new feature space, $\lambda^{\star}$ cannot be computed by $\lambda^{\star} = \sum_{i=1}^n \alpha_i^{\star} y_i \mathbf{x_i^{\mathcal{H_k}}}$, because the mapped points $\mathbf{x_i^{\mathcal{H_k}}}$ are unknown.
However, all we need to know is the value of the inner products between the $\mathbf{x_i^{\mathcal{H_k}}}$ in the new feature space.
For solving the dual problem,

$$\max_{\alpha} \ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j \langle \mathbf{x_i^{\mathcal{H_k}}}, \mathbf{x_j^{\mathcal{H_k}}} \rangle,$$

we just need to know the values of $\langle \mathbf{x_i^{\mathcal{H_k}}}, \mathbf{x_j^{\mathcal{H_k}}} \rangle$.
For making a prediction,

$$\hat{y} = \mathrm{sign}\left( \sum_{i=1}^n \alpha_i^{\star} y_i \langle \mathbf{x_i^{\mathcal{H_k}}}, \mathbf{x^{\mathcal{H_k}}} \rangle + \lambda_0^{\star} \right),$$

we again only need the inner products of $\mathbf{x}$ with the support vectors in the new feature space.
Thus, for the SVM with a Gaussian kernel, all we need is the kernel value of each pair of vectors, $K(\mathbf{x_i}, \mathbf{x_j}) = \exp\left(-\frac{\|\mathbf{x_i} - \mathbf{x_j}\|^2}{2\sigma^2}\right)$, which plays the role of this inner product. We can then use these values to solve the Lagrangian and make predictions, as sketched below.
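
A small sketch of kernelized prediction (assuming the support vectors sv_x, their labels sv_y, the dual values a, and the intercept w_0 are already available; the names mirror the class below):

def rbf_kernel(x1, x2, sigma=5.0):
    ## Gaussian (RBF) kernel value of two vectors.
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

def rbf_predict(x, sv_x, sv_y, a, w_0, sigma=5.0):
    ## Predict the label of a single point using only kernel evaluations.
    score = sum(a_j * y_j * rbf_kernel(x_j, x, sigma)
                for a_j, y_j, x_j in zip(a, sv_y, sv_x))
    return np.sign(score + w_0)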

Soft Margin

The primal problem of the soft-margin SVM is

$$\min_{\lambda, \lambda_0, \xi} \ \frac{1}{2}\|\lambda\|^2 + C\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad y_i(\lambda^{\top}\mathbf{x_i} + \lambda_0) \geq 1 - \xi_i, \quad \xi_i \geq 0 \ \forall i$$

The Lagrangian form of this primal is

$$L(\lambda, \lambda_0, \xi, \alpha, \mu) = \frac{1}{2}\|\lambda\|^2 + C\sum_{i=1}^n \xi_i - \sum_{i=1}^n \alpha_i \left[ y_i(\lambda^{\top}\mathbf{x_i} + \lambda_0) - 1 + \xi_i \right] - \sum_{i=1}^n \mu_i \xi_i$$

The dual problem is

$$\max_{\alpha} \ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j \langle \mathbf{x_i}, \mathbf{x_j} \rangle \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C \ \forall i, \quad \sum_{i=1}^n \alpha_i y_i = 0$$

The only difference from the hard-margin dual is the constraint $0 \leq \alpha_i \leq C \ \forall i$. To solve this quadratic program, all we need to change are the matrices $\mathbf{G}$ and $\mathbf{h}$, as sketched below.
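
A minimal sketch of the soft-margin constraint matrices for a given C (the helper name is only illustrative; the class below builds the same matrices inline):

def soft_margin_constraints(n, C):
    ## Encode 0 <= alpha_i <= C for the soft-margin dual.
    G = matrix(np.vstack((-np.identity(n),      ## -alpha_i <= 0
                          np.identity(n))))     ##  alpha_i <= C
    h = matrix(np.vstack((np.zeros((n, 1)),
                          C * np.ones((n, 1)))))
    return G, h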

The full Python implementation of the SVM classifier is shown below.

class SVM:
    ## default kernel is linear
    ## default sigma for gaussian kernel is 5.0
    def __init__(self, kernel='linear', margin='hard', C=1, sigma=5):
        self.k = kernel
        self.a = None
        self.sv_x = None
        self.sv_y = None
        self.w = None
        self.w_0 = None
        self.sigma = sigma
        self.margin = margin
        self.C = C

    def kernel(self, x1, x2):
        if self.k == 'linear':
            return np.dot(x1, x2)
        elif self.k == 'gaussian':
            return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * (self.sigma ** 2)))

    def train(self, X, y):
        caches = {}
        ## y is (1, -1)
        self.train_x = X  ## point to the training data
        self.train_y = y  ## point to the training data
        n, f = X.shape
        P = np.zeros((n, n))
        K = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                k_ij = self.kernel(X[i], X[j])
                P[i, j] = y[i] * y[j] * k_ij
                K[i, j] = k_ij
        P = matrix(P)
        q = matrix(-1 * np.ones((n, 1)))
        if self.margin == 'hard':
            G = matrix(-1 * np.identity(n))
            h = matrix(np.zeros((n, 1)))
        if self.margin == 'soft':
            G = matrix(np.vstack((-1 * np.identity(n), 1 * np.identity(n))))
            h = matrix(np.vstack((np.zeros((n, 1)), self.C * np.ones((n, 1)))))

        A = matrix(y.reshape(1, n).astype(np.double))  ## cvxopt requires a float matrix
        b = matrix(0.0)
        sol = solvers.qp(P, q, G, h, A, b)
        a = np.ravel(sol['x'])

        sv_idx = np.where(a > 1e-5)[0]
        self.sv_x = X[sv_idx]
        self.sv_y = y[sv_idx]
        self.a = a[sv_idx]

        if self.k == 'linear':
            self.w = np.dot((self.a * self.sv_y), self.sv_x)
            self.w_0 = np.mean(self.sv_y - np.dot(self.sv_x, self.w))
            print("The computed intercept term is {0:f}".format(self.w_0))
        else:
            ## can not compute w explicitly for a non-linear kernel
            w_0s = np.zeros_like(self.a)
            for i in range(len(self.a)):
                first_term = np.dot(self.a * self.sv_y, K[sv_idx, sv_idx[i]])
                w_0s[i] = self.sv_y[i] - first_term
            self.w_0 = np.mean(w_0s)
            print("The computed intercept term is {0:f}".format(self.w_0))

        if self.margin == 'hard' and self.evaluate_acc(X, y) < 1.0:
            raise Exception('Hard margin, but data not separable')

        print("The rate of support vectors: {0:0.6f}".format(float(len(self.a)) / n))
        caches['sol'] = sol
        caches['K'] = K
        caches['sv_idx'] = sv_idx
        return caches

    def predict_no_bias(self, x):
        if self.k == 'linear':
            return np.dot(x, self.w)
        else:
            n, f = x.shape
            preds = []
            for i in range(n):
                pred = 0
                for j in range(len(self.a)):
                    pred += self.sv_y[j] * self.a[j] * self.kernel(self.sv_x[j], x[i])
                preds.append(pred)
            return np.array(preds)

    def predict(self, x):
        ## add bias to the prediction
        pred = self.predict_no_bias(x)
        return np.sign(pred + self.w_0)

    def evaluate_acc(self, x, y):
        n, f = x.shape
        pred = self.predict(x)
        acc = np.sum(pred == y) / len(y)
        return acc

    def plot_Roc(self, x_input, y_input):
        n, f = x_input.shape
        pred = self.predict_no_bias(x_input)
        ## sort points by their (bias-free) decision value
        p = np.argsort(pred, axis=0)
        x = x_input[p]
        y = y_input[p]  ## labels after sorting
        pre = pred[p]   ## predictions without bias after sorting
        min_dis = min(pred)
        max_dis = max(pred)

        true_lbs = np.sum(y == 1)
        false_lbs = np.sum(y == -1)

        b = np.linspace(min_dis, max_dis, 200)
        b = b[::-1]
        tprs = []
        fprs = []

        for i, t in enumerate(b):
            TP = 0
            FP = 0
            for j in range(n):
                if pre[j] > t:
                    if y[j] == 1:
                        TP += 1
                    if y[j] == -1:
                        FP += 1

            tprs.append(TP / true_lbs)
            fprs.append(FP / false_lbs)
        auc = np.trapz(tprs, fprs)
        plt.figure(figsize=(5, 5))
        plt.plot(fprs, tprs, color='darkorange', label='ROC curve: AUC={0:0.8f}'.format(auc))
        plt.xlabel('False Positive Rate', fontsize=12)
        plt.ylabel('True Positive Rate', fontsize=12)
        plt.legend(loc="lower right", fontsize=12)
        plt.show()
        return tprs, fprs
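
A minimal usage sketch on synthetic data (the toy data and parameter values are purely illustrative):

## Toy example: two Gaussian blobs with labels in {+1, -1}.
np.random.seed(0)
X_pos = np.random.randn(50, 2) + np.array([2.0, 2.0])
X_neg = np.random.randn(50, 2) - np.array([2.0, 2.0])
X = np.vstack((X_pos, X_neg))
y = np.hstack((np.ones(50), -np.ones(50)))

clf = SVM(kernel='linear', margin='soft', C=1.0)
clf.train(X, y)
print("Training accuracy:", clf.evaluate_acc(X, y))
clf.plot_Roc(X, y)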

That’s it.
