This post is a bit technical. It grew out of a discussion with Andrea Lodi concerning the robustness of the SVM.
It is well-known that the median is more robust than the mean. The mean is the result of an L2 minimization (least squares), and the median is the result of an L1 minimization (least absolute deviation). Quantile regression coincides with median regression for tau=0.5.
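A minimal illustration of this in R: a single gross outlier drags the mean but barely moves the median.
x <- c(rnorm(99), 1000)   # 99 well-behaved points plus one gross outlier
mean(x)                   # dragged far from zero by the outlier
median(x)                 # stays near zero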
Here I examine support vector machines and compare them with these two regression techniques.
The robustness of the SVM is rarely studied, and we do not know much about it. It is interesting to see how much outliers affect the support vector machine (SVM) classifier. Since SVM classification does not correspond to a statistical model, it is difficult to say what an outlier means in this context, but at least we can talk about influential observations and build a sort of Cook's distance by removing data and re-fitting, as sketched below.
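A rough sketch of such a leave-one-out influence measure (loo.influence is a hypothetical helper, assuming a data frame my.data with predictors x1, x2 and a ±1 label type, as generated below): refit without each observation in turn and record how far the normal vector of the separating line moves.
library('e1071')
# Sketch of a Cook's-distance-like influence measure for the linear SVM
loo.influence <- function(my.data) {
  full <- svm(type ~ ., data=my.data, type='C-classification',
              kernel='linear', scale=FALSE)
  w.full <- t(full$coefs) %*% full$SV
  sapply(seq_len(nrow(my.data)), function(i) {
    fit <- svm(type ~ ., data=my.data[-i, ], type='C-classification',
               kernel='linear', scale=FALSE)
    w <- t(fit$coefs) %*% fit$SV
    sqrt(sum((w - w.full)^2))  # Euclidean shift of the normal vector
  })
}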
The linear SVM minimizes the hinge loss plus an L2 penalty. When the data are separable, the hinge loss can be driven to zero and the penalization is there only to take care of maximizing the margin. For non-separable data the penalty plays a totally different role: it maximizes a hypothetical generalized margin which has no geometrical interpretation.
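In the usual notation, the soft-margin linear SVM solves

\min_{w,b}\; \frac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(w^\top x_i + b)\bigr),

where the sum is the hinge loss. For separable data the hinge term can be driven to zero, and minimizing \lVert w\rVert^2 alone maximizes the margin 2/\lVert w\rVert.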
Let's start with generating some data and fitting a separating line that maximizes the margin.
SVM separable data
Now let's fit an SVM classifier. The support vectors are highlighted in blue, the least squares line in green and the least absolute deviation line in magenta.
rm(list=ls(all=TRUE))
set.seed(150)
n <- 50
# Two well-separated Gaussian clouds: class +1 around (-3,-3), class -1 around (3,3)
x1s <- c(rnorm(n, -3), rnorm(n, 3))
x2s <- c(rnorm(n, -3), rnorm(n, 3))
ys <- c(rep(+1, n), rep(-1, n))
my.data <- data.frame(x1=x1s, x2=x2s, type=ys)
plot(my.data[,-3], col=(ys+3)/2, pch=19, xlim=c(-10,10), ylim=c(-10,10))
library('e1071')
library('quantreg')
svm.model <- svm(type ~ ., data=my.data,
                 type='C-classification',
                 kernel='linear', scale=FALSE)
# Recover the normal vector w and the intercept b of the separating line
w <- t(svm.model$coefs) %*% svm.model$SV
b <- -svm.model$rho
p <- svm.model$SV   # the support vectors themselves
# Fit least squares and median regression
lscoef <- coef(lm(type ~ ., data=my.data))
medcoef <- coef(rq(type ~ ., data=my.data, tau=0.5))
# Plot
plot(my.data[,-3], col=(ys+3)/2,
     pch=19, xlim=c(-10,10), ylim=c(-10,10))
points(my.data[svm.model$index, c(1,2)], cex=2, col="blue", lwd=2)
abline(a=-lscoef[1]/lscoef[3], b=-lscoef[2]/lscoef[3],
       col="green")
abline(a=-medcoef[1]/medcoef[3], b=-medcoef[2]/medcoef[3],
       col="magenta")
abline(a=-b/w[1,2], b=-w[1,1]/w[1,2], col="blue", lty=1)
legend(-5, 10, legend=c("SVM", "L2", "L1"),
       col=c("blue","green","magenta"), lwd=3)
SVM Robustness
Let's add some contamination points that are correctly classified and see whether the SVM line changes. As long as the support vectors (the blue points) remain on the margin, the SVM fit remains the same.
outlier <- 5
# Five extra points far to the left, correctly labeled +1
x1s <- c(x1s, rnorm(outlier, -10))
x2s <- c(x2s, rnorm(outlier, 0))
ys <- c(ys, rep(+1, outlier))
my.data <- data.frame(x1=x1s, x2=x2s, type=ys)
# SVM
svm.model <- svm(type ~ ., data=my.data,
                 type='C-classification',
                 kernel='linear', scale=FALSE)
w <- t(svm.model$coefs) %*% svm.model$SV
b <- -svm.model$rho
p <- svm.model$SV
# Fit least squares and median regression
lscoef <- coef(lm(type ~ ., data=my.data))
medcoef <- coef(rq(type ~ ., data=my.data, tau=0.5))
# Plot
plot(my.data[,-3], col=(ys+3)/2,
     pch=19, xlim=c(-10,10), ylim=c(-10,10))
points(my.data[svm.model$index, c(1,2)], cex=2, col="blue", lwd=2)
abline(a=-lscoef[1]/lscoef[3], b=-lscoef[2]/lscoef[3],
       col="green")
abline(a=-medcoef[1]/medcoef[3], b=-medcoef[2]/medcoef[3],
       col="magenta")
abline(a=-b/w[1,2], b=-w[1,1]/w[1,2], col="blue", lty=1)
legend(-5, 10, legend=c("SVM", "L2", "L1"),
       col=c("blue","green","magenta"), lwd=3)
Non-separable data
Of course, most classification data sets are not separable. It is also possible that somebody made a mistake and swapped the +1 class with the -1 class. It is therefore more realistic to consider a non-separable data set, so we modify our simulated data a bit.
# make data non-separable
# Move one point of each class into the other class's territory
x1s[1] <- 2
x2s[1] <- 2
x1s[n+1] <- -2
x2s[n+1] <- -2
my.data <- data.frame(x1=x1s, x2=x2s, type=ys)
svm.model <- svm(type ~ ., data=my.data,
                 type='C-classification',
                 kernel='linear', scale=FALSE)
w <- t(svm.model$coefs) %*% svm.model$SV
b <- -svm.model$rho
p <- svm.model$SV
# Fit least squares and median regression
lscoef <- coef(lm(type ~ ., data=my.data))
medcoef <- coef(rq(type ~ ., data=my.data, tau=0.5))
# Plot
plot(my.data[,-3], col=(ys+3)/2,
     pch=19, xlim=c(-10,10), ylim=c(-10,10))
points(my.data[svm.model$index, c(1,2)], cex=2, col="blue", lwd=2)
abline(a=-lscoef[1]/lscoef[3], b=-lscoef[2]/lscoef[3],
       col="green")
abline(a=-medcoef[1]/medcoef[3], b=-medcoef[2]/medcoef[3],
       col="magenta")
abline(a=-b/w[1,2], b=-w[1,1]/w[1,2], col="blue", lty=1)
legend(-5, 10, legend=c("SVM", "L2", "L1"),
       col=c("blue","green","magenta"), lwd=3)
As you can see, there are more blue dots now, since misclassified points fall inside the hypothetical generalized margin. It seems the SVM is more sensitive to data separability than least squares or median regression.
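This can be made concrete (a small sketch using the fit from the chunk above):
length(svm.model$index)  # number of support vectors, i.e. the blue dots
# training points that end up on the wrong side of the fitted line
pred <- as.numeric(as.character(predict(svm.model, my.data[,-3])))
sum(pred != my.data$type)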
SVM Breakdown
Now let's see if we can add some contamination to the data and influence the separating line.
outlier <- 10
# Ten mislabeled points: deep inside the -1 territory but labeled +1
x1s <- c(x1s, rnorm(outlier, 10))
x2s <- c(x2s, rnorm(outlier, 5))
ys <- c(ys, rep(+1, outlier))
my.data <- data.frame(x1=x1s, x2=x2s, type=ys)
svm.model <- svm(type ~ ., data=my.data,
                 type='C-classification',
                 kernel='linear', scale=FALSE)
w <- t(svm.model$coefs) %*% svm.model$SV
b <- -svm.model$rho
p <- svm.model$SV
# Fit least squares and median regression
lscoef <- coef(lm(type ~ ., data=my.data))
medcoef <- coef(rq(type ~ ., data=my.data, tau=0.5))
# Plot
plot(my.data[,-3], col=(ys+3)/2,
     pch=19, xlim=c(-10,10), ylim=c(-10,10))
points(my.data[svm.model$index, c(1,2)], cex=2, col="blue", lwd=2)
abline(a=-lscoef[1]/lscoef[3], b=-lscoef[2]/lscoef[3],
       col="green")
abline(a=-medcoef[1]/medcoef[3], b=-medcoef[2]/medcoef[3],
       col="magenta")
abline(a=-b/w[1,2], b=-w[1,1]/w[1,2], col="blue", lty=1)
legend(-5, 10, legend=c("SVM", "L2", "L1"),
       col=c("blue","green","magenta"), lwd=3)