Angle-Based Outlier Detection (ABOD)
Before getting into the ABOD method, let’s try to understand what an outlier is, the different types of methods used to detect outliers, and how ABOD differs from other outlier detection methods.
As per Hawkins' definition: “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.”
The main types of outlier detection methods include:
1) Statistical or model-based methods: these include parametric and non-parametric approaches.
2) Proximity-based methods: these can be classified into 3 categories
- Cluster-based methods
- Distance-based methods
- Density-based methods
Statistical models are a relatively simple way of identifying abnormal data points. Such outliers can often be spotted with a box plot, or as extreme values under a normal distribution.
Model-based and proximity-based approaches, however, rely on an assessment of distances in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious “curse of dimensionality”. You can read this article to know more about them: Distance & Density Based Clustering
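To see why distances lose their meaning in high dimensions, here is a small sketch in R. It is only an illustration under assumptions of my own: uniformly random points, a hypothetical helper named relative_contrast, and an arbitrary seed. It compares how much farther the farthest point is than the nearest one, in 2 dimensions and in 1000 dimensions.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Relative contrast: (farthest - nearest) / nearest Euclidean distance
# from one random query point to n uniformly random points in d dimensions.
# relative_contrast is a hypothetical helper written for this illustration.
relative_contrast <- function(d, n = 500) {
  points <- matrix(runif(n * d), nrow = n)
  query <- runif(d)
  # sweep() subtracts the query from every row of points
  dists <- sqrt(rowSums(sweep(points, 2, query)^2))
  (max(dists) - min(dists)) / min(dists)
}

contrast_low <- relative_contrast(2)      # low-dimensional data
contrast_high <- relative_contrast(1000)  # high-dimensional data

cat("Relative contrast in 2 dimensions:   ", contrast_low, "\n")
cat("Relative contrast in 1000 dimensions:", contrast_high, "\n")
```

In the high-dimensional case the nearest and farthest points end up at almost the same distance, so a purely distance-based notion of "far from everything else" becomes hard to use.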
In the figure above, for an outlier point P, the angle between PX and PY for any two points X and Y from the database is substantially smaller than the corresponding angles seen from the points Q and R. In other words, the angle that a far-away point forms with pairs of other points is smaller than the angle that a point inside the data forms with them.
If you think about it further, the variance of all the possible angles to the rest of the data points will be lower for a far-away point than for a point surrounded by other points. Thus a data point with a low variance of angles is considered an outlier.
Angles are more stable than distances in high-dimensional spaces:
• Object o is an outlier if most other objects are located in similar directions (lower variance of angles)
• Object o is not an outlier if many other objects are located in varying directions (higher variance of angles)
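The two rules above can be checked on toy data. The sketch below is a minimal illustration, assuming a made-up 2-D cluster, a point in the middle of it, and a point far away. The helper angle_variance is hypothetical and uses plain angles only, with no distance weighting.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Toy data (an assumption for this example): 200 points around the origin
cluster <- matrix(rnorm(200 * 2), ncol = 2)
inlier  <- c(0, 0)    # sits in the middle of the cluster
outlier <- c(10, 10)  # sits far away from the cluster

# Variance of the angles between all pairs of vectors from p to the data.
# angle_variance is a hypothetical helper written for this illustration.
angle_variance <- function(p, data) {
  diffs <- sweep(data, 2, p)               # vectors from p to each point
  unit  <- diffs / sqrt(rowSums(diffs^2))  # scale every row to length 1
  cosines  <- unit %*% t(unit)             # cosine for every pair of rows
  cos_vals <- cosines[upper.tri(cosines)]  # keep each pair once
  # clamp to [-1, 1] to guard against floating-point overshoot
  angles <- acos(pmin(pmax(cos_vals, -1), 1))
  var(angles)
}

var_in  <- angle_variance(inlier, cluster)
var_out <- angle_variance(outlier, cluster)

cat("Angle variance seen from the inlier: ", var_in, "\n")
cat("Angle variance seen from the outlier:", var_out, "\n")
```

Seen from the far-away point, the whole cluster lies in roughly one direction, so its angles vary much less than those seen from the point inside the cluster.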
In the actual implementation, the angle is not used on its own: it is also divided by the distances between the points, so that distance is taken into account as well. (Nearby points may also form a very small angle without the point being an outlier.) So the angular distance is computed as follows, for a point A and any two other points B and C:
<AB, AC> - dot product of the vectors AB and AC
|AB|, |AC| - distance between A and B, and between A and C
So cosine = <AB, AC> / (|AB| * |AC|)
cosine / distances = <AB, AC> / (|AB|^2 * |AC|^2)
To calculate the angle-based outlier factor (ABOF) of A, the variance of cosine/distances over all possible pairs (B, C) is taken. A lower value means more outlier-ness.
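The formula above can be written out directly in base R. This is a minimal sketch, not the code of any package: abof and abof_all are hypothetical helper names, the data is a made-up cluster with one planted outlier, and the variance is the simple unweighted one described above. It assumes there are no duplicate points, since a zero distance would lead to a division by zero.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

# ABOF of a point a with respect to the rows of `others`:
# variance over all pairs (B, C) of <AB, AC> / (|AB|^2 * |AC|^2).
# abof and abof_all are hypothetical helpers written for this illustration.
abof <- function(a, others) {
  diffs    <- sweep(others, 2, a)            # each row is a vector AB
  sq_norms <- rowSums(diffs^2)               # |AB|^2 for every B
  dots     <- diffs %*% t(diffs)             # <AB, AC> for every pair
  weighted <- dots / outer(sq_norms, sq_norms)
  var(weighted[upper.tri(weighted)])         # each pair counted once
}

# ABOF of every row of a numeric data set (assumes no duplicate rows)
abof_all <- function(data) {
  data <- as.matrix(data)
  sapply(seq_len(nrow(data)), function(i) {
    abof(data[i, ], data[-i, , drop = FALSE])
  })
}

# Toy data (an assumption): 50 clustered points plus one far-away point
toy <- rbind(matrix(rnorm(50 * 2), ncol = 2), c(8, 8))
scores <- abof_all(toy)

# The row with the lowest ABOF is the strongest outlier candidate
most_outlying <- which.min(scores)
cat("Row with the lowest ABOF:", most_outlying, "\n")
```

Every point is compared with every pair of other points, so this brute-force version becomes slow on large data sets.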
Implementation of ABOD method in R
# Sub-setting the data
iris_dataset <- iris[, 1:4]
# Running ABOD code (needs the abodOutlier package to be installed)
angular_distance <- abodOutlier::abod(iris_dataset, method = "complete")
# Plotting the data, coloured by the ABOD score
library(ggplot2)
gg <- ggplot(data = iris_dataset, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(col = angular_distance))
plot(gg)
Here the darker points (smaller angular distance) are clearly visible as outliers (abnormal data points).