forked from pranab/beymani
Introduction
============
Beymani consists of a set of Hadoop-based tools for outlier and anomaly
detection, which can be used for fraud detection.
Blogs
=====
The following blog posts of mine are a good source of details on beymani:
http://pkghosh.wordpress.com/2012/01/02/fraudsters-outliers-and-big-data-2/
http://pkghosh.wordpress.com/2012/02/18/fraudsters-are-not-model-citizens/
http://pkghosh.wordpress.com/2012/06/18/its-a-lonely-life-for-outliers/
http://pkghosh.wordpress.com/2012/10/18/relative-density-and-outliers/
Distribution Method
===================
Use the MR class MultiVarHistogram from the chombo project. As the name
suggests, it calculates a multivariate distribution and detects outliers.
Here is my blog post on this:
http://pkghosh.wordpress.com/2012/02/18/fraudsters-are-not-model-citizens/
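
To illustrate the idea behind a histogram-based detector, here is a minimal
single-machine sketch in Python. It is not the MultiVarHistogram code. The
function names, the bin widths, and the min_fraction threshold are all
hypothetical, chosen only for this example. Records that fall into sparsely
populated joint bins are reported as outliers.

```python
from collections import Counter

def bin_key(record, bin_widths):
    # Map each numeric field to the index of its histogram bin.
    return tuple(int(value // width) for value, width in zip(record, bin_widths))

def histogram_outliers(records, bin_widths, min_fraction):
    # Count how many records land in each multivariate bin.
    counts = Counter(bin_key(r, bin_widths) for r in records)
    total = len(records)
    # A record is an outlier if its bin holds less than min_fraction of the data.
    return [r for r in records
            if counts[bin_key(r, bin_widths)] / total < min_fraction]

# Hypothetical two-field records (e.g. amount, item count).
records = [(10, 1), (12, 1), (11, 2), (13, 2), (14, 1), (95, 9)]
outliers = histogram_outliers(records, bin_widths=(20, 5), min_fraction=0.2)
print(outliers)
```

In a MapReduce setting the bin counting would typically happen in one pass
and the lookup in another; the sketch collapses both into memory.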
Average Distance
================
Use the SameTypeSimilarity MR from the sifarish project to find the
pairwise distance between all data points. The output of this MR is used as
input to the AverageDistance MR in this project. Here is the relevant blog
post on this:
http://pkghosh.wordpress.com/2012/06/18/its-a-lonely-life-for-outliers/
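
The two-step flow above can be sketched in plain Python. This is an
illustration under assumptions, not the AverageDistance MR itself: the
(id, id, distance) tuple layout is a hypothetical stand-in for the pairwise
distance output, and averaging over the k nearest neighbors is one common
choice. A point whose average distance is large is isolated and therefore
a candidate outlier.

```python
from collections import defaultdict

def average_distance(pairs, k):
    # Collect, for every point, the distances to all other points.
    distances = defaultdict(list)
    for a, b, d in pairs:
        distances[a].append(d)
        distances[b].append(d)
    # Average over the k smallest distances (fewer if a point has fewer pairs).
    return {point: sum(sorted(ds)[:k]) / min(k, len(ds))
            for point, ds in distances.items()}

# Hypothetical pairwise distances: a, b, c are close together, d is far away.
pairs = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 1.0),
         ("a", "d", 9.0), ("b", "d", 8.0), ("c", "d", 9.0)]
avg = average_distance(pairs, k=2)
most_isolated = max(avg, key=avg.get)
print(most_isolated, avg[most_isolated])
```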
Relative Density
================
This approach is appropriate when the feature space is not homogeneous
and density varies. First, use SameTypeSimilarity to find pairwise distance.
Then use the AverageDistance MR to find density. Use the AverageDistance MR
again to find neighborhood groups. Use the results of the last two steps and
run NeighborDensity to find the group-wise density of each data point.
Finally, run the RelativeDensity MR to find the relative density of each
point.
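
The chain of steps above can be approximated in a few lines of Python.
This is only a sketch under stated assumptions, not the actual MR jobs:
density is taken as the inverse of the average distance to the k nearest
neighbors, the neighborhood group of a point is taken to be those same k
neighbors, and all names and the input layout are hypothetical. A relative
density well below 1 suggests the point is sparser than its neighborhood.

```python
from collections import defaultdict

def nearest_neighbors(pairs, k):
    # For every point, keep the k closest (distance, neighbor) entries.
    adjacency = defaultdict(list)
    for a, b, d in pairs:
        adjacency[a].append((d, b))
        adjacency[b].append((d, a))
    return {p: sorted(edges)[:k] for p, edges in adjacency.items()}

def relative_density(pairs, k):
    knn = nearest_neighbors(pairs, k)
    # Density: inverse of the average neighbor distance (assumes distances > 0).
    density = {p: len(edges) / sum(d for d, _ in edges)
               for p, edges in knn.items()}
    result = {}
    for p, edges in knn.items():
        # Group density: mean density over the point's neighborhood group.
        group = sum(density[q] for _, q in edges) / len(edges)
        result[p] = density[p] / group
    return result

# Hypothetical pairwise distances: a, b, c form a cluster, d sits apart.
pairs = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 1.0),
         ("a", "d", 9.0), ("b", "d", 8.0), ("c", "d", 9.0)]
rel = relative_density(pairs, k=2)
print(min(rel, key=rel.get))
```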
More coming...