mKD-GBWT (multiple KD-tree GBWT) is an external-memory full-text index.
mKD-GBWT is based on the Geometric Burrows-Wheeler Transform (GBWT). Its main improvement is that it uses multiple KD-trees as its orthogonal range-searching data structure, which gives it good I/O performance.
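The GBWT reduces pattern matching to orthogonal range queries: counting or reporting the points that lie inside an axis-aligned rectangle. For readers new to that technique, here is a minimal in-memory 2D kd-tree that counts the points in a query rectangle. It is a sketch of the generic data structure only, not mKD-GBWT's code: it ignores the disk-page layout and the use of multiple trees that give mKD-GBWT its I/O behavior, and the names `Point2`, `KdNode`, `build_kd` and `kd_range_count` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A 2D point (hypothetical type; plain integer coordinates for illustration).
struct Point2 {
    long x, y;
};

struct KdNode {
    Point2 p;                             // splitting point stored at this node
    std::unique_ptr<KdNode> left, right;  // keys <= split on the left, keys >= split on the right
};

// Builds a balanced kd-tree over pts[lo, hi), splitting on x at even depths
// and on y at odd depths. pts is reordered in place.
std::unique_ptr<KdNode> build_kd(std::vector<Point2>& pts, std::size_t lo, std::size_t hi, int depth) {
    assert(lo <= hi && hi <= pts.size());
    if (lo == hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    auto by_axis = [depth](const Point2& a, const Point2& b) {
        return depth % 2 == 0 ? a.x < b.x : a.y < b.y;
    };
    // Median on the current axis: nothing before mid is greater, nothing after mid is smaller.
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi, by_axis);
    auto node = std::make_unique<KdNode>();
    node->p = pts[mid];
    node->left = build_kd(pts, lo, mid, depth + 1);
    node->right = build_kd(pts, mid + 1, hi, depth + 1);
    return node;
}

// Counts points with x1 <= x <= x2 and y1 <= y <= y2, skipping any subtree
// whose side of the splitting line cannot intersect the query rectangle.
std::size_t kd_range_count(const KdNode* n, long x1, long x2, long y1, long y2, int depth = 0) {
    if (n == nullptr) return 0;
    std::size_t count = (n->p.x >= x1 && n->p.x <= x2 && n->p.y >= y1 && n->p.y <= y2) ? 1 : 0;
    const bool on_x = depth % 2 == 0;
    const long key = on_x ? n->p.x : n->p.y;
    const long lo = on_x ? x1 : y1;
    const long hi = on_x ? x2 : y2;
    if (lo <= key) count += kd_range_count(n->left.get(), x1, x2, y1, y2, depth + 1);
    if (hi >= key) count += kd_range_count(n->right.get(), x1, x2, y1, y2, depth + 1);
    return count;
}
```

Usage: build once with `auto root = build_kd(pts, 0, pts.size(), 0);`, then call `kd_range_count(root.get(), x1, x2, y1, y2)` per query. An external-memory variant would typically pack nodes into disk-page-sized blocks so that one page read covers several levels of the search.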
Install:
1. Install SAscan and LCPscan:
[1] https://www.cs.helsinki.fi/group/pads/SAscan.html
[2] https://www.cs.helsinki.fi/group/pads/LCPscan.html
2. Download mKD-GBWT from https://github.com/xidianzyh/mKD-GBWT.
Then set the directory where the index is stored:
2.1 Open sbt_util.h and change `#define GBWT_INDEX_POSITION "/media/软件/GBWT-Index-position/"` to your own index directory.
2.2 Open tools/sam_sa_lcp.cpp and change `#define GBWT_INDEX_POSITION "/media/软件/GBWT-Index-position/"` to the same directory.
3. Compile:
$ cd mKD-GBWT
$ make
$ cd test
$ make
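Steps 2.1 and 2.2 hard-code the index directory at compile time, in two separate files, so it is easy to leave them out of sync or to point them at a directory that does not exist. The hypothetical helpers below (not part of mKD-GBWT; requires C++17) read the `GBWT_INDEX_POSITION` value from a source file and check that it names an existing directory ending in `/` like the default value. Whether the tools really need that trailing slash, or create the directory themselves, is an assumption worth verifying in the sources.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: returns the quoted value of `#define GBWT_INDEX_POSITION "..."`
// in `file`, or an empty string if the file cannot be read or has no such line.
std::string read_index_position(const std::string& file) {
    std::ifstream in(file);
    const std::string key = "#define GBWT_INDEX_POSITION";
    std::string line;
    while (std::getline(in, line)) {
        std::size_t k = line.find(key);
        if (k == std::string::npos) continue;
        std::size_t open = line.find('"', k + key.size());
        if (open == std::string::npos) continue;
        std::size_t close = line.find('"', open + 1);
        if (close == std::string::npos) continue;
        return line.substr(open + 1, close - open - 1);
    }
    return "";
}

// Hypothetical check: true if `path` ends in '/' (assumed to matter if file names
// are appended to it) and names an existing directory.
bool index_position_ok(const std::string& path) {
    if (path.empty() || path.back() != '/') return false;
    std::error_code ec;  // non-throwing overload: any filesystem error yields false
    return std::filesystem::is_directory(path, ec);
}
```

For example, from the mKD-GBWT directory, `read_index_position("sbt_util.h") == read_index_position("tools/sam_sa_lcp.cpp")` and `index_position_ok(read_index_position("sbt_util.h"))` should both hold after the edits.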
Generate the index of a data file
1. $ cd SAscan-???
$ cd src
$ ./sascan <data>
2. $ cd LCPscan-???
$ cd build
$ ./construct_lcp <data>
3. $ cd mKD-GBWT
$ ./gen_sa_lcp <data>
4. $ cd mKD-GBWT
$ ./sam_sa_lcp <data> <step>
$ ./build <data> <disk-page-size-in-bytes> <step>
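Steps 1 and 2 compute the suffix array (SA) and the longest-common-prefix (LCP) array of `<data>` with SAscan and LCPscan, which are external-memory tools. For intuition about what those two arrays hold, here is a tiny in-memory sketch: a naive suffix sort plus Kasai's linear-time LCP algorithm. It only suits small strings, does not reflect how SAscan or LCPscan work internally, and says nothing about their on-disk output formats; `suffix_array` and `lcp_array` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort suffix start positions lexicographically (tiny inputs only).
std::vector<int> suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Kasai et al.: lcp[i] is the length of the longest common prefix of the suffixes
// starting at sa[i - 1] and sa[i]; lcp[0] is 0.
std::vector<int> lcp_array(const std::string& s, const std::vector<int>& sa) {
    assert(sa.size() == s.size());
    const int n = static_cast<int>(s.size());
    std::vector<int> rank(n), lcp(n, 0);
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;
    int h = 0;  // current match length; drops by at most 1 per text position
    for (int i = 0; i < n; ++i) {
        if (rank[i] > 0) {
            const int j = sa[rank[i] - 1];  // lexicographically preceding suffix
            while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
            lcp[rank[i]] = h;
            if (h > 0) --h;
        } else {
            h = 0;
        }
    }
    return lcp;
}
```

For `banana`, the suffix array is {5, 3, 1, 0, 4, 2} and the LCP array is {0, 1, 3, 0, 0, 2}.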
Pattern matching
$ cd mKD-GBWT
$ ./mygbwt <data> <disk-page-size-in-bytes> <step>
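When testing `mygbwt` on a small file, a plain in-memory suffix array can serve as an independent reference count: all suffixes that start with the pattern form one contiguous block of the SA, which two binary searches delimit. The sketch below counts occurrences that way. It is not mKD-GBWT's query algorithm, the names `build_sa` and `count_occurrences` are hypothetical, and it assumes you can obtain an occurrence count from `mygbwt` to compare against.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array of `text` (small inputs only).
std::vector<std::size_t> build_sa(const std::string& text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Number of occurrences of `pattern` in `text`. Each comparison looks only at the
// first m characters of a suffix; a suffix shorter than m that is a prefix of the
// pattern compares smaller, which keeps the SA order monotone for both searches.
std::size_t count_occurrences(const std::string& text, const std::vector<std::size_t>& sa,
                              const std::string& pattern) {
    assert(sa.size() == text.size());
    const std::size_t m = pattern.size();
    auto first = std::lower_bound(sa.begin(), sa.end(), pattern,
        [&](std::size_t pos, const std::string& p) { return text.compare(pos, m, p) < 0; });
    auto last = std::upper_bound(first, sa.end(), pattern,
        [&](const std::string& p, std::size_t pos) { return p.compare(0, m, text, pos, m) < 0; });
    return static_cast<std::size_t>(last - first);
}
```

Usage: `auto sa = build_sa(text);` once, then `count_occurrences(text, sa, "abra")` per pattern.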
Example
Suppose the data file is `/data/english`, the disk page size is 4096 bytes, and step = 4.
First, generate the index:
1. $ cd SAscan-???
$ cd src
$ ./sascan /data/english
2. $ cd LCPscan-???
$ cd build
$ ./construct_lcp /data/english
3. $ cd mKD-GBWT
$ ./gen_sa_lcp /data/english
4. $ cd mKD-GBWT
$ ./sam_sa_lcp /data/english 4
$ ./build /data/english 4096 4
Second, run pattern matching:
$ cd mKD-GBWT
$ ./mygbwt /data/english 4096 4