Sentence Splitter using SS MaxEnt
* How to use
1) make
2) ./geniass arg1 arg2
arg1 is the input file to split.
arg2 is the output file name.
You need to run geniass in the directory that contains
EventExtracter.rb, Classifying2Splitting.rb, and model1-1.0.
If you want a stand-off format file, run:
3) ruby sentence2standOff.rb arg1 arg2 arg3
arg1 and arg2 are the same as in 2).
arg3 is the output stand-off file name.
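sentence2standOff.rb is described here only by its arguments. As a rough illustration of what a stand-off conversion involves, the sketch below maps each split sentence back to [begin, end) character offsets in the original text. It assumes that geniass writes one sentence per line and that each sentence appears verbatim and in order in the input file (only the whitespace between sentences may differ). The function name standoff_offsets is hypothetical, and the real script's output format may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the [begin, end) character offsets of each
// split sentence inside the original text. Assumes sentences occur in order
// and verbatim; only the whitespace between them may differ.
std::vector<std::pair<std::size_t, std::size_t>>
standoff_offsets(const std::string& text,
                 const std::vector<std::string>& sentences) {
  std::vector<std::pair<std::size_t, std::size_t>> spans;
  std::size_t pos = 0;  // the search resumes after the previous sentence
  for (const std::string& s : sentences) {
    const std::size_t begin = text.find(s, pos);
    if (begin == std::string::npos) {
      throw std::runtime_error("sentence not found in original text: " + s);
    }
    const std::size_t end = begin + s.size();
    spans.emplace_back(begin, end);
    pos = end;
  }
  return spans;
}
```

Searching forward from the end of the previous match keeps repeated sentences (e.g. two identical headings) mapped to distinct positions.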
------------
SS MaxEnt
This is a simple C++ class library for maximum entropy classifiers.
If you are familiar with C++ and the STL, you will easily understand how
to use the library by looking at the sample code.
The main features of this library are:
- fast parameter estimation using the BLMVM algorithm (Benson and More, 2001)
- smoothing with a Gaussian prior (Chen and Rosenfeld, 1999)
- modelling with inequality constraints (Kazama and Tsujii, 2003)
- saving/loading the model to/from a file
- integrating the model data into your source code
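The three modelling features above can be summarised with the formulation used in the cited papers. This is a sketch for orientation only; how the library's arguments (for example those of train()) map onto sigma or the widths A_i and B_i is not specified in this README. The non-negativity bounds on alpha_i and beta_i in the last objective are what make a bound-constrained optimizer such as BLMVM a natural fit.

```latex
% Conditional maximum entropy model with features f_i and weights \lambda_i
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big(\sum_i \lambda_i f_i(x, y')\Big)

% Model expectation of a feature
E_p[f_i] = \sum_x \tilde{p}(x) \sum_y p_\lambda(y \mid x)\, f_i(x, y)

% Gaussian prior (Chen and Rosenfeld): penalised log-likelihood and its gradient
L_\sigma(\lambda) = \sum_{x, y} \tilde{p}(x, y) \log p_\lambda(y \mid x)
  - \sum_i \frac{\lambda_i^2}{2\sigma^2},
\qquad
\frac{\partial L_\sigma}{\partial \lambda_i}
  = E_{\tilde{p}}[f_i] - E_p[f_i] - \frac{\lambda_i}{\sigma^2}

% Inequality constraints (Kazama and Tsujii), widths A_i, B_i > 0
-B_i \le E_{\tilde{p}}[f_i] - E_p[f_i] \le A_i

% Dual: \lambda_i = \alpha_i - \beta_i with \alpha_i, \beta_i \ge 0; maximise
L(\alpha, \beta) = -\sum_x \tilde{p}(x) \log Z_{\alpha - \beta}(x)
  + \sum_i \alpha_i \big(E_{\tilde{p}}[f_i] - A_i\big)
  - \sum_i \beta_i \big(E_{\tilde{p}}[f_i] + B_i\big)
```

Expanding the last objective gives the ordinary log-likelihood minus the sum of alpha_i A_i + beta_i B_i, so with A_i = B_i the penalty behaves roughly like an L1 penalty on lambda, which tends to leave many weights at exactly zero.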
* How to use
1) make
- if you encounter compilation errors related to the hash map, try commenting out
#define USE_HASH_MAP
in "maxent.h".
2) ./a.out
3) See sample.cpp and maxent.h for details of the API.
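USE_HASH_MAP is a compile-time choice of container. The snippet below is a sketch of the common pattern behind such a switch, not a copy of maxent.h: it assumes the macro selects a hash table and that commenting it out falls back to the ordered std::map, which every standard library provides. Pre-C++11 code often used the non-standard hash_map here, which some compilers reject; std::unordered_map is its standard counterpart. The alias FeatureMap and the helper feature_id are hypothetical.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Comment this line out to fall back to the ordered std::map.
#define USE_HASH_MAP

#ifdef USE_HASH_MAP
// Standard hash table (C++11).
template <typename K, typename V>
using FeatureMap = std::unordered_map<K, V>;
#else
// Balanced tree; slower lookups but always available.
template <typename K, typename V>
using FeatureMap = std::map<K, V>;
#endif

// Hypothetical feature-name -> feature-id lookup that works with either
// container: unseen names get the next free id.
int feature_id(FeatureMap<std::string, int>& ids, const std::string& name) {
  auto it = ids.find(name);
  if (it != ids.end()) return it->second;
  const int id = static_cast<int>(ids.size());
  ids[name] = id;
  return id;
}
```

Because the rest of the code only uses find, operator[] and size, switching containers changes performance but not behaviour.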
* Tips
1) If you have many training samples, hold out a portion of the data
to check whether overfitting is occurring.
ex.) model.set_heldout(1000);
2) If you see overfitting, try one of the following:
- feature cut-off ex.) model.train(3);
- Gaussian prior ex.) model.train(0, 1000, 0);
- inequality constraints ex.) model.train(0, 0, 1.0);
* I prefer the third option because it produces a compact model and
gives performance as good as the Gaussian prior.
3) If you want to integrate the generated model file into your code,
see model2c.cpp.
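The first two tips can be illustrated with a small, self-contained conditional maximum entropy trainer. This is not the SS MaxEnt API: the class TinyMaxEnt, its train(data, labels, cutoff, sigma) signature, and the cut-off rule (drop features seen fewer than cutoff times) are hypothetical, and plain batch gradient ascent is swapped in for BLMVM. It shows a frequency cut-off, the Gaussian-prior term lambda / sigma^2 in the gradient, and held-out accuracy for spotting overfitting; inequality constraints are omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One example: the names of its active binary features and a label id.
struct Sample {
  std::vector<std::string> features;
  int label;
};

// Hypothetical minimal conditional maxent classifier (NOT the SS MaxEnt API).
struct TinyMaxEnt {
  int num_labels = 0;                       // must be >= 1 after training
  std::map<std::string, int> feature_ids;   // features kept after the cut-off
  std::vector<std::vector<double>> lambda;  // lambda[feature id][label]

  // p(y | x) for every label y, computed as a numerically stable softmax.
  std::vector<double> probs(const std::vector<std::string>& feats) const {
    std::vector<double> p(num_labels, 0.0);
    for (const std::string& f : feats) {
      auto it = feature_ids.find(f);
      if (it == feature_ids.end()) continue;  // unseen or cut-off feature
      for (int y = 0; y < num_labels; ++y) p[y] += lambda[it->second][y];
    }
    const double max_score = *std::max_element(p.begin(), p.end());
    double z = 0.0;
    for (double& s : p) { s = std::exp(s - max_score); z += s; }
    for (double& s : p) s /= z;
    return p;
  }

  int classify(const std::vector<std::string>& feats) const {
    const std::vector<double> p = probs(feats);
    return static_cast<int>(std::max_element(p.begin(), p.end()) - p.begin());
  }

  // cutoff: drop features seen fewer than `cutoff` times (assumed rule).
  // sigma:  Gaussian prior width; sigma <= 0 disables the prior.
  // Plain batch gradient ascent stands in for BLMVM.
  void train(const std::vector<Sample>& data, int labels, int cutoff,
             double sigma, int iterations = 200, double rate = 0.5) {
    num_labels = labels;
    std::map<std::string, int> counts;
    for (const Sample& s : data)
      for (const std::string& f : s.features) ++counts[f];
    feature_ids.clear();
    for (const auto& kv : counts)
      if (kv.second >= cutoff)
        feature_ids.emplace(kv.first, static_cast<int>(feature_ids.size()));
    lambda.assign(feature_ids.size(), std::vector<double>(num_labels, 0.0));

    const double n = static_cast<double>(data.size());
    for (int iter = 0; iter < iterations; ++iter) {
      // gradient = E_empirical[f] - E_model[f] - lambda / sigma^2
      std::vector<std::vector<double>> grad(
          lambda.size(), std::vector<double>(num_labels, 0.0));
      for (const Sample& s : data) {
        const std::vector<double> p = probs(s.features);
        for (const std::string& f : s.features) {
          auto it = feature_ids.find(f);
          if (it == feature_ids.end()) continue;
          grad[it->second][s.label] += 1.0 / n;
          for (int y = 0; y < num_labels; ++y) grad[it->second][y] -= p[y] / n;
        }
      }
      for (std::size_t i = 0; i < lambda.size(); ++i)
        for (int y = 0; y < num_labels; ++y) {
          double g = grad[i][y];
          if (sigma > 0) g -= lambda[i][y] / (sigma * sigma);
          lambda[i][y] += rate * g;
        }
    }
  }
};

// Accuracy on held-out samples that were kept out of training.
double accuracy(const TinyMaxEnt& m, const std::vector<Sample>& heldout) {
  if (heldout.empty()) return 0.0;
  int correct = 0;
  for (const Sample& s : heldout)
    if (m.classify(s.features) == s.label) ++correct;
  return static_cast<double>(correct) / static_cast<double>(heldout.size());
}
```

Training several models with different cutoff and sigma values and comparing accuracy on the same held-out samples is the overfitting check described in tip 1: a setting that raises training accuracy but lowers held-out accuracy is overfitting.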
* References
[1] Jun'ichi Kazama and Jun'ichi Tsujii, Evaluation and Extension of
Maximum Entropy Models with Inequality Constraints, In Proceedings
of EMNLP 2003, pp. 137-144.
[2] Steven J. Benson and Jorge J. More, A Limited-Memory Variable-Metric
Method for Bound-Constrained Minimization, Preprint ANL/MCS-P909-0901,
2001. http://www-unix.mcs.anl.gov/~benson/blmvm/
[3] Stanley F. Chen and Ronald Rosenfeld, A Gaussian Prior for Smoothing
Maximum Entropy Models, Technical Report CMU-CS-99-108, Computer
Science Department, Carnegie Mellon University, 1999.
* History
2005 Jul. 8 version 1.2.2
- initial public release
2005 Sep. 13 version 1.3
- requires less memory in training
2005 Sep. 13 version 1.3.1
- update README
2005 Oct. 28 version 1.3.2
- fix for overflow (thanks to Ming Li)
-------------------------------------------------------------------------
Yoshimasa Tsuruoka