Commit b6352f0

Initial commit.
0 parents  commit b6352f0

6 files changed: 56968 additions, 0 deletions


LICENSE

+30
@@ -0,0 +1,30 @@
Copyright (c) 1995-2001 University of Sheffield, University of Sussex
All rights reserved.

Redistribution and use of source and derived binary forms are permitted
provided that:
- they are not used in commercial products
- the above copyright notice and this paragraph are duplicated in
  all such forms
- any documentation, advertising materials, and other materials
  related to such distribution and use acknowledge that the software
  was developed by Kevin Humphreys <[email protected]> and John
  Carroll <[email protected]> and Guido Minnen
  <[email protected]> and refer to the following related
  publication:

  Guido Minnen, John Carroll and Darren Pearce. 2000. Robust, Applied
  Morphological Generation. In Proceedings of the First International
  Natural Language Generation Conference (INLG), Mitzpe Ramon, Israel.
  201-208.

The name of University of Sheffield may not be used to endorse or
promote products derived from this software without specific prior
written permission.

This software is provided "as is" and without any express or
implied warranties, including, without limitation, the implied
warranties of merchantability and fitness for a particular purpose.

If you make any changes, the authors would appreciate it
if you sent them details of what you have done.

README

+199
@@ -0,0 +1,199 @@
MORPHA STEMMER

http://www.informatics.sussex.ac.uk/research/groups/nlp/carroll/morph.html

A fast and robust morphological analyser for English based on finite-state
techniques that returns the lemma and inflection type of a word, given the word
form and its part of speech. (The latter is optional, but accuracy is degraded
if it is not present.)

Converted to Java using JFlex. The class is not thread-safe.


README FROM ORIGINAL MORPHA DISTRIBUTION

University of Sussex                                        8 Sep 2003

This directory contains software for morphological processing of English
as developed by Kevin Humphreys <[email protected]>, John Carroll
<[email protected]> and Guido Minnen.

To be used for research purposes only (see section 4 below). If you make
any changes, the authors would appreciate it if you sent them details of
what you have done.

Covers the English inflectional suffixes:

    -s    plural of nouns, 3rd person singular present of verbs
    -ed   past tense
    -en   past participle
    -ing  progressive of verbs

1. Usage
--------

    morpha [-a] [-c] [-t] [-u] [-f verbstem-file]
    morphg [-c] [-t] [-u] [-f verbstem-file]

The commands operate as filters, reading from the standard input and
writing to the standard output.

They may be invoked with the following command-line options:

    -a  Output affixes (morpha only).

    -c  Preserve case distinctions wherever possible.

    -t  Output part-of-speech tags if they are in the input.

    -u  Indicate that the words in the input are not tagged with
        part-of-speech labels. N.B. This mode of use is not recommended
        since the resulting ambiguity in the input is likely to lead to
        incorrect output.

    -f  By default, the commands attempt to read a file called
        'verbstem.list' in the user's current directory which is expected
        to contain a list of stems of verbs that undergo doubling of
        their final consonant, as occurs in British English spelling.
        This option allows the user to specify a different file of verb
        stems (for example if American English behaviour is required).
        If this option is specified then it must be the last one on
        the command-line.

See the file doc.txt for specifications of input and output formats,
and examples of usage.
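
As a rough illustration of the options above, the following Python sketch
builds an argument list for either tool and runs it as a filter. The helper
names (morph_command, run_filter), the executable location and the choice of
Python are assumptions made for this example; they are not part of the
distribution. The input text must follow whatever format doc.txt specifies,
which is not reproduced here.

```python
import subprocess


def morph_command(tool="morpha", affixes=False, keep_case=False,
                  tags=False, untagged=False, verbstem_file=None,
                  directory="."):
    """Build an argument list for morpha or morphg (hypothetical helper)."""
    if tool not in ("morpha", "morphg"):
        raise ValueError("tool must be 'morpha' or 'morphg'")
    if affixes and tool != "morpha":
        raise ValueError("-a is documented for morpha only")
    # Assumption: the executable is named after the tool and lives in
    # `directory`; the shipped binaries carry a platform suffix instead.
    argv = [f"{directory}/{tool}"]
    if affixes:
        argv.append("-a")
    if keep_case:
        argv.append("-c")
    if tags:
        argv.append("-t")
    if untagged:
        argv.append("-u")
    if verbstem_file is not None:
        # The README requires -f to be the last option on the command line.
        argv += ["-f", verbstem_file]
    return argv


def run_filter(argv, text):
    """Feed text to the tool on stdin and return what it writes to stdout."""
    result = subprocess.run(argv, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Appending -f after every other flag keeps the "must be the last one"
constraint in one place, so callers cannot get the ordering wrong.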

2. Files
--------

    Makefile        makefile for compiling the flex sources; can be
                    used for compiling both flex descriptions by
                    the command `make flex-description-file'
    README          this file
    doc.txt         specifications of input/output formats, and usage
                    examples
    gpost           postamble file used in deriving morphg.lex
    gpre            preamble file used in deriving morphg.lex
    invert.sh       unix shell program that derives morphg.lex from
                    morpha.lex
    minnen.pdf      pre-final PDF version of the NLE article by Minnen,
                    Carroll and Pearce (2001)
    morpha.{ix86_linux|ppc_darwin|sun4_sunos}
                    executables for the morphological analyser; for
                    details of usage see above
    morpha.lex      flex description constituting the source of the
                    morphological analyser
    morphg.{ix86_linux|ppc_darwin|sun4_sunos}
                    executables for the morphological generator; for
                    details of usage see above
    morphg.lex      flex description constituting the source of the
                    morphological generator
    verbstem.list   list of verb stems that allow for consonant doubling
                    in British English

The file morphg.lex is derived automatically from the file morpha.lex
using invert.sh, as described in the paper by Minnen, Carroll and
Pearce (2001) -- full reference below.

3. Compilation
--------------

To recompile the morph tools, either type the following commands
(making sure that you use the 2.5.4a version of flex recompiled with
larger internal limits -- see below), or (more conveniently) use the
Makefile in this directory by typing `make morpha' or `make morphg'.

    flex -i -Cfe -8 -omorpha.yy.c morpha.lex
    gcc -o morpha morpha.yy.c

or

    flex -i -Cfe -8 -omorphg.yy.c morphg.lex
    gcc -o morphg morphg.yy.c

The executables included in this release were built omitting the
Flex options -Cfe -8, resulting in a reduction in binary file size
of two thirds (and a reduction in processing speed of around 20%).
These options also have to be left out and the option -Dinteractive
added to gcc (resulting in a further decrease in throughput) in order
to get the morph tools to return results immediately when used via
unix pipes inside other programs.
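
The three build variants described above (full-speed tables, the smaller
release build, and the interactive build for use through pipes) can be
summarised in a short Python sketch. The function name build_commands and
its parameters are hypothetical, and the position of -Dinteractive within
the gcc command line is an assumption; only the flags themselves come from
the text above.

```python
def build_commands(tool, fast_tables=True, interactive=False):
    """Return (flex_argv, gcc_argv) for one of the documented build variants."""
    if tool not in ("morpha", "morphg"):
        raise ValueError("tool must be 'morpha' or 'morphg'")
    if interactive and fast_tables:
        # Per the README, -Cfe -8 must be left out for the interactive build.
        raise ValueError("interactive builds must omit -Cfe -8")
    flex = ["flex", "-i"]
    if fast_tables:
        flex += ["-Cfe", "-8"]
    flex += [f"-o{tool}.yy.c", f"{tool}.lex"]
    gcc = ["gcc"]
    if interactive:
        # Assumed placement; the README only says the option is added to gcc.
        gcc.append("-Dinteractive")
    gcc += ["-o", tool, f"{tool}.yy.c"]
    return flex, gcc
```

Calling build_commands("morpha") reproduces the first pair of commands
listed above, while build_commands("morphg", fast_tables=False,
interactive=True) gives the variant intended for use inside other programs.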

N.B. Recompiling the morph tools requires an adapted version of Flex.
The Flex source code is freely available from:

    http://www.go.dlr.de/fresh/unix/src/misc/.warix/flex-2.5.4a.tar.gz.html

The Flex source should be changed to allow for more internal states by
increasing the definitions in flexdef.h of:

    #define JAMSTATE -32766
    ...
    #define MAXIMUM_MNS 31999
    ...
    #define BAD_SUBSCRIPT -32767

to:

    #define JAMSTATE -800000
    ...
    #define MAXIMUM_MNS 800000
    ...
    #define BAD_SUBSCRIPT -800000

and recompiling Flex. When recompiling the morph tools ensure that the
Makefile points to the new version of Flex.
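
The edit to flexdef.h can also be applied mechanically. The Python sketch
below rewrites the three definitions in the text of the header; the function
name raise_flex_limits is hypothetical, and the sketch assumes each macro is
defined exactly once on a line of its own, which should be checked against
the actual Flex 2.5.4a source before relying on it.

```python
import re

# Replacement values taken from the README text above.
NEW_LIMITS = {
    "JAMSTATE": "-800000",
    "MAXIMUM_MNS": "800000",
    "BAD_SUBSCRIPT": "-800000",
}


def raise_flex_limits(source):
    """Return the flexdef.h text with the three limits replaced."""
    for name, value in NEW_LIMITS.items():
        # Match "#define NAME <old value>" at the start of a line and keep
        # everything up to the old value unchanged.
        pattern = re.compile(
            r"^([ \t]*#[ \t]*define[ \t]+" + re.escape(name) + r"[ \t]+)\S+",
            re.MULTILINE,
        )
        source, count = pattern.subn(
            lambda match, v=value: match.group(1) + v, source)
        if count != 1:
            raise ValueError(
                f"expected exactly one definition of {name}, found {count}")
    return source
```

A caller would read flexdef.h, pass its contents through raise_flex_limits,
write the result back, and then rebuild Flex as described above. Failing
loudly when a macro is missing or duplicated avoids silently building a Flex
that still has the old limits.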

4. Acknowledgements, copyrights etc.
------------------------------------

Copyright (c) 1995-2000 University of Sheffield, University of Sussex
All rights reserved.

Redistribution and use of source and derived binary forms are
permitted without fee provided that:

- they are not used in commercial products
- the above copyright notice and this paragraph are duplicated in
  all such forms
- any documentation, advertising materials, and other materials
  related to such distribution and use acknowledge that the software
  was developed by Kevin Humphreys <[email protected]>, John
  Carroll <[email protected]> and Guido Minnen
  and refer to the following related publication:

  Guido Minnen, John Carroll and Darren Pearce. 2001. `Applied
  morphological processing of English'. Natural Language Engineering,
  7(3). 207-223.

The name of University of Sheffield may not be used to endorse or
promote products derived from this software without specific prior
written permission.

This software is provided "as is" and without any express or implied
warranties, including, without limitation, the implied warranties of
merchantability and fitness for a particular purpose.

The exception lists were derived semi-automatically from WordNet 1.5,
and various other corpora and MRDs.

Many thanks to Tim Baldwin, Chris Brew, Bill Fisher, Gerald Gazdar,
Dale Gerdemann, Adam Kilgarriff and Ehud Reiter for suggested
improvements.

WordNet 1.5 Copyright 1995 by Princeton University.
All rights reserved.

THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED.
BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO
REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE
OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS,
COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of Princeton University or Princeton may not be used in
advertising or publicity pertaining to distribution of the software
and/or database. Title to copyright in this software, database and
any associated documentation shall at all times remain with Princeton
University and LICENSEE agrees to preserve same.
