| 1 | +MORPHA STEMMER |
| 2 | + |
| 3 | +http://www.informatics.sussex.ac.uk/research/groups/nlp/carroll/morph.html |
| 4 | + |
| 5 | +A fast and robust morphological analyser for English based on finite-state |
| 6 | +techniques that returns the lemma and inflection type of a word, given the word |
| 7 | +form and its part of speech. (The latter is optional but accuracy is degraded |
| 8 | +if it is not present.) |
| 9 | + |
| 10 | +Converted to Java using JFlex. The class is not thread-safe. |
| 11 | + |
| 12 | + |
| 13 | +README FROM ORIGINAL MORPHA DISTRIBUTION |
| 14 | + |
| 15 | + University of Sussex 8 Sep 2003 |
| 16 | + |
| 17 | +This directory contains software for morphological processing of English |
| 18 | +as developed by Kevin Humphreys < [email protected]>, John Carroll |
| 19 | +< [email protected]> and Guido Minnen. |
| 20 | + |
| 21 | +To be used for research purposes only (see section 4 below). If you make |
| 22 | +any changes, the authors would appreciate it if you sent them details of |
| 23 | +what you have done. |
| 24 | + |
| 25 | +Covers the English inflectional suffixes: |
| 26 | + |
| 27 | +    -s    plural of nouns, 3rd person singular present of verbs |
| 28 | +    -ed   past tense |
| 29 | +    -en   past participle |
| 30 | +    -ing  progressive of verbs |
| 31 | + |
| 32 | +1. Usage |
| 33 | +-------- |
| 34 | + |
| 35 | +    morpha [-a] [-c] [-t] [-u] [-f verbstem-file] |
| 36 | +    morphg [-c] [-t] [-u] [-f verbstem-file] |
| 37 | + |
| 38 | +The commands operate as filters, reading from the standard input and |
| 39 | +writing to the standard output. |
| 40 | + |
| 41 | +They may be invoked with the following command-line options: |
| 42 | + |
| 43 | +    -a  Output affixes (morpha only). |
| 44 | + |
| 45 | +    -c  Preserve case distinctions wherever possible. |
| 46 | + |
| 47 | +    -t  Output part-of-speech tags if they are in the input. |
| 48 | + |
| 49 | +    -u  Indicate that the words in the input are not tagged with |
| 50 | +        part-of-speech labels. N.B. This mode of use is not recommended |
| 51 | +        since the resulting ambiguity in the input is likely to lead to |
| 52 | +        incorrect output. |
| 53 | + |
| 54 | +    -f  By default, the commands attempt to read a file called |
| 55 | +        'verbstem.list' in the user's current directory which is expected |
| 56 | +        to contain a list of stems of verbs that undergo doubling of |
| 57 | +        their final consonant, as occurs in British English spelling. |
| 58 | +        This option allows the user to specify a different file of verb |
| 59 | +        stems (for example if American English behaviour is required). |
| 60 | +        If this option is specified then it must be the last one on |
| 61 | +        the command-line. |
| 62 | + |
| 63 | +See the file doc.txt for specifications of input and output formats, |
| 64 | +and examples of usage. |
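The rule that -f must come last is easy to break when a script assembles the options itself. The following is a minimal sketch of one way to enforce it. The helper name build_morpha_cmd and the file name american.list are hypothetical and not part of the distribution, and the sketch assumes that neither the flags nor the file name contain whitespace.

```shell
#!/bin/sh
# Sketch only: assemble a morpha command line so that -f, if used, is last.
# build_morpha_cmd is a hypothetical helper, not part of the distribution.
build_morpha_cmd() {
    verbstem_file=$1   # path to a verb-stem list, or "" to use the default
    shift              # remaining arguments: other flags such as -a -c -t
    cmd="morpha"
    for flag in "$@"; do
        cmd="$cmd $flag"
    done
    if [ -n "$verbstem_file" ]; then
        cmd="$cmd -f $verbstem_file"   # -f must be the final option
    fi
    printf '%s\n' "$cmd"
}

# Print the command lines rather than running them.
build_morpha_cmd "american.list" -a -c
build_morpha_cmd "" -t
```

The sketch only prints the command line; a caller would run the result with the tagged text on standard input, since the tools operate as filters.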
| 65 | + |
| 66 | +2. Files |
| 67 | +-------- |
| 68 | + |
| 69 | +    Makefile       makefile for compiling the flex sources; can be |
| 70 | +                   used for compiling both flex descriptions by |
| 71 | +                   the command `make flex-description-file' |
| 72 | +    README         this file |
| 73 | +    doc.txt        specifications of input/output formats, and usage |
| 74 | +                   examples |
| 75 | +    gpost          postamble file used in deriving morphg.lex |
| 76 | +    gpre           preamble file used in deriving morphg.lex |
| 77 | +    invert.sh      unix shell program that derives morphg.lex from |
| 78 | +                   morpha.lex |
| 79 | +    minnen.pdf     pre-final PDF version of the NLE article by Minnen, |
| 80 | +                   Carroll and Pearce (2001) |
| 81 | +    morpha.{ix86_linux|ppc_darwin|sun4_sunos} |
| 82 | +                   executables for the morphological analyser; for |
| 83 | +                   details of usage see above |
| 84 | +    morpha.lex     flex description constituting the source of the |
| 85 | +                   morphological analyser |
| 86 | +    morphg.{ix86_linux|ppc_darwin|sun4_sunos} |
| 87 | +                   executables for the morphological generator; for |
| 88 | +                   details of usage see above |
| 89 | +    morphg.lex     flex description constituting the source of the |
| 90 | +                   morphological generator |
| 91 | +    verbstem.list  list of verb stems that allow for consonant doubling |
| 92 | +                   in British English |
| 93 | + |
| 94 | +The file morphg.lex is derived automatically from the file morpha.lex |
| 95 | +using invert.sh, as described in the paper by Minnen, Carroll and |
| 96 | +Pearce (2001) -- full reference below. |
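Because the executables are shipped under platform-specific names, a wrapper script has to pick the right suffix. The sketch below shows one possible way to do that. The function platform_suffix is hypothetical, and the mapping from uname values to the three suffixes is an assumption made for illustration, not something specified by the distribution.

```shell
#!/bin/sh
# Sketch only: map an operating system and machine name to one of the three
# executable suffixes listed above. platform_suffix is a hypothetical helper,
# and the uname values matched here are assumptions.
platform_suffix() {
    os=$1     # e.g. the output of `uname -s`
    arch=$2   # e.g. the output of `uname -m`
    case "$os/$arch" in
        Linux/i?86)                          echo ix86_linux ;;
        Darwin/ppc|Darwin/"Power Macintosh") echo ppc_darwin ;;
        SunOS/sun4*)                         echo sun4_sunos ;;
        *)                                   return 1 ;;
    esac
}

if suffix=$(platform_suffix "$(uname -s)" "$(uname -m)"); then
    echo "use ./morpha.$suffix and ./morphg.$suffix"
else
    echo "no prebuilt executable for this platform; recompile from the .lex sources"
fi
```

On any platform outside the three listed, the function returns a non-zero status, and recompilation as described in section 3 is the fallback.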
| 97 | + |
| 98 | +3. Compilation |
| 99 | +-------------- |
| 100 | + |
| 101 | +To recompile the morph tools, either type the following commands |
| 102 | +(making sure that you use the 2.5.4a version of flex recompiled with |
| 103 | +larger internal limits -- see below), or (more conveniently) use the |
| 104 | +Makefile in this directory by typing `make morpha' or `make morphg'. |
| 105 | + |
| 106 | +    flex -i -Cfe -8 -omorpha.yy.c morpha.lex |
| 107 | +    gcc -o morpha morpha.yy.c |
| 108 | + |
| 109 | +or |
| 110 | + |
| 111 | +    flex -i -Cfe -8 -omorphg.yy.c morphg.lex |
| 112 | +    gcc -o morphg morphg.yy.c |
| 113 | + |
| 114 | +The executables included in this release were built omitting the |
| 115 | +Flex options -Cfe -8, resulting in a reduction in binary file size |
| 116 | +of two thirds (and a reduction in processing speed of around 20%). |
| 117 | +These options also have to be left out and the option -Dinteractive |
| 118 | +added to gcc (resulting in a further decrease in throughput) in order |
| 119 | +to get the morph tools to return results immediately when used via |
| 120 | +unix pipes inside other programs. |
| 121 | + |
| 122 | +N.B. Recompiling the morph tools requires an adapted version of Flex. |
| 123 | +The Flex source code is freely available from: |
| 124 | + |
| 125 | + http://www.go.dlr.de/fresh/unix/src/misc/.warix/flex-2.5.4a.tar.gz.html |
| 126 | + |
| 127 | +The Flex source should be changed to allow for more internal states by |
| 128 | +increasing the definitions in flexdef.h of: |
| 129 | + |
| 130 | +    #define JAMSTATE -32766 |
| 131 | +    ... |
| 132 | +    #define MAXIMUM_MNS 31999 |
| 133 | +    ... |
| 134 | +    #define BAD_SUBSCRIPT -32767 |
| 135 | + |
| 136 | +to: |
| 137 | + |
| 138 | +    #define JAMSTATE -800000 |
| 139 | +    ... |
| 140 | +    #define MAXIMUM_MNS 800000 |
| 141 | +    ... |
| 142 | +    #define BAD_SUBSCRIPT -800000 |
| 143 | + |
| 144 | +and recompiling Flex. When recompiling the morph tools ensure that the |
| 145 | +Makefile points to the new version of Flex. |
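The three changes to flexdef.h can also be applied with sed instead of by hand. The following is a minimal sketch, not part of the distribution. It works on a scratch copy that contains only the three definitions so that it is self-contained; the scratch directory and the variable names are illustrative. For a real build, the flexdef variable would point at flexdef.h in the unpacked Flex 2.5.4a source tree, and it is assumed that the definitions there have the old values shown above.

```shell
#!/bin/sh
# Sketch only: raise the three Flex limits named above using sed.
# A scratch copy of the relevant lines stands in for the real flexdef.h.
workdir=$(mktemp -d)
flexdef="$workdir/flexdef.h"
cat > "$flexdef" <<'EOF'
#define JAMSTATE -32766
#define MAXIMUM_MNS 31999
#define BAD_SUBSCRIPT -32767
EOF

# Each expression rewrites one definition and leaves every other line alone.
sed -e 's/^\(#define[[:space:]]\{1,\}JAMSTATE[[:space:]]\{1,\}\)-32766/\1-800000/' \
    -e 's/^\(#define[[:space:]]\{1,\}MAXIMUM_MNS[[:space:]]\{1,\}\)31999/\1800000/' \
    -e 's/^\(#define[[:space:]]\{1,\}BAD_SUBSCRIPT[[:space:]]\{1,\}\)-32767/\1-800000/' \
    "$flexdef" > "$flexdef.new" && mv "$flexdef.new" "$flexdef"

# Show the result so the new values can be checked before recompiling Flex.
cat "$flexdef"
```

Matching the old value in each pattern means the script changes nothing if a definition has already been edited or differs from what is described here.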
| 146 | + |
| 147 | +4. Acknowledgements, copyrights etc. |
| 148 | +------------------------------------ |
| 149 | + |
| 150 | +Copyright (c) 1995-2000 University of Sheffield, University of Sussex |
| 151 | +All rights reserved. |
| 152 | + |
| 153 | +Redistribution and use of source and derived binary forms are |
| 154 | +permitted without fee provided that: |
| 155 | + |
| 156 | + - they are not used in commercial products |
| 157 | + - the above copyright notice and this paragraph are duplicated in |
| 158 | + all such forms |
| 159 | + - any documentation, advertising materials, and other materials |
| 160 | + related to such distribution and use acknowledge that the software |
| 161 | + was developed by Kevin Humphreys < [email protected]>, John |
| 162 | + Carroll < [email protected]> and Guido Minnen |
| 163 | + and refer to the following related publication: |
| 164 | + |
| 165 | + Guido Minnen, John Carroll and Darren Pearce. 2001. `Applied |
| 166 | + morphological processing of English'. Natural Language Engineering, |
| 167 | + 7(3). 207-223. |
| 168 | + |
| 169 | +The name of University of Sheffield may not be used to endorse or |
| 170 | +promote products derived from this software without specific prior |
| 171 | +written permission. |
| 172 | + |
| 173 | +This software is provided "as is" and without any express or implied |
| 174 | +warranties, including, without limitation, the implied warranties of |
| 175 | +merchantability and fitness for a particular purpose. |
| 176 | + |
| 177 | +The exception lists were derived semi-automatically from WordNet 1.5, |
| 178 | +and various other corpora and MRDs. |
| 179 | + |
| 180 | +Many thanks to Tim Baldwin, Chris Brew, Bill Fisher, Gerald Gazdar, |
| 181 | +Dale Gerdemann, Adam Kilgarriff and Ehud Reiter for suggested |
| 182 | +improvements. |
| 183 | + |
| 184 | +WordNet 1.5 Copyright 1995 by Princeton University. |
| 185 | +All rights reserved. |
| 186 | + |
| 187 | +THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON |
| 188 | +UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. |
| 189 | +BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO |
| 190 | +REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY |
| 191 | +PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE |
| 192 | +OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, |
| 193 | +COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. |
| 194 | + |
| 195 | +The name of Princeton University or Princeton may not be used in |
| 196 | +advertising or publicity pertaining to distribution of the software |
| 197 | +and/or database. Title to copyright in this software, database and |
| 198 | +any associated documentation shall at all times remain with Princeton |
| 199 | +University and LICENSEE agrees to preserve same. |