LogonProcessing_BatchGeneration
This document describes how to perform batch generation within LOGON. By extension, it also describes how to produce a generation treebank on the basis of an existing ERG treebank. We first give a step-by-step description of how to do this using only menu choices from the podium. We then show how the same steps can be carried out from the command line using the generate script provided in the LOGON source tree (i.e. $LOGONROOT/generate).
For the podium approach, there are two main steps: 1) generate and 2) update. In the first step we exhaustively generate all "paraphrases" for the input MRS. In the update step we identify and label the references among these alternative realizations by matching them against the references in the original parse treebank.
1) Generation
- Load the grammar (from the LKB top panel): Load | Complete grammar (and choose for example logon/lingo/erg/lkb/script to load the latest ERG grammar)
- Initialize the generator (from the LKB top panel): Options | Expand menu, and then Generate | Index
- Select appropriate skeletons (e.g. english) (from the [incr tsdb()] podium): Options | Skeleton Root
- Select the skeleton you want to use and create the target profile: File | Create
- Select the corresponding gold profile (assumed to be thinned, i.e. containing the MRSs for the preferred parses).
- Select generation as batch processing mode: Process | Switches | Generation
- Optionally set the maximum number of edges (e.g. 50000): Process | Variables | Chart size limit
- Generate: Process | All Items
2) Update
- In this step we identify and label the references among the newly generated sets of paraphrases. First we set some switches controlling how the realizations are matched against the references of the original parse treebank:
  - Trees | Switches | Update Exact Match
  - Trees | Switches | Preterminal Yield
- Then perform the labeling step: Trees | Update
(PS: Some [incr tsdb()] Lisp variables that are relevant for the matching/labeling of references include the following: *redwoods-update-exact-p*, *derivations-comparison-level*, and *derivations-yield-skews*.)
The procedure described above can also be performed by using the $LOGONROOT/generate script.
generate [ skeleton ]
The skeleton argument should name the skeleton you want to use for the new target profile that will be created. We document the available command-line options below, as well as some related and relevant Lisp variables.
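As a minimal sketch of the basic call, the wrapper below runs the script for a single skeleton and fails early if the LOGON tree cannot be found. The function name run_generate and the skeleton name mrs are hypothetical, chosen only for illustration; the wrapper is not part of the LOGON tree.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of LOGON): run the generate script for
# one skeleton, failing early if the LOGON tree cannot be located.
run_generate() {
  skeleton="$1"
  if [ -z "${LOGONROOT:-}" ]; then
    echo "LOGONROOT is not set" >&2
    return 1
  fi
  if [ ! -x "$LOGONROOT/generate" ]; then
    echo "no executable generate script in $LOGONROOT" >&2
    return 1
  fi
  # Pass the skeleton name through as the single positional argument.
  "$LOGONROOT/generate" "$skeleton"
}

# Example call; 'mrs' is a placeholder skeleton name:
# run_generate mrs
```

The checks only guard against a missing or misconfigured installation; they do not verify that the named skeleton exists.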
--source
- Compile the LOGON system from source.
--is
- Suppress MRS specification about information structure in generation. Controlled by way of the variable mrs::*ignored-sem-features*.
--suffix string
- Append string to the name for the newly created profile (e.g. when more than one run per day needs to be recorded).
--jacy
--gg
- Changes the grammar from the default ERG and sets the appropriate language for skeletons correspondingly.
--gold string
- Specify the gold profile (the default being gold/${grammar}/${skeleton})
--update
- Do not perform the update step (i.e. automatically identifying and labeling references).
--cache
- In the same pass, create a feature cache for the newly created generation treebank using default feature settings (this can take a while) and then train a MaxEnt realization ranker (again using only default estimation settings).
--count n
- Parallelize processing and start up n full instantiations of the parser client.
--limit n
- Sets tsdb::*tsdb-maximal-number-of-results*.
--best n
- Sets tsdb::*tsdb-maximal-number-of-analyses*.
(PS: In order to control the maximum number of edges allowed in the chart during generation, look for tsdb::*tsdb-maximal-number-of-edges* in the generate script (currently defaults to 100,000)).
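To illustrate how the options above might be combined, here is a dry-run sketch that assembles a command line and prints it instead of executing it. Several things in it are assumptions rather than documented behaviour: the skeleton name mrs, the suffix a, the value erg for the grammar component of the default gold path, the fallback path /path/to/logon, and the placement of options before the skeleton argument.

```shell
#!/bin/sh
# Dry-run sketch: build a generate command line from the documented options.
# All concrete values below are placeholders, not LOGON defaults.
logonroot="${LOGONROOT:-/path/to/logon}"  # assumed fallback location

skeleton="mrs"   # hypothetical skeleton name
grammar="erg"    # assumed grammar name for the default ERG setup

# Default gold profile location as described under --gold.
gold="gold/${grammar}/${skeleton}"

# Four parallel clients, a suffix to distinguish this run from earlier
# runs on the same day, and the gold profile spelled out explicitly.
cmd="$logonroot/generate --count 4 --suffix a --gold $gold $skeleton"

# Print the command rather than running it.
echo "$cmd"
```

Passing --gold explicitly is redundant when the default path applies; it is shown here only to make the path construction visible.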