AWE_LanguageTool

Python embedding of the LanguageTool error checker, with a new classification of error types. Required by AWE Workbench, an NLP tool that supports the Writing Observer project.

Note about repository

This repository has a major hack. The full LanguageTool has been committed into version history. This resolves a lot of issues with working in different settings (dev, deployment, etc.) and being able to find files, but it needs to be cleaned up at some point, probably with dependencies, pkg_resources, or otherwise. When this happens, we will rebase and rewrite git commit history. All downstream users will see git security warnings when this happens, and it may create additional cleanup work for all clones/forks with unmerged changes.

Installation

Clone the repo from GitHub, then run:
cd AWE_LanguageTool
pip install -e .

LanguageTool Configuration & Running

The system uses a client-server model where the LanguageTool server processes requests, and the client adds further processing and error classification. The server runs a Java command to start languagetool-server.jar (provided by LanguageTool), while the client manages and interprets the server's responses.

Configure LanguageTool Server (optional)

Before starting the LanguageTool server, ensure that a server configuration file exists in awe_languagetool/LanguageTool5_5/. We provide a sample configuration file called languagetool.tmp.cfg. To use it, copy it to languagetool.cfg in the same directory. The following command creates the copy:
```
python -m awe_languagetool.setup.copy_config
```
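If you would rather make the copy yourself, the step can be sketched in a few lines of Python. This is only an illustration, assuming the command simply copies the sample file next to itself; copy_sample_config is a hypothetical helper and not part of awe_languagetool, and the real copy_config module may behave differently.

```python
import shutil
from pathlib import Path


def copy_sample_config(config_dir, overwrite=False):
    """Copy languagetool.tmp.cfg to languagetool.cfg inside config_dir.

    Hypothetical helper for illustration only.
    """
    config_dir = Path(config_dir)
    source = config_dir / "languagetool.tmp.cfg"
    target = config_dir / "languagetool.cfg"
    if target.exists() and not overwrite:
        # Keep an existing (possibly customized) config untouched.
        return target
    shutil.copyfile(source, target)
    return target
```

For example, copy_sample_config('awe_languagetool/LanguageTool5_5') would create the config file when run from the repository root.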
To improve performance, particularly under high request loads, adjust the server settings in awe_languagetool/LanguageTool5_5/languagetool.cfg. For suggested settings, see this forum post on server configurations. For a full list of configuration options, refer to the LanguageTool server code.
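As a rough illustration of what such tuning could look like, the sketch below writes a few performance-related settings as key=value lines. The key names (maxCheckThreads, cacheSize, maxTextLength, maxCheckTimeMillis, pipelineCaching) are believed to be accepted by the LanguageTool HTTP server, but verify them against the server code for your version. The values are arbitrary placeholders rather than recommendations, and render_config and write_config are hypothetical helpers.

```python
from pathlib import Path

# Assumed example settings. Key names should be checked against the
# LanguageTool server code; values are placeholders, not recommendations.
EXAMPLE_SETTINGS = {
    "maxCheckThreads": 8,
    "cacheSize": 1000,
    "maxTextLength": 50000,
    "maxCheckTimeMillis": 20000,
    "pipelineCaching": "true",
}


def render_config(settings):
    """Render a dict as key=value lines (Java properties style)."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_config(path, settings):
    """Write the rendered settings to path and return it as a Path."""
    path = Path(path)
    path.write_text(render_config(settings), encoding="utf-8")
    return path
```

Keeping the settings in a dict makes it easy to keep separate profiles for development and deployment and to write whichever one is needed before the server starts.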

Start the server

You can start the server in Python or from the command line.

In Python:

from awe_languagetool import languagetoolServer
languagetoolServer.runServer()
# or with a config file
languagetoolServer.runServer(config_file='/path/to/config')

In Bash:

python -m awe_languagetool.languagetoolServer
# or with a config file
python -m awe_languagetool.languagetoolServer --config /path/to/config
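However the server is started, it may take a moment before it accepts connections. The sketch below is one way to wait for it, assuming the default host and port used by the client (localhost and 8081). wait_for_port is a hypothetical helper, not part of awe_languagetool, and it only checks that the TCP port is open, not that LanguageTool has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8081, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```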

Directly using Java (if necessary):

The server can also be run directly with Java, which may require adjusting the .jar file path based on your working directory. For more details, see languagetoolServer.py.

java -cp awe_languagetool/LanguageTool5_5/languagetool-server.jar org.languagetool.server.HTTPServer --config {/path/to/config} --port {port} --allow-origin "*"
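The same command can be assembled from Python, which keeps the .jar path and options in one place. This is a minimal sketch under the assumption that Java is on the PATH. build_server_command and start_server are hypothetical helpers and do not necessarily reflect how languagetoolServer.py launches the server.

```python
import subprocess

SERVER_CLASS = "org.languagetool.server.HTTPServer"


def build_server_command(jar_path, port=8081, config_file=None, allow_origin="*"):
    """Build the argument list for the Java command shown above."""
    cmd = ["java", "-cp", str(jar_path), SERVER_CLASS]
    if config_file is not None:
        # Only pass --config when a file was actually given.
        cmd += ["--config", str(config_file)]
    cmd += ["--port", str(port), "--allow-origin", allow_origin]
    return cmd


def start_server(jar_path, **kwargs):
    """Start the server as a child process and return the Popen handle."""
    return subprocess.Popen(build_server_command(jar_path, **kwargs))
```

Passing the arguments as a list avoids shell quoting problems, for instance with the "*" value of --allow-origin.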

Connect the client

In a separate terminal, you can now connect the LanguageTool client to the running server. The languagetoolClient class accepts host and port parameters, defaulting to localhost and 8081. Alternatively, you can use the server_url parameter, which will override host and port.

from awe_languagetool import languagetoolClient
import asyncio

client = languagetoolClient.languagetoolClient()
text_to_process = '...'
output = asyncio.run(client.summarizeText(text_to_process))
# Example output
# {
#     'wordcounts': {
#         'tokens': 0,
#         'types': 0,
#         'counts': [('word', 0), ...]
#     },
#     'category_counts': {
#         'Capitalization': 0,
#         'Grammar': 0,
#         ...
#     },
#     'subcategory_counts': {
#         'Capitalization: Abbreviations': 0,
#         'Capitalization: Acronyms': 0,
#         ...,
#         'Grammar: Article Error': 0,
#         ...,
#     },
#     'matches': [
#         {
#             'message': 'Possible spelling mistake found.',
#             'shortMessage': 'Spelling mistake',
#             'replacements': [{'value': 'new_word'}, ...],
#             'offset': 0,
#             'length': 0,
#             'context': {
#                 'text': '...',
#                 'offset': 0,
#                 'length': 0,
#             },
#             'sentences': 'Sentence where spelling error occurred.',
#             'type': {'typeName': 'Other'},
#             'rule': {
#                 'id': 'MORFOLOGIK_RULE_EN_US',
#                 'description': 'Possible spelling mistake',
#                 'issueType': 'misspelling',
#                 'category': {'id': 'TYPOS', 'name': 'Possible Typo'}
#             },
#             'ignoreForIncompleteSentences': False,
#             'contextForSureMatch': 0,
#             'label': 'Spelling',
#             'detail': 'Unknown word'
#         },
#         ...
#     ]
# }
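Once you have the output, the matches list can be post-processed like any other list of dicts. The sketch below counts matches by label and detail and extracts the flagged span from the input text. It assumes that offset and length index into the text that was submitted, as they do in LanguageTool's own HTTP responses. summarize_matches is a hypothetical helper, and the sample data is hand-written in the shape of the example output rather than real server output.

```python
from collections import Counter


def summarize_matches(text, matches):
    """Count matches by (label, detail) and extract each flagged span."""
    counts = Counter((m.get("label"), m.get("detail")) for m in matches)
    spans = [text[m["offset"]: m["offset"] + m["length"]] for m in matches]
    return counts, spans


# Hand-written sample in the shape of the example output (not real output).
sample_text = "This is a tset."
sample_matches = [
    {"offset": 10, "length": 4, "label": "Spelling", "detail": "Unknown word"},
]

counts, spans = summarize_matches(sample_text, sample_matches)
print(counts.most_common())
print(spans)
```

With real output you would call summarize_matches(text_to_process, output['matches']).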
