textometry

A simple standalone textometry/lexicometry applet, inspired by platforms such as TXM [https://txm.gitpages.huma-num.fr/textometrie/] and Hyperbase [https://hyperbase.unice.fr/]. It is intended as a turn-key solution for comparing texts by different authors or from different genres, periods, topics, etc.

The specificity analysis is inspired by P. Lafon's proposal, and thus relies on the hypergeometric distribution rather than the normal distribution (as in R Stylo). This applet implements a simplified version of the hypergeometric distribution. Moreover, the tokenization rules are extremely simple and French-specific (although they also work for English, Spanish, etc.). Specificity scores will therefore differ from those produced by TXM and Hyperbase. Nevertheless, users can still retrieve the tokens specific to Corpus A and to Corpus B: the score ranges differ slightly between TXM and this applet, but the same overall set of specific tokens is still identified.
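To give a feel for the kind of computation involved, here is a minimal Python sketch of a Lafon-style specificity score based on the full hypergeometric distribution. It is not the applet's code (the applet uses a simplified version, and its source is not reproduced here); the function names and the parameters `T` (corpus size), `t` (sub-corpus size), `F` (total frequency of the word) and `f` (frequency of the word in the sub-corpus) are assumptions chosen for illustration.

```python
from math import comb, log10


def hypergeom_pmf(k, T, t, F):
    """P(X = k): probability that a word occurring F times in a corpus
    of T tokens occurs exactly k times in a sub-corpus of t tokens."""
    # Outside the support of the distribution the probability is zero.
    if k < max(0, t - (T - F)) or k > min(F, t):
        return 0.0
    return comb(F, k) * comb(T - F, t - k) / comb(T, t)


def specificity(f, T, t, F):
    """Signed score: positive when the word is over-represented in the
    sub-corpus, negative when it is under-represented (illustrative
    convention, assumed here)."""
    expected = F * t / T
    if f >= expected:
        # Upper tail: probability of seeing f or more occurrences.
        p = sum(hypergeom_pmf(k, T, t, F) for k in range(f, min(F, t) + 1))
        return -log10(p) if p > 0 else float("inf")
    # Lower tail: probability of seeing f or fewer occurrences.
    p = sum(hypergeom_pmf(k, T, t, F) for k in range(0, f + 1))
    return log10(p) if p > 0 else float("-inf")
```

For example, `specificity(4, 10, 5, 4)` scores a word whose 4 occurrences all fall in a 5-token half of a 10-token corpus; the result is positive, while `specificity(0, 10, 5, 4)` gives the same magnitude with a negative sign.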

Feel free to use this applet to get a feel for specificity analysis, then move on to TXM or Hyperbase for more robust results.

As noted above, the tokenization rules are extremely simple; developers are advised to modify them to suit their specific needs.
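As a starting point for such modifications, the sketch below shows what a very simple French-oriented tokenizer might look like. It is an illustration in Python, not the applet's actual rules: the regular expression, the `tokenize` name, and the choice to split on apostrophes while keeping hyphenated compounds together are all assumptions.

```python
import re

# One or more Unicode letters, optionally joined by internal hyphens
# (so "peut-être" stays whole). Digits, underscores and punctuation
# are treated as separators.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Apostrophes act as separators, so elided forms such as "l'auteur"
    yield two tokens: "l" and "auteur".
    """
    # Normalize typographic apostrophes before matching.
    text = text.lower().replace("\u2019", "'")
    return TOKEN_RE.findall(text)
```

Splitting on every apostrophe is crude: a word such as "aujourd'hui" would be cut in two, which is the kind of case a project-specific rule would need to handle.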

Caveat: this app is a very simple textometry/lexicometry tool, put together on a deadline with much help from LLM coders. Many features are experimental.
