If you would like to contribute to Lingua, I encourage you to do so. Do you have ideas for improving the API? Are there specific languages that you would like to see supported soon? Or have you found a bug? Feel free to open an issue or send a pull request. It is very much appreciated.
For pull requests, please make sure that all unit tests pass and that the code is formatted according to the official Go style guide with `go fmt`.
All kinds of pull requests are welcome. The ones I favor the most are new language additions. If you want to contribute a new language to Lingua, the following guide explains how to do that.
Thank you very much in advance for all contributions, however small they may be.
- Clone Lingua's repository to your own computer.
- Open the enums `IsoCode639_1` and `IsoCode639_3` and add the language's ISO codes. Among other sites, Wikipedia provides a comprehensive list.
- Open the enum `Language` and add a new entry for your language. If the language is written with a script that is not yet supported by Lingua's `alphabet` enum, then add a new entry for it there as well.
- If your language's script contains characters that are completely unique to it, then add them to the respective method in the `Language` enum. However, if the characters occur in more than one language but not in all languages, then add them to the `charsToLanguagesMapping` constant instead.
- Use the function `CreateAndWriteLanguageModelFiles` to create the language model files. The training data file used for n-gram probability estimation needs no specific format other than being a valid txt file with UTF-8 encoding. Do not rename the language model files.
- Use the function `CreateAndWriteTestDataFiles` to create the test data files used for accuracy report generation. The input file from which to create the test data should have each sentence on a separate line. Do not rename the test data files.
- Create a new directory in `/language-models` named after the new language's ISO 639-1 code and put the language model files into it. Look at the other languages' directories to see what the layout looks like. It should be pretty self-explanatory.
- Put the test data files in `/language-testdata`.
- Add the new language to `/cmd/accuracy_reporter.go` as well.
- Fix the existing unit tests by adding your new language.
- For accuracy report generation, run `cd cmd && go run accuracy_reporter.go`.
- Be happy! :-) You have successfully contributed a new language and have thereby significantly widened this library's fields of application.
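
Before generating the model and test data files, it can help to check the input files against the two requirements named in the steps above: the training data must be valid UTF-8, and the test data input should have one sentence per line. The following is a minimal sketch of such a check using only the Go standard library. The helpers `checkTrainingData` and `countSentences` are hypothetical names chosen for illustration and are not part of Lingua. The sample text is held in memory here, whereas a real check would read the file contents first.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// checkTrainingData (hypothetical helper, not part of Lingua) returns an
// error if the raw bytes of a training data file are not valid UTF-8.
func checkTrainingData(data []byte) error {
	if !utf8.Valid(data) {
		return errors.New("training data is not valid UTF-8")
	}
	return nil
}

// countSentences (hypothetical helper, not part of Lingua) counts the
// non-empty lines of a test data input, assuming one sentence per line.
func countSentences(text string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(text))
	count := 0
	for scanner.Scan() {
		if strings.TrimSpace(scanner.Text()) != "" {
			count++
		}
	}
	return count, scanner.Err()
}

func main() {
	// In practice the contents would come from the input file on disk.
	sample := "Das ist ein Satz.\nDas ist noch ein Satz.\n"

	if err := checkTrainingData([]byte(sample)); err != nil {
		fmt.Println("error:", err)
		return
	}
	n, err := countSentences(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sentences:", n)
}
```

A line count that is far lower than expected usually means that several sentences share a line, which is worth fixing before creating the test data files.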