detectlanguages should detect languages in FoLiA #22

@kosloot

The --detectlanguages option is confusing, because it means different things depending on the input:
On plain text it means: detect the language, tokenize according to that language, and assign that language in the FoLiA output.
On FoLiA input it means: check the language tag of each element and, when it is in the provided list, tokenize the element according to that language.

I think it should also be possible to really detect the language in FoLiA input.
This will probably only work correctly on input documents without any language info, but that is still useful.
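
To make the proposal concrete, here is a rough sketch in Python of the decision I have in mind. This is not ucto code: `Element`, `choose_language` and `toy_detect` are hypothetical names made up for illustration, and returning `None` (skip the element) when the language is not in the list is an assumption, not necessarily what the tool should do.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Element:
    """Hypothetical stand-in for a FoLiA text element (not the real FoLiA API)."""
    text: str
    lang: Optional[str] = None  # existing language tag, if any


def choose_language(
    element: Element,
    allowed: Sequence[str],
    detect: Callable[[str], str],
) -> Optional[str]:
    """Return the language to tokenize with, or None to leave the element alone."""
    if element.lang is not None:
        # Behaviour described above for FoLiA input: trust the existing tag.
        return element.lang if element.lang in allowed else None
    # Proposed addition: no tag present, so really detect the language.
    guessed = detect(element.text)
    return guessed if guessed in allowed else None


def toy_detect(text: str) -> str:
    """Toy detector for this demo only; a real one would be a proper language guesser."""
    dutch_hints = {"de", "het", "een"}
    words = set(text.lower().split())
    return "nld" if words & dutch_hints else "eng"


if __name__ == "__main__":
    allowed = ["nld", "eng"]
    for el in (
        Element("Dit is een zin"),           # untagged: language gets detected
        Element("This works fine"),          # untagged: language gets detected
        Element("Bonjour", lang="fra"),      # tagged, but not in the list
    ):
        print(repr(el.text), "->", choose_language(el, allowed, toy_detect))
```

Elements that already carry a language tag keep the current behaviour, so only documents (or parts of documents) without language info would be affected.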
