v2.4.0

@tsproisl tsproisl released this 23 Dec 20:32
· 10 commits to master since this release
  • New feature: SoMaJo can output character offsets for tokens, allowing for stand-off tokenization. To enable the feature, pass character_offsets=True to the constructor or use the --character-offsets option on the command line. Because the character offsets are determined by aligning the tokenized output with the input, enabling the feature noticeably increases processing time.
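
To illustrate what aligning tokenized output with the input means, here is a minimal sketch in plain Python. It is not SoMaJo's actual implementation: the function name align_tokens is hypothetical, and the sketch assumes every token appears verbatim in the input, which a real tokenizer that normalizes or rewrites tokens cannot rely on.

```python
def align_tokens(text, tokens):
    """Return a (start, end) character offset pair for each token.

    Simplified illustration only: scans the input left to right and
    assumes each token occurs verbatim in the text, in order.
    """
    offsets = []
    cursor = 0  # position in the input up to which tokens have been matched
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found at or after position {cursor}")
        end = start + len(token)
        offsets.append((start, end))
        cursor = end  # the next token must start at or after this position
    return offsets


text = "Heyi, was geht?"
tokens = ["Heyi", ",", "was", "geht", "?"]
offsets = align_tokens(text, tokens)

# Stand-off tokenization: the input stays untouched and each token is
# recovered by slicing it with the stored offsets.
for token, (start, end) in zip(tokens, offsets):
    print(f"{token}\t{start}\t{end}")
```

Each token requires a search through the input in addition to the tokenization itself, which gives a rough sense of why computing offsets adds processing time.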