Releases · OpenPecha/Botok

RDR rules parser that converts the rules into pybo's CQL ReplaceMatcher format
integration of the parser into WordTokenizer and Config (same options as for the trie data and profiles)
a CLI option using parse_rdr_rules()
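Here is a rough idea of what converting RDR rules into CQL involves. The toy sketch below assumes a simplified, single-rule syntax loosely modeled on RDRPOSTagger's condition/conclusion lines. Only the hypothetical keys in KEY_MAP are supported, and exception nesting is ignored. It emits a generic CQL-style token pattern, the index of the token to retag, and the new tag. Neither the input syntax nor the output is claimed to match what parse_rdr_rules() or pybo's ReplaceMatcher actually use.

```python
import re

# Hypothetical mapping from simplified RDR condition keys to
# (offset relative to the target token, CQL attribute name).
KEY_MAP = {
    "prevWord1": (-1, "text"),
    "prevTag1": (-1, "pos"),
    "word": (0, "text"),
    "tag": (0, "pos"),
    "nextWord1": (1, "text"),
    "nextTag1": (1, "pos"),
}

COND_RE = re.compile(r'object\.(\w+)\s*==\s*"([^"]*)"')
CONCL_RE = re.compile(r'object\.conclusion\s*=\s*"([^"]*)"')


def rdr_rule_to_cql(rule: str) -> tuple[str, int, str]:
    """Convert one simplified RDR rule into (CQL pattern, target index, new tag).

    The target index is the position, inside the pattern, of the token
    whose tag the rule replaces.
    """
    condition, conclusion = rule.split(":", 1)
    concl = CONCL_RE.search(conclusion)
    if concl is None:
        raise ValueError(f"no conclusion found in rule: {rule!r}")

    # Group the attribute constraints by their offset from the target token.
    slots: dict[int, list[str]] = {}
    for key, value in COND_RE.findall(condition):
        if key not in KEY_MAP:
            raise ValueError(f"unsupported condition key: {key}")
        offset, attr = KEY_MAP[key]
        slots.setdefault(offset, []).append(f'{attr}="{value}"')

    # Span every offset from the leftmost to the rightmost constraint,
    # always including the target (offset 0); "[]" matches any token.
    offsets = range(min(slots.keys() | {0}), max(slots.keys() | {0}) + 1)
    pattern = " ".join("[" + " & ".join(slots.get(off, [])) + "]" for off in offsets)
    return pattern, -offsets.start, concl.group(1)


print(rdr_rule_to_cql('object.prevTag1 == "X" and object.nextTag1 == "Y" : object.conclusion = "Z"'))
# → ('[pos="X"] [] [pos="Y"]', 1, 'Z')
```

Gaps between constrained positions become `[]`, which matches any token in CQL. As a result, the target token keeps its position relative to its neighbors.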


v0.6.5 · Bugfix
Released 16 Aug 14:57 by drupchen · commit 6d5ad81

0.6.5 - 20190816

Fixed

particles that were not in the list caused a bug


v0.6.4 · Basic CLI
Released 15 Aug 20:21 by drupchen · commit f2d783a

0.6.4 - 20190815

Added

CLI for basic tokenization of strings and files


v0.6.3 · Bugfix
Released 14 Aug 23:15 by drupchen · commit bd70586

0.6.3 - 20190814

Fixed

removed a print() call that was executed for every added word


v0.6.2 · Add sentence and paragraph tokenizers
Released 14 Aug 22:55 by drupchen · commit 4999580

0.6.2 - 20190814

Added

sentence and paragraph tokenizers, and Text properties
meaning field in the entries attribute of Token objects
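The sketch below is a toy illustration of the idea behind sentence and paragraph tokenizers, which group word tokens into larger units. It closes a sentence at any token containing a shad (།) and closes a paragraph at a sentence whose last token contains a newline. Both boundary rules are simplifying assumptions for illustration, not pybo's actual heuristics, and the function names are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

SHAD = "\u0f0d"  # ། TIBETAN MARK SHAD


def group_at(items: list[T], is_end: Callable[[T], bool]) -> list[list[T]]:
    """Split items into consecutive groups, closing a group after each item where is_end is true."""
    groups: list[list[T]] = []
    current: list[T] = []
    for item in items:
        current.append(item)
        if is_end(item):
            groups.append(current)
            current = []
    if current:  # trailing items with no closing boundary
        groups.append(current)
    return groups


def sentence_tokenize(tokens: list[str]) -> list[list[str]]:
    # Assumption: a sentence ends at any token containing a shad.
    return group_at(tokens, lambda tok: SHAD in tok)


def paragraph_tokenize(tokens: list[str]) -> list[list[list[str]]]:
    # Assumption: a paragraph ends at a sentence whose last token contains a newline.
    return group_at(sentence_tokenize(tokens), lambda sent: "\n" in sent[-1])
```

Building both levels on the same grouping helper keeps them consistent: a paragraph is always a run of whole sentences.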

Changed

reduced the number of times WordTokenizer instances are loaded in the test suite (for Travis CI)
improved names for consistency

Fixed

a few remaining bugs from the previous release


v0.6.1 · Multiple meanings per inflected form/trie entry
Released 13 Aug 21:23 by drupchen · commit ac0dc99

0.6.1 - 20190813

Fixed

affixed particles were inflected
pos, lemma and frequency are now grouped together: a single inflected form can correspond to two different words, and therefore to different POS tags and frequencies.
various bugs related to the refactoring

Added

support for more than one meaning per trie entry (inflected form)

A meanings attribute is added to Token objects. It holds as many meanings as are found in the trie data.
A default meaning is chosen, and its pos, lemma and freq fields are copied from the meanings attribute to the Token attributes of the same names.
When only one meaning is available, it is chosen. Otherwise, the meaning with the most attributes is chosen from the following groups, in this order:
meanings that are unaffixed words, meanings that don't have the affixed attribute, meanings that are affixed words.

adjustments across the different parts of pybo required by the above
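The default-meaning rule described for 0.6.1 can be sketched as follows. Everything here is an assumption for illustration. Token is a hypothetical stand-in dataclass, and meanings are plain dicts. The affixed attribute is taken to be a boolean, and "most attributes" is read as the number of keys. The groups are read as a priority order in which the first non-empty group wins.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

Meaning = dict[str, Any]


@dataclass
class Token:
    """Hypothetical stand-in for a token; not pybo's actual Token class."""
    text: str
    meanings: list[Meaning] = field(default_factory=list)
    pos: Optional[str] = None
    lemma: Optional[str] = None
    freq: Optional[int] = None


def choose_default_meaning(meanings: list[Meaning]) -> Meaning:
    if not meanings:
        raise ValueError("token has no meanings")
    if len(meanings) == 1:
        return meanings[0]
    # Priority groups, tried in order; the first non-empty one wins (assumption).
    groups = [
        [m for m in meanings if m.get("affixed") is False],  # unaffixed words
        [m for m in meanings if "affixed" not in m],         # no affixed attribute
        [m for m in meanings if m.get("affixed") is True],   # affixed words
    ]
    for group in groups:
        if group:
            # "Most attributes" is read as the number of keys; ties keep the first.
            return max(group, key=len)
    return max(meanings, key=len)  # fallback for unexpected "affixed" values


def apply_default_meaning(token: Token) -> None:
    """Copy pos, lemma and freq from the default meaning onto the token (None if absent)."""
    default = choose_default_meaning(token.meanings)
    for name in ("pos", "lemma", "freq"):
        setattr(token, name, default.get(name))
```

In this reading, a meaning from a higher-priority group wins even if a lower-priority meaning carries more attributes, which is what "in this order" suggests. The actual tie-breaking in pybo may differ.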


v0.6.0 · Refactoring: making an intuitive interface to pybo
Released 01 Jul 12:26 by drupchen · commit 53857ca

Changed

refactored the Pipeline class into the Text class. See test_text.py for an overview of what it does.


Release index
0.6.9 - 20190901 · pybo/botok split · Added
0.6.7 - 20190826 · Bugfix · Fixed
0.6.6 - 20190821 · Batch regex · Added
0.6.6 - 20190820 · Add RDR rule adjustments · Added
0.6.5 - 20190816 · Bugfix · Fixed
0.6.4 - 20190815 · Basic CLI · Added
0.6.3 - 20190814 · Bugfix · Fixed
0.6.2 - 20190814 · Add sentence and paragraph tokenizers · Added, Changed, Fixed
0.6.1 - 20190813 · Multiple meanings per inflected form/trie entry · Fixed, Added
Refactoring: making an intuitive interface to pybo · Changed