bug: Warning: Duplicate word in word2vec file #887

@bact

Description

Running the unit tests produces hundreds of warnings like this:

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first

Expected results

No warning.
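
Until the duplicates are removed from the model file itself, one stopgap for a quieter test log is a logging filter on the logger named in the warnings. This is only a sketch: it hides the symptom rather than fixing the file, the class name `DuplicateWordFilter` is made up for illustration, and it assumes the messages are emitted through the standard `logging` module under the logger name `gensim.models.keyedvectors`, as the log lines in this report suggest.

```python
import logging


class DuplicateWordFilter(logging.Filter):
    """Drop log records whose message starts with "duplicate word".

    Hypothetical helper, not part of gensim or PyThaiNLP.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; everything else passes through.
        return not record.getMessage().startswith("duplicate word")


# Logger name taken from the warning lines in this report (an assumption
# about where the message originates).
gensim_logger = logging.getLogger("gensim.models.keyedvectors")
gensim_logger.addFilter(DuplicateWordFilter())
```

Other warnings from the same logger are still shown, so a real loading problem would not be silenced along with the duplicates.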

Current results

(partial)

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first
2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word '	' in word2vec file, ignoring all but first
...
2023-12-11:03:40:57 WARNING  [gensim.models.keyedvectors:1909] duplicate word '' in word2vec file, ignoring all but first
2023-12-11:03:40:58 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'หยับ' in word2vec file, ignoring all but first
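
To see which entries are affected, the vocabulary can be scanned for repeated tokens. The sketch below assumes a text-format word2vec file (a `<vocab_size> <vector_size>` header followed by one `<token> <values...>` line per entry); the model that triggers these warnings may be stored in binary format instead, in which case it would need to be exported to text first. The function name `find_duplicate_tokens` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_tokens(lines: Iterable[str]) -> Dict[str, int]:
    """Return tokens that occur more than once in a text-format word2vec file.

    Assumes the first line is "<vocab_size> <vector_size>" and every later
    line is "<token> <v1> ... <vN>" separated by single spaces.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return {}
    vector_size = int(header.split()[1])

    counts: Counter = Counter()
    for line in it:
        parts = line.rstrip("\r\n").split(" ")
        # Skip lines that cannot hold a token plus a full vector.
        if len(parts) <= vector_size:
            continue
        # Everything before the last `vector_size` fields is the token, so
        # empty tokens and tokens made of whitespace (as in the log) survive.
        token = " ".join(parts[:-vector_size])
        counts[token] += 1
    return {token: n for token, n in counts.items() if n > 1}
```

Passing an open file handle (opened with `encoding="utf-8"`) as `lines` returns a mapping from each repeated token to its number of occurrences.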

Steps to reproduce

Run the unit tests.

PyThaiNLP version

dev

Python version

3.8

Operating system and version

n/a

More info

No response

Possible solution

No response

Files

No response

Labels

bug (bugs in the library)
