A library for finding the most similar word from a list of words, supporting Japanese (including kanji).
Note
It may be a little slow because it relies on morphological analysis. Using worker_threads makes processing slightly faster than it would be without them.
For the Japanese documentation, see README-ja.md.
npm install closewords

| Parameter | Type | Default | Description |
|---|---|---|---|
| `word` | `string \| { word: string; pronounce?: string }` | (none) | The reference word or object. |
| `candidates` | `Array<string \| { word: string; pronounce?: string }>` | (none) | List of candidate words or objects. |
| `raw` | `boolean` | `false` | Whether to include similarity scores. |
Returns: Promise<string[]> when raw is false, Promise<Array<{ word: string; score: number }>> when raw is true.
The highest score is 1 (perfect match), the lowest is 0.
The optional pronounce field accepts alphabetic strings only and is used as a pronunciation hint for morphological analysis.
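To make the 0 to 1 score range concrete, here is a minimal sketch of one common way to turn an edit distance into a normalized similarity. It is not the library's actual formula: closewords combines morphological analysis with several string metrics, so its scores will differ. The names `levenshtein` and `similarity` are hypothetical helpers written for this illustration only.

```typescript
// Hypothetical helpers for illustration; not part of the closewords API.

// Classic Levenshtein edit distance, computed over Unicode code points
// so that each kanji or kana character counts as a single unit.
function levenshtein(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] = distance between the first (i - 1) chars of s and the first j chars of t
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Normalize the distance into [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(Array.from(a).length, Array.from(b).length);
  if (maxLen === 0) return 1; // two empty strings are treated as identical
  return 1 - levenshtein(a, b) / maxLen;
}
```

Under this sketch, identical strings score 1 and strings with no characters in common score 0, which mirrors the range described above. Intermediate values depend entirely on the metric chosen, so they should not be compared with the scores closewords returns.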
import { closeWords } from 'closewords';
const word = '東京';
const candidates = ['東京', 'とっこう', '東きょう', 'とう京', 'とうきょう', 'とーきょー'];
const result = await closeWords(word, candidates);
console.log(result);
// => [ '東京' ]
const resultWithScores = await closeWords(word, candidates, true);
console.log(resultWithScores);
// => [
// { word: '東京', score: 1 },
// { word: 'とう京', score: 0.693... },
// ...
// ]

Rewrote in TypeScript. Introduced tsdown for bundling.
Moved source files to src/. Separated README into English / Japanese.
Improved Hepburn-style completion. Formatted code.
word.pronounce and pronounce in candidates[] are now completed in Hepburn style.
Fixed bugs.
Fixed score calculation.
Fixed score calculation.
Fixed README. Fixed an issue where word accepted only a string. Fixed an issue where word.pronounce was ignored. Fixed an issue where non-alphabetic characters were accepted in word.pronounce and in pronounce in candidates[]. word.pronounce and pronounce in candidates[] are now optional. Fixed a few other bugs.
Added a way to specify the pronunciation of words.
Introduced fast-levenshtein and fixed score calculation. The similarity of the original strings is also evaluated.
Introduced jaro-winkler and optimized.
Fixed score calculation.
Package released! Introduced morphological analysis.