Improve fuzzy matching by extending static data #2

Open
birneamstiel opened this issue Dec 16, 2018 · 1 comment
Labels
enhancement New feature or request

Comments

@birneamstiel
Owner

Description:

Right now, data/lines.json contains station names in their full form, e.g. S+U Alexanderplatz Bhf (Berlin). Messages usually contain significantly shorter station names (e.g. alexanderplatz), which increases the Levenshtein distance and decreases matching accuracy.

Proposal:

A quick fix would be to add shortened station names to the lines.json file. Removing the S/U prefix and the (Berlin) suffix would decrease the Levenshtein distance significantly.

"U9": [
        "S+U Rathaus Steglitz (Berlin) [U9]",
        "Rathaus Steglitz",
        "U Walther-Schreiber-Platz (Berlin)",
        "Walther-Schreiber-Platz",
        ...
    ]
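
To illustrate the effect, here is a rough Python sketch of the idea. It is not taken from the project: the helper names `levenshtein`, `shorten` and `extend_lines` are hypothetical, the stripping rules are only guessed from the sample names above, and Python is an assumption since the issue does not name the project's language.

```python
import re
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def shorten(name: str) -> str:
    """Strip the parts assumed to be noise; rules inferred from the samples only."""
    name = re.sub(r"\s*\[[^\]]*\]\s*$", "", name)   # trailing line tag, e.g. " [U9]"
    name = re.sub(r"\s*\(Berlin\)\s*$", "", name)   # trailing " (Berlin)"
    name = re.sub(r"^(?:S\+U|S|U)\s+", "", name)    # leading "S+U ", "S " or "U "
    return name


def extend_lines(lines: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of the lines mapping with a short name after each full name."""
    extended = {}
    for line, stations in lines.items():
        names: List[str] = []
        for full in stations:
            names.append(full)
            short = shorten(full)
            if short != full and short not in names:
                names.append(short)
        extended[line] = names
    return extended


query = "alexanderplatz"
full = "S+U Alexanderplatz Bhf (Berlin)"
print(levenshtein(query, full.lower()))           # 17
print(levenshtein(query, shorten(full).lower()))  # 4
```

With the sample name from the description, the distance to the query drops from 17 to 4 once the prefix and suffix are gone. A remaining token like Bhf would still need separate handling.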

birneamstiel added the enhancement label on Dec 16, 2018

@derhuerst

I built vbb-short-station-name (which shortens common parts like (Berlin) and -Platz), tokenize-vbb-station-name (which expands and normalises these parts) and vbb-stations-autocomplete (which provides a fuzzy search over all VBB stations).

While the implementation is in JavaScript, we could
