I originally set out to extend this project to support matching of DB-HAFAS stations against GTFS/OSM stations. For that, I had to alter the script quite a bit (see the fork here and the results here).
I chose this path because I thought that data from the OSM matching (specifically, the `uic_ref` and `ref:IBNR` tags, and the fact that OSM names are usually more similar to DB-HAFAS names) might be beneficial to the HAFAS matching, and vice versa: the HAFAS matching might be beneficial to the OSM matching (since for train stations, there is an official mapping to IFOPT-IDs). While I still think this is true, it adds more problems than it resolves, in particular because OSM/GTFS and your matching are much more fine-grained (platform level) than the DB-HAFAS-IDs (station/stop level).

While I have kept the behaviour and structure of the OSM matching as stable as possible, only adding features/options for HAFAS matching, I still think the changes might be out of scope for this project.
So instead, for now, this PR contains only some generic, cherry-picked improvements that I think are definitely valuable upstream:
- … `NO_MATCH_BUT_OTHER_PLATFORM_MATCHED`)
- … `route_type` 101 and 102 in DELFI GTFS.)
- … `route_short_name` SEV or similar, but sometimes there will be entries for e.g. "RE3" that have `route_type` 3, i.e. bus.

The last two changes are IMO important because possible matches that have a different, non-NULL mode are immediately discarded. This effect is much greater for stop provider GTFS, since in the DELFI GTFS, many (most?) trains aren't assigned to the correct platforms that exist in the ZHV, but to the top-level station IFOPT-ID, possibly postfixed with `_G` (compare this issue). So they are not taken into account for stop provider DELFI, and most train platforms won't have any mode anyway.

So for stop provider DELFI, this doesn't impact a lot of matches:
Running `python compare_stops.py -g gtfs_2023-04-18.zip -o germany-latest.osm.pbf -s zHV_aktuell_csv.2023-04-17.csv -p DELFI -d out/stops.db`

`match_stats` before|after:
Some notable examples:
All diffs: matches_diff.csv (most being due to random toggling across runs if the rating for two candidates is identical)
SQL query to obtain the diff
Let me know if you're interested in also merging the DB-HAFAS matching part, or if you prefer separate PRs for some of the changes in this PR.
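
For reference, here is a rough sketch of the mode filtering discussed above, i.e. why a wrong or missing mode matters: a candidate whose mode is non-NULL and differs from the stop's mode is discarded, while a NULL mode on either side stays compatible. This is not the code from the script. The function names, the mode labels (`"train"`, `"bus"`, ...) and the `ROUTE_TYPE_TO_MODE` table are assumptions made for illustration. The only outside fact used is that 101 and 102 are extended GTFS route types for high speed and long distance rail.

```python
from typing import Optional

# Hypothetical route_type -> mode table (illustration only, not the project's mapping).
ROUTE_TYPE_TO_MODE = {
    0: "tram",
    1: "subway",
    2: "train",
    3: "bus",
    4: "ferry",
    101: "train",  # extended route type: high speed rail
    102: "train",  # extended route type: long distance trains
}


def mode_for_route_type(route_type: int) -> Optional[str]:
    """Return the assumed mode for a route_type, or None if it is unknown."""
    return ROUTE_TYPE_TO_MODE.get(route_type)


def modes_compatible(stop_mode: Optional[str], candidate_mode: Optional[str]) -> bool:
    """A pair is incompatible only if both modes are known and they differ."""
    return stop_mode is None or candidate_mode is None or stop_mode == candidate_mode


def filter_candidates(stop_mode: Optional[str], candidates: list) -> list:
    """Keep only the candidates whose mode does not contradict the stop's mode."""
    return [c for c in candidates if modes_compatible(stop_mode, c.get("mode"))]
```

With a table like this, a route with `route_type` 101 that is left unmapped yields a NULL mode and filters nothing, whereas a train line that is wrongly classified as bus would cause every train platform candidate to be dropped.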