Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTFS-Flex #388
GTFS-Flex #388
Changes from 2 commits
e359750
2efafbf
778069e
6a95ac2
86472bd
4ee86b9
58018b9
231b071
8dcbbcd
858fa7b
820289c
0450331
fca0f08
1f6c463
d69e157
a42aa78
af97543
eafa3e1
f71374a
fe6b4c6
af0c0f8
9457322
33da9b9
c97df8e
757d676
352f09e
fdcf875
df038dd
e428c8e
6314252
7d10ffa
1fc0e61
7dee929
c3929b3
eacf043
5e22e10
1419292
13c858b
088d1e3
3dc6aaa
709066b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the need to have distinct values between area_id, stop_id, geojson.id, but this is not something managable (even validation is not simple, but to do this in the DB level is very hard if not impossible)
I'm not sure how can this be improved. If the id-s are integers then it's not really possible. But if they are strings then maybe a good practice could be to use some prefix, ie: for area_id: a_123, for stop_id: s_123, for geojson_id: g_123.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've managed to keep these values distinct at the DB level with no issue. Our
area_id
s are generated in the same table as ourstop_id
s in ascending order, so there's no risk of overlap there, and we do use a prefix for locationid
s.Using a prefix to ensure these values don't overlap is good advice, but I do not think it needs to be a requirement of the spec.
Logically, I guess I don't see the validation hurdle: "Does this id appear in stops.txt? Does this id appear in locations.geojson? Does this id appear in areas.txt? If no to all, passes validation" seems fairly straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a consumer (OBA, OTP) I've also had no issue with these IDs sharing a global ID space. Yes, it does make validation and lookup a bit more complicated but considering all the other complexities of the Flex spec, this is the easy part.
It's probably a good idea to prefix your IDs but that should really be left to the producer. I would not want to add any kind of ID requirements to the spec. IMO GTFS should treat them as transparent identifiers with no meta data encoded into them, which it currently does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This again is really impossible in DB, which means it could only be implemented by not using foreign keys and do something else that is more complicated if even possible. I think it should be clear from the data what it is meant. Using the prefixes as I suggested in area_id is better, but even that doesn't really solve the problem that one field should be foreign key to different tables. Maybe we should have an additional table that is referenced from here, and there we can have all the necessary information. Something like:
geographical_ids.csv:
geographical_id, type, foreign_id
1, stop, 123
2, area, 123
3, geojson, 123
Or:
geographical_ids.csv:
geographical_id, area_id, stop_id, geojson_id
1, , 123,
2, 123, ,
3, , , 123
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be interested to hear about the experience of consumers already using these fields as they are currently designed. Where exactly does the impossibility lie?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, not impossible, but it sounds like a bad design or a hack. From the sentences "Foreign ID referencing
stops.stop_id
,stop_areas.area_id
, orid
fromlocations.geojson
" and "It is forbidden to define anarea_id
with the same value as astop_id
orid
fromlocations.geojson
." these are the things come to my mind: 1. this is some afterthought. 2. Is this restriction also explained in stop_areas.area_id, stops.stop_id, locations.geojson? Almost, except that in locations.id only collision with stops is forbidden, but not with stop_areas.area_id (I guess it should be added there?)Ah, just noticed now: the field is called stop_id. It's misleading. I'm agains having a field that is "foreign ID" to 3 different fields (I don't think this counts as a foreign id, but maybe I'm too conservative) but if everyone else agrees on this, then at least let's give it a proper name. It is NOT a stop_id, so let's not call it that. Maybe geographical_reference or stop_location? BTW the same goes for stop_areas.stop_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch, it should be added. @tzujenchanmbd
The original iteration of Flex (v1) had a separate field in stop_times to refer only to polygons, but you still had to define
stop_times.stop_id
since it is a required field. Because of this, you had to use a "dummy" id instop_times.stop_id
which, the v2 drafting community decided, was worse than just havingstop_times.stop_id
able to reference ids from multiple files.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But naming conventions are important. Can't the field name be changed to something less confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot change
stop_times.stop_id
, as that is part of the core GTFS standard.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it looks like a breaking change anyway, because until now stop_times.stop_id was referencing only stops.stop_id. If you allow anything else here (as this change does) then it's a breaking change, in which case it could also be renamed. Or if you want to be semi backwards compatible then you still could add a new field with some better name, and add to stop_id and the new field that they are conditionally forbidden and only one of them can be defined. This would enable current producers to continue to produce their feed with stop_id (with only referencing stop_ids) OR use the new field instead if they also want to reference areas and/or locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is a breaking change to add to what a field can reference. If you have established that A can reference X, there is no contradiction to later say A can also reference Y and Z.
I do not see how this is an advantage for producers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accept Gavriel's point about this being a little difficult to model in a relational database but I don't believe that it must be the goal of the spec to make it so.
I also agree that using the name "stop_location" (which we do in our internal model) would have been preferable if this was part of the first iteration of GTFS ~15 years ago.
However, given that GTFS has no versions and pretty ironclad backwards-compatibility guarantees, I think using stop_id rather than introducing a new field is the least bad solution given the constraints that we face. For my part, I have been consuming the stop_id that refers to three different types of entities for years and it has not given me any issues that are worse than all the other problems that exist when you mix static and flexible transit data.
Having said that, I would be open to do a consensus finding exercise to figure out what the community thinks.
Lastly, I would like to emphasize that I don't want to dismiss Gavriel's points but rather argue that GTFS is in many ways highly pragmatic which does lead these kinds of design decisions. On balance, I think I'm happier with this approach compared to what other standards follow which say that they follow more "robust" engineering principles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: this comment
Restriction added 0450331, thanks!