GTFS-Flex #388

Closed
Changes from 2 commits
41 commits
e359750
Modify stop_areas.txt
tzujenchanmbd Jul 12, 2023
2efafbf
Modify stop_times.stop_id
tzujenchanmbd Jul 12, 2023
778069e
Modify stop_times.stop_sequence
tzujenchanmbd Jul 12, 2023
6a95ac2
Modify stop_times.arrival_time
tzujenchanmbd Jul 12, 2023
86472bd
Modify stop_times.departure_time
tzujenchanmbd Jul 12, 2023
4ee86b9
Modify stop_times.pickup_type/drop_off_type
tzujenchanmbd Jul 12, 2023
58018b9
Extend stop_times with start/end_pickup_dropoff_window
tzujenchanmbd Jul 13, 2023
231b071
Extend stop_times with pickup/drop_off_booking_rule_id
tzujenchanmbd Jul 13, 2023
8dcbbcd
Add locations.geojson file
tzujenchanmbd Jul 13, 2023
858fa7b
Add booking_rules.txt file
tzujenchanmbd Jul 13, 2023
820289c
Add general description for locations.geojson
tzujenchanmbd Jul 13, 2023
0450331
Add forbidden location.geojson.id with the same area_id
tzujenchanmbd Jul 21, 2023
fca0f08
Removed "consecutive values" in stop_sequence
tzujenchanmbd Jul 25, 2023
1f6c463
Change presence for pickup_type & drop_off_type
tzujenchanmbd Aug 3, 2023
d69e157
Clarification on stop_times.stop_id
tzujenchanmbd Oct 12, 2023
a42aa78
Revert changes in stop_areas.txt
tzujenchanmbd Nov 8, 2023
af97543
Add location_groups.txt back
tzujenchanmbd Nov 8, 2023
eafa3e1
Modify stop_times.stop_id
tzujenchanmbd Nov 8, 2023
f71374a
Change name of location_groups.location_id
tzujenchanmbd Nov 9, 2023
fe6b4c6
Add stop_times.location_group_id & location_id
tzujenchanmbd Nov 9, 2023
af0c0f8
Change presence of stop_times.stop_id
tzujenchanmbd Nov 9, 2023
9457322
Modify stop_times.start/end_pickup_drop_off_window
tzujenchanmbd Nov 9, 2023
33da9b9
Modify stop_times.pickup_type/drop_off_type
tzujenchanmbd Nov 9, 2023
c97df8e
Modify stop_times.stop_sequence
tzujenchanmbd Nov 9, 2023
757d676
Change requirement condition for stop_times.stop_id
tzujenchanmbd Nov 9, 2023
352f09e
Modify locations.geojson
tzujenchanmbd Nov 9, 2023
fdcf875
Modify stop_times.continuous_pickup/drop_off
tzujenchanmbd Nov 10, 2023
df038dd
Modify routes.continuous_pickup/drop_off
tzujenchanmbd Nov 10, 2023
e428c8e
Dedicated clarifications for stop and location group
tzujenchanmbd Nov 17, 2023
6314252
Remove same id value restriction
tzujenchanmbd Nov 17, 2023
7d10ffa
Add unique ID restriction
tzujenchanmbd Nov 29, 2023
1fc0e61
Add location_groups.txt to dataset files
tzujenchanmbd Nov 29, 2023
7dee929
Simplify language in stop_times
tzujenchanmbd Nov 29, 2023
c3929b3
Change an incorrect dot to an underscore
tzujenchanmbd Dec 7, 2023
eacf043
not "indicate" groups, but "are" groups
tzujenchanmbd Jan 22, 2024
5e22e10
Update location_groups and location_group_stops
tzujenchanmbd Jan 24, 2024
1419292
Modify locations.geojson file description
tzujenchanmbd Jan 24, 2024
13c858b
Add "Zone Overlap Constraint"
tzujenchanmbd Jan 24, 2024
088d1e3
Add travel time clarification
tzujenchanmbd Jan 24, 2024
3dc6aaa
Modify conditions for 4 fields
tzujenchanmbd Feb 13, 2024
709066b
Remove unnecessary table name & editorial changes
tzujenchanmbd Feb 13, 2024
6 changes: 3 additions & 3 deletions gtfs/spec/en/reference.md
tzujenchanmbd marked this conversation as resolved.
@@ -256,7 +256,7 @@ Primary key (`trip_id`, `stop_sequence`)
| `trip_id` | Foreign ID referencing `trips.trip_id` | **Required** | Identifies a trip. |
| `arrival_time` | Time | **Conditionally Required** | Arrival time at the stop (defined by `stop_times.stop_id`) for a specific trip (defined by `stop_times.trip_id`). <br><br>If there are not separate times for arrival and departure at a stop, `arrival_time` and `departure_time` should be the same. <br><br>For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.<br><br> If exact arrival and departure times (`timepoint=1` or empty) are not available, estimated or interpolated arrival and departure times (`timepoint=0`) should be provided.<br><br>Conditionally Required:<br>- **Required** for the first and last stop in a trip (defined by `stop_times.stop_sequence`). <br>- **Required** for `timepoint=1`.<br>- Optional otherwise.|
| `departure_time` | Time | **Conditionally Required** | Departure time from the stop (defined by `stop_times.stop_id`) for a specific trip (defined by `stop_times.trip_id`).<br><br>If there are not separate times for arrival and departure at a stop, `arrival_time` and `departure_time` should be the same. <br><br>For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.<br><br> If exact arrival and departure times (`timepoint=1` or empty) are not available, estimated or interpolated arrival and departure times (`timepoint=0`) should be provided.<br><br>Conditionally Required:<br>- **Required** for `timepoint=1`.<br>- Optional otherwise.| |
-| `stop_id` | Foreign ID referencing `stops.stop_id` | **Required** | Identifies the serviced stop. All stops serviced during a trip must have a record in [stop_times.txt](#stop_timestxt). Referenced locations must be stops/platforms, i.e. their `stops.location_type` value must be `0` or empty. A stop may be serviced multiple times in the same trip, and multiple trips and routes may service the same stop. |
+| `stop_id` | Foreign ID referencing `stops.stop_id`, `stop_areas.area_id`, or `id` from `locations.geojson` | **Required** | Identifies the serviced stop. All stops serviced during a trip must have a record in [stop_times.txt](#stop_timestxt). Referenced locations must be stops/platforms, i.e. their `stops.location_type` value must be `0` or empty. A stop may be serviced multiple times in the same trip, and multiple trips and routes may service the same stop.<br><br>If service is on demand, a GeoJSON location or stop area can be referenced:<br>-&nbsp;`id` from `locations.geojson`<br>-&nbsp;`stop_areas.area_id` |
| `stop_sequence` | Non-negative integer | **Required** | Order of stops for a particular trip. The values must increase along the trip but do not need to be consecutive.<hr>*Example: The first location on the trip could have a `stop_sequence`=`1`, the second location on the trip could have a `stop_sequence`=`23`, the third location could have a `stop_sequence`=`40`, and so on.* |
| `stop_headsign` | Text | Optional | Text that appears on signage identifying the trip's destination to riders. This field overrides the default `trips.trip_headsign` when the headsign changes between stops. If the headsign is displayed for an entire trip, `trips.trip_headsign` should be used instead. <br><br> A `stop_headsign` value specified for one `stop_time` does not apply to subsequent `stop_time`s in the same trip. If you want to override the `trip_headsign` for multiple `stop_time`s in the same trip, the `stop_headsign` value must be repeated in each `stop_time` row. |
| `pickup_type` | Enum | Optional | Indicates pickup method. Valid options are:<br><br>`0` or empty - Regularly scheduled pickup. <br>`1` - No pickup available.<br>`2` - Must phone agency to arrange pickup.<br>`3` - Must coordinate with driver to arrange pickup. |
@@ -472,8 +472,8 @@ Assigns stops from [stops.txt](#stopstxt) to areas.

| Field Name | Type | Presence | Description |
| ------ | ------ | ------ | ------ |
-| `area_id` | Foreign ID referencing `areas.area_id` | **Required** | Identifies an area to which one or multiple `stop_id`s belong. The same `stop_id` may be defined in many `area_id`s. |
-| `stop_id` | Foreign ID referencing `stops.stop_id` | **Required** | Identifies a stop. If a station (i.e. a stop with `stops.location_type=1`) is defined in this field, it is assumed that all of its platforms (i.e. all stops with `stops.location_type=0` that have this station defined as `stops.parent_station`) are part of the same area. This behavior can be overridden by assigning platforms to other areas. |
+| `area_id` | Foreign ID referencing `areas.area_id` | **Required** | Identifies an area to which one or multiple `stop_id`s belong. The same `stop_id` may be defined in many `area_id`s.<br><br>May also identify a group of stops and/or GeoJSON locations that together indicate locations where a rider may request pickup or drop off.<br><br>It is forbidden to define an `area_id` with the same value as a `stop_id` or `id` from `locations.geojson`. |
Contributor

I understand the need to have distinct values between area_id, stop_id, and geojson.id, but this is not manageable (even validation is not simple, and doing this at the DB level is very hard, if not impossible).

I'm not sure how this can be improved. If the IDs are integers, then it's not really possible. But if they are strings, then a good practice could be to use a prefix, e.g. a_123 for area_id, s_123 for stop_id, g_123 for geojson_id.

Contributor

We've managed to keep these values distinct at the DB level with no issue. Our area_ids are generated in the same table as our stop_ids in ascending order, so there's no risk of overlap there, and we do use a prefix for location ids.

Using a prefix to ensure these values don't overlap is good advice, but I do not think it needs to be a requirement of the spec.

Logically, I guess I don't see the validation hurdle: "Does this id appear in stops.txt? Does this id appear in locations.geojson? Does this id appear in areas.txt? If no to all, passes validation" seems fairly straightforward.
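
One reading of the check described above can be sketched in Python. This is only an illustration, not a reference validator: the function name `find_id_collisions` and the sample data are hypothetical, and the sketch assumes the draft layout discussed in this PR (`stop_id` in stops.txt, `area_id` in areas.txt, and a top-level `id` on each feature in locations.geojson).

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data; real feeds would be read from files.
STOPS_CSV = "stop_id,stop_name\n123,Main St\ns_1,Depot\n"
AREAS_CSV = "area_id,area_name\n123,Downtown\na_1,Uptown\n"
LOCATIONS_GEOJSON = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "g_1", "properties": {}, "geometry": None},
    ],
})


def find_id_collisions(stops_csv, areas_csv, locations_geojson):
    """Return {id: [sources]} for every id defined in more than one source."""
    sources = defaultdict(set)
    for row in csv.DictReader(io.StringIO(stops_csv)):
        sources[row["stop_id"]].add("stops.txt")
    for row in csv.DictReader(io.StringIO(areas_csv)):
        sources[row["area_id"]].add("areas.txt")
    for feature in json.loads(locations_geojson)["features"]:
        sources[str(feature["id"])].add("locations.geojson")
    # An id passes if it is defined in exactly one of the three sources.
    return {k: sorted(v) for k, v in sources.items() if len(v) > 1}


collisions = find_id_collisions(STOPS_CSV, AREAS_CSV, LOCATIONS_GEOJSON)
print(collisions)
```

In this sample, `123` is defined both as a stop and as an area, so it is the only id reported; the prefixed ids (`s_1`, `a_1`, `g_1`) cannot collide.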

Contributor

As a consumer (OBA, OTP) I've also had no issue with these IDs sharing a global ID space. Yes, it does make validation and lookup a bit more complicated but considering all the other complexities of the Flex spec, this is the easy part.

It's probably a good idea to prefix your IDs, but that should really be left to the producer. I would not want to add any kind of ID requirements to the spec. IMO GTFS should treat them as transparent identifiers with no metadata encoded into them, which it currently does.

+| `stop_id` | Foreign ID referencing `stops.stop_id` or `id` from `locations.geojson` | **Required** | Identifies a stop or GeoJSON location. If a station (i.e. a stop with `stops.location_type=1`) is defined in this field, it is assumed that all of its platforms (i.e. all stops with `stops.location_type=0` that have this station defined as `stops.parent_station`) are part of the same area. This behavior can be overridden by assigning platforms to other areas. |
Contributor

This again is really impossible in a DB, which means it could only be implemented by not using foreign keys and doing something else that is more complicated, if it is possible at all. I think it should be clear from the data what is meant. Using the prefixes as I suggested for area_id is better, but even that doesn't really solve the problem that one field would be a foreign key to different tables. Maybe we should have an additional table that is referenced from here, where we can keep all the necessary information. Something like:

geographical_ids.csv:
geographical_id, type, foreign_id
1, stop, 123
2, area, 123
3, geojson, 123

Or:
geographical_ids.csv:
geographical_id, area_id, stop_id, geojson_id
1, , 123,
2, 123, ,
3, , , 123
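
To illustrate how the first variant of the proposed lookup table might be consumed, here is a minimal Python sketch. The file `geographical_ids.csv` exists only in this comment and is not part of GTFS; the function name `load_geographical_ids` is hypothetical.

```python
import csv
import io

# The first proposed layout, copied from the comment above.
GEOGRAPHICAL_IDS = """geographical_id, type, foreign_id
1, stop, 123
2, area, 123
3, geojson, 123
"""


def load_geographical_ids(text):
    """Map each geographical_id to a (type, foreign_id) pair."""
    # skipinitialspace drops the blank after each comma in the sample.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {row["geographical_id"]: (row["type"], row["foreign_id"]) for row in reader}


lookup = load_geographical_ids(GEOGRAPHICAL_IDS)
# The same foreign value 123 is disambiguated by its type.
print(lookup["2"])
```

With this indirection, three entities that share the raw value `123` stay distinguishable, at the cost of one extra join for every reference.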

Contributor

I would be interested to hear about the experience of consumers already using these fields as they are currently designed. Where exactly does the impossibility lie?

Contributor

Ok, not impossible, but it sounds like a bad design or a hack. From the sentences "Foreign ID referencing stops.stop_id, stop_areas.area_id, or id from locations.geojson" and "It is forbidden to define an area_id with the same value as a stop_id or id from locations.geojson." these are the things that come to my mind: 1. This is an afterthought. 2. Is this restriction also explained in stop_areas.area_id, stops.stop_id, locations.geojson? Almost, except that in locations.id only collision with stops is forbidden, but not with stop_areas.area_id (I guess it should be added there?)

Ah, just noticed now: the field is called stop_id. It's misleading. I'm against having a field that is a "foreign ID" to 3 different fields (I don't think this counts as a foreign ID, but maybe I'm too conservative), but if everyone else agrees on this, then at least let's give it a proper name. It is NOT a stop_id, so let's not call it that. Maybe geographical_reference or stop_location? BTW the same goes for stop_areas.stop_id.

Contributor

> Almost, except that in locations.id only collision with stops is forbidden, but not with stop_areas.area_id (I guess it should be added there?)

Yes, good catch, it should be added. @tzujenchanmbd

> It is NOT a stop_id, so let's not call it that. Maybe geographical_reference or stop_location?

The original iteration of Flex (v1) had a separate field in stop_times to refer only to polygons, but you still had to define stop_times.stop_id since it is a required field. Because of this, you had to use a "dummy" id in stop_times.stop_id which, the v2 drafting community decided, was worse than just having stop_times.stop_id able to reference ids from multiple files.

Contributor

But naming conventions are important. Can't the field name be changed to something less confusing?

Contributor

We cannot change stop_times.stop_id, as that is part of the core GTFS standard.

Contributor

To me it looks like a breaking change anyway, because until now stop_times.stop_id was referencing only stops.stop_id. If you allow anything else here (as this change does) then it's a breaking change, in which case it could also be renamed. Or if you want to be semi backwards compatible then you still could add a new field with some better name, and add to stop_id and the new field that they are conditionally forbidden and only one of them can be defined. This would enable current producers to continue to produce their feed with stop_id (with only referencing stop_ids) OR use the new field instead if they also want to reference areas and/or locations.
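
The "only one of them can be defined" rule suggested above could be checked per stop_times row roughly as follows. This is a sketch under stated assumptions: `stop_location` is a hypothetical field name borrowed from earlier in this thread, and `referenced_field` is an illustrative helper, not part of any validator.

```python
# Candidate fields; "stop_location" is hypothetical and not defined by GTFS.
REFERENCE_FIELDS = ("stop_id", "stop_location")


def referenced_field(row):
    """Return the single reference field defined in a stop_times row.

    Raises ValueError when none or both of the fields are defined.
    """
    defined = [name for name in REFERENCE_FIELDS if (row.get(name) or "").strip()]
    if len(defined) != 1:
        raise ValueError(
            f"exactly one of {REFERENCE_FIELDS} must be defined, got {defined}"
        )
    return defined[0]


# An existing feed that only uses stop_id keeps working unchanged.
print(referenced_field({"trip_id": "t1", "stop_id": "123", "stop_location": ""}))
```

A feed that only references stops passes without modification, while a row that fills both fields, or neither, is rejected.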

Contributor

I do not think it is a breaking change to add to what a field can reference. If you have established that A can reference X, there is no contradiction to later say A can also reference Y and Z.

> This would enable current producers to continue to produce their feed with stop_id (with only referencing stop_ids) OR use the new field instead if they also want to reference areas and/or locations.

I do not see how this is an advantage for producers.

Contributor

I accept Gavriel's point about this being a little difficult to model in a relational database but I don't believe that it must be the goal of the spec to make it so.

I also agree that using the name "stop_location" (which we do in our internal model) would have been preferable if this was part of the first iteration of GTFS ~15 years ago.

However, given that GTFS has no versions and pretty ironclad backwards-compatibility guarantees, I think using stop_id rather than introducing a new field is the least bad solution given the constraints that we face. For my part, I have been consuming the stop_id that refers to three different types of entities for years and it has not given me any issues that are worse than all the other problems that exist when you mix static and flexible transit data.

Having said that, I would be open to do a consensus finding exercise to figure out what the community thinks.

Lastly, I would like to emphasize that I don't want to dismiss Gavriel's points but rather argue that GTFS is in many ways highly pragmatic, which does lead to these kinds of design decisions. On balance, I think I'm happier with this approach than with that of other standards that say they follow more "robust" engineering principles.

Collaborator Author

Re: this comment
Restriction added in 0450331, thanks!


### shapes.txt
