by zimpenfish 2572 days ago
Needs a discussion of the methodology of calculation.

The distances I'm getting from the downloaded "bus-sequences.csv" (from TfL's API site) differ substantially; e.g. my 53 has averages of 317m and 357m vs his 201m and 219m.

Additionally, my numbers match those of superqwert (https://news.ycombinator.com/item?id=20029476) for the 389 and 631 routes.

(I measured a bunch of stops on the 78 route with GPS earlier this week. My calculations correlated closely with the real distances once the looseness of GPS from downstairs on a bus is taken into account.)


The numbers for the 389 and 631 agree with my post linked above to 3 significant figures... Rerunning my previous query on the 53 gets me (distances in metres):

min: 11.176937891670061
lq: 69.77607205565486
med: 147.79705350080133
avg: 219.3995096050874
uq: 291.84606245281907
max: 1121.239174918762
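
For anyone wanting to reproduce a summary like the one above, here is a minimal Python sketch, assuming the inter-stop distances are already in a list. The function name `summarise` is hypothetical, and quartile conventions differ between tools, so `lq`/`uq` may not match another query's figures exactly even on identical data.

```python
import statistics


def summarise(distances):
    """min / lower quartile / median / mean / upper quartile / max of a list of distances.

    statistics.quantiles uses the 'exclusive' method by default; other tools
    (numpy, SQL percentile functions) may give slightly different quartiles.
    Needs at least two distances.
    """
    # The three cut points for n=4; the middle one equals the median.
    lq, med, uq = statistics.quantiles(distances, n=4)
    return {
        "min": min(distances),
        "lq": lq,
        "med": med,
        "avg": statistics.fmean(distances),
        "uq": uq,
        "max": max(distances),
    }
```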

Where exactly did you get your CSV data from? I found a CSV (https://tfl.gov.uk/cdn/static/cms/documents/stop-sequences-e...) that is labelled "The example feeds below are not updated and for demonstration purposes only".

For my query I called the route sequence API directly:

https://api.tfl.gov.uk/line/53/Route/Sequence/inbound

The CSV is this one - http://tfl.gov.uk/tfl/syndication/feeds/bus-sequences.csv?ap...

That API link gives me the same sequence of stops as the CSV has for route 53, run 2, although the CSV gives each location as northing/easting instead of lat/long.
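
A minimal sketch of the straight-line calculation on the CSV, assuming the header includes `Route`, `Run`, `Sequence`, `Location_Easting` and `Location_Northing` (check your copy; the helper name `stop_distances` is hypothetical). Eastings and northings on the British National Grid are in metres, so Pythagoras gives metres directly.

```python
import csv
import math
from typing import TextIO


def stop_distances(csv_file: TextIO, route: str, run: str) -> list[float]:
    """Straight-line distances (metres) between consecutive stops on one run.

    Column names are an assumption about bus-sequences.csv.
    """
    rows = [
        r for r in csv.DictReader(csv_file)
        if r["Route"] == route and r["Run"] == run
    ]
    # Put the stops in route order before pairing them up.
    rows.sort(key=lambda r: int(r["Sequence"]))
    points = [
        (float(r["Location_Easting"]), float(r["Location_Northing"]))
        for r in rows
    ]
    # Pythagoras on the easting/northing deltas of each consecutive pair.
    return [
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
```

With the file open, `statistics.fmean(stop_distances(f, "53", "2"))` would then give the average spacing for one run.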

(I'm extremely sceptical about the 11m minimum distance too - having done the entire 53 route a few times, I can't remember any stops that are basically on top of each other.)

I think I've found the cause of the 11m distance. It's the hop from:

[-0.126102, 51.502769] to [-0.126018, 51.502714]

What's happening is that the TfL sequence API starts the bus route on one side of Parliament Street and then turns it around at the corner with Whitehall Place, where the bus stops on either side of the road are very close together. Other maps/sequences elsewhere start the sequence at the end of Parliament Street, which avoids the short hop.
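
The pair above is in [lon, lat] order. Here is a minimal sketch for measuring such a pair with the haversine formula and flagging suspiciously short hops in a sequence; the helper names, the 30 m threshold and the Earth radius are assumptions. For this pair the haversine distance comes out at roughly 8.4 m rather than 11 m; the thread doesn't show the formula behind the 11 m figure, so that gap is worth checking before comparing numbers.

```python
import math

# Mean Earth radius in metres (an assumption; other values shift results slightly).
EARTH_RADIUS_M = 6_371_000


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def short_hops(points, threshold_m=30.0):
    """Yield (index, distance) for consecutive [lon, lat] points closer than threshold_m."""
    for i, ((lon1, lat1), (lon2, lat2)) in enumerate(zip(points, points[1:])):
        d = haversine_m(lon1, lat1, lon2, lat2)
        if d < threshold_m:
            yield i, d
```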

Ah, you're parsing the `lineStrings` blob? I'm looking at the lat/long pairs in the `stopPointSequences` structure[1] which gives a different set of coords (in particular, 51.502769, -0.126102 doesn't exist as a stop for the 53.)

[1] `.stopPointSequences | .[0] | .stopPoint | .[] | [.name, .id, (.lat|tostring), (.lon|tostring)]` in jq parlance
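
The same extraction in Python, as a sketch: `fetch_route_sequence` and `stop_points` are hypothetical helper names, the URL pattern is the route sequence endpoint quoted above, and unlike the jq filter it keeps lat/lon as numbers rather than converting them to strings.

```python
import json
import urllib.request

# URL pattern taken from the route sequence link above.
API_URL = "https://api.tfl.gov.uk/line/{route}/Route/Sequence/{direction}"


def fetch_route_sequence(route, direction="inbound"):
    """Download and decode the route sequence JSON (needs network access)."""
    with urllib.request.urlopen(API_URL.format(route=route, direction=direction)) as resp:
        return json.load(resp)


def stop_points(data):
    """Rough equivalent of the jq filter: (name, id, lat, lon) for each stop
    in the first entry of stopPointSequences."""
    return [
        (sp["name"], sp["id"], sp["lat"], sp["lon"])
        for sp in data["stopPointSequences"][0]["stopPoint"]
    ]
```

Usage would be along the lines of `stop_points(fetch_route_sequence("53"))`.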

Interesting! I wonder why the `lineStrings` blob differs... Maybe it doesn't represent bus stops at all, but points at which the route changes for the purpose of drawing?

It does seem to be "the route as you'd show on a map": if I plot the two sets of points on GPSVisualizer, they're fairly different.

https://imgur.com/a/BUSW3C2

I guess the 389/631 routes are reasonably simple and don't need more "drawing points" than you'd get from the bus stops anyway?

[edit: added the 389 bus stops and line points to the imgur gallery]

In fact, for the 53 it seems that none of the `lineStrings` points match the `stopPointSequences` lat/lons. But then... why would my query have matched yours for the 389 and 631?!
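
One way to answer that is to check which stops actually appear among the `lineStrings` points. This sketch assumes each `lineStrings` entry is a JSON-encoded string holding nested `[[[lon, lat], ...]]` arrays (consistent with the `[lon, lat]` pairs quoted earlier, but check a real response); the helper names and the 1e-5 degree tolerance (about a metre) are assumptions. If most 389/631 stops turn up in the line while few 53 stops do, that would fit the guess that simple routes need no extra drawing points.

```python
import json


def line_points(data):
    """Flatten lineStrings into a list of (lon, lat) tuples.

    Assumes each entry is a JSON-encoded string of nested [[[lon, lat], ...]]
    arrays; adjust if the response is shaped differently.
    """
    points = []
    for encoded in data.get("lineStrings", []):
        for segment in json.loads(encoded):
            points.extend((lon, lat) for lon, lat in segment)
    return points


def stops_on_line(stops, line, tol_deg=1e-5):
    """Return the (lon, lat) stops that coincide with some line point,
    within tol_deg degrees on both axes."""
    return [
        s for s in stops
        if any(abs(s[0] - p[0]) <= tol_deg and abs(s[1] - p[1]) <= tol_deg for p in line)
    ]
```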

I think he is using straight-line (bird's-eye) distances rather than distances travelled by the bus. Roads between stops in London are rarely straight.

Hmm, I'm using straight-line distances (Pythagoras on the northing/easting difference). Maybe he's using "real world" distances somehow? That might explain some of the discrepancies, although on the whole I'd expect the straight-line numbers to be lower because, as you say, the roads are rarely straight.
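
The expectation that straight-line figures come out lower follows from the triangle inequality: a path through intermediate points is never shorter than the direct hop between its ends. A small sketch with hypothetical coordinates in metres (e.g. easting/northing); the helper names are illustrative only.

```python
import math


def path_length(points):
    """Length of a polyline through (x, y) points, in the points' units."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def straight_line(points):
    """Straight-line distance between the first and last point."""
    (x1, y1), (x2, y2) = points[0], points[-1]
    return math.hypot(x2 - x1, y2 - y1)
```

For an L-shaped path `[(0, 0), (300, 0), (300, 400)]`, `path_length` gives 700.0 and `straight_line` gives 500.0.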

It seems to be confusion over the API data: it returns a `lineStrings` array of GPS coords, but those aren't the stops; they're the "plot these to visualise the route as it goes on the road" points. The actual bus stops are further down, in a different structure (`stopPointSequences`).