by higginsc 1734 days ago
Ha! A friend sent me this comment when he recognized this project. Unless there happens to be another firm that did the exact same thing we did, I was part of this project (see this blog post https://www.svds.com/introduction-to-trainspotting/).

You misunderstood the point of the presentation. The company was a consulting firm that specialized in data science and engineering. Our clients wanted to kick the tires and see what our technical chops were before hiring us but they didn't want to let us use their proprietary and confidential data for our own tech demos.

We didn't want to just use the same open source datasets everyone else did, so we got to thinking about novel datasets we could create that might have applications for industries we sold our services to. From this, the Trainspotting project was born.

Many of us commuted via the Caltrain, which was right next to our office, and we were frequently frustrated with the unreliability (this was in ~2016 or so when car and pedestrian strikes were happening seemingly every week), so we made an app that tried to provide more accurate scheduling.

We used the official API for per-station train arrival times, but we found that it was unreliable, so we wanted some ground-truth data on whether a train was actually passing. Since our office was right next to the Castro MTV station, I had the idea to use a microphone (attached to a Raspberry Pi) to just listen for when the train went by. In addition to ground-truth data for validating arrival times, this gave us a chance to show off some IoT applications. It actually worked pretty well, but it had false positives (e.g. the garbage truck would set it off). So we added a camera.
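For anyone curious what "just listen for the train" could look like in code, here is a minimal sketch. It is not what we actually ran: it assumes the audio has already been chopped into fixed-size windows of normalized samples, and the names `rms`, `detect_events`, `threshold`, and `min_windows` are made up for illustration. It flags a stretch of audio as an event only if the loudness stays above a threshold for several windows in a row.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one window of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_events(windows, threshold=0.3, min_windows=3):
    """Return (start, end) window-index pairs for sustained loud stretches.

    `end` is exclusive. Loud runs shorter than `min_windows` are dropped.
    Both parameters are hypothetical tuning knobs, not measured values.
    """
    events = []
    start = None  # index where the current loud run began, if any
    for i, window in enumerate(windows):
        loud = rms(window) >= threshold
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            if i - start >= min_windows:
                events.append((start, i))
            start = None
    # Close out a loud run that lasts until the end of the input.
    if start is not None and len(windows) - start >= min_windows:
        events.append((start, len(windows)))
    return events
```

Note that a duration filter like this only drops short bursts. Anything loud and sustained (like that garbage truck) still gets through, which is the kind of false positive that pushed us to add the camera.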

We pointed it at the tracks and started streaming data off of it. At first we used very simple techniques, processing the raw stream on-device with classic computer vision algos (e.g. Haar cascades) in OpenCV. We discovered that the VTA, which had a track parallel to the Caltrain and was "behind" the Caltrain in our camera's shot, could cause false positives. Gradually we used more and more complex techniques like deep learning, but the Raspberry Pi couldn't handle it (IIRC it took something like 6 seconds to process a single frame). So we used a two-stage validation: the simpler, faster detectors, which could keep up with the raw stream in real time, ran first, and only when they detected a positive would we send a single frame on to the deep learning model.
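The two-stage idea can be sketched roughly like this. To be clear, this is an illustration under assumptions, not our code: the cheap first stage here is plain frame differencing on flat lists of grayscale pixels (we used OpenCV detectors), `slow_classifier` is a stand-in for the deep learning model, and `motion_threshold` is a made-up tuning value.

```python
def motion_score(prev, curr):
    """Mean absolute pixel difference between two grayscale frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def two_stage_detect(frames, slow_classifier, motion_threshold=10.0):
    """Return indices of frames confirmed by both stages.

    Stage 1 (cheap, runs on every frame): frame differencing.
    Stage 2 (expensive, runs rarely): `slow_classifier(frame) -> bool`,
    a hypothetical callable standing in for the deep learning model.
    """
    confirmed = []
    prev = None
    armed = True  # forward only one frame per burst of motion
    for i, frame in enumerate(frames):
        if prev is not None:
            moving = motion_score(prev, frame) >= motion_threshold
            if moving and armed:
                armed = False
                if slow_classifier(frame):
                    confirmed.append(i)
            elif not moving:
                armed = True  # scene settled; allow the next trigger
        prev = frame
    return confirmed
```

The point of the `armed` flag is that the slow model sees at most one frame per trigger, so a model that needs seconds per frame never has to keep up with the stream.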

TL;DR: The whole point was to be a tech demo, not to gauge the speed. The trains were either stopping or pulling out of the station, so speed would have been useless.

1 comment

Really enjoyed this post and explanation, thank you! I work in ML and used to live on Alma St in Palo Alto so it really hit home for me :).

I also acutely enjoy the notion that a pithy critique of people who refused to simplify the problem they were solving is in itself grossly oversimplified!