by joshguthrie 4596 days ago
Simple.

Recently, I was tasked with mapping all of our clients' addresses to lat/long. I could have read the CSV and appended the results to each line, or used a JSON file, but then I would have had to read and rewrite the whole file every time.

Instead, I wrote a small helper to dump all the CSV data into a SQLite DB. Then I ran my script. Every time I found a lat/long, I could mark the client as "done" and add the lat/long for that client and every client that shared the same address. When I had to stop my script because I saw that one result from Google Maps was wrong, I could just edit it straight in SQL, mark it as "invalid" and relaunch my script: it picked up again at the first undone row. Then I just had to select all the "invalid" results and search for them manually or refine them so Google Maps would give me a proper result.
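For illustration, here is a minimal sketch of that resumable workflow. It is written in Python with the standard sqlite3 module rather than the Ruby I actually used, and everything in it is an assumption made for the example: the table and column names, the "todo" status, and the fake_geocode stub standing in for a real Google Maps lookup. It also marks an address "invalid" automatically when the lookup returns nothing, whereas in my case I flagged bad results by hand in SQL.

```python
import csv
import io
import sqlite3


def load_clients(conn, csv_file):
    """Dump CSV rows (name, address) into a table with a status column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clients ("
        "id INTEGER PRIMARY KEY, name TEXT, address TEXT, "
        "lat REAL, lng REAL, status TEXT NOT NULL DEFAULT 'todo')"
    )
    rows = [(r["name"], r["address"]) for r in csv.DictReader(csv_file)]
    conn.executemany("INSERT INTO clients (name, address) VALUES (?, ?)", rows)
    conn.commit()


def geocode_pending(conn, geocode):
    """Geocode every address still marked 'todo'.

    Progress is committed after each address, so the script can be
    interrupted and relaunched: rows already 'done' or 'invalid' are skipped.
    """
    pending = conn.execute(
        "SELECT DISTINCT address FROM clients WHERE status = 'todo' ORDER BY id"
    ).fetchall()
    for (address,) in pending:
        result = geocode(address)
        if result is None:
            # Assumption: no result means the address needs manual review.
            conn.execute(
                "UPDATE clients SET status = 'invalid' "
                "WHERE address = ? AND status = 'todo'",
                (address,),
            )
        else:
            lat, lng = result
            # One lookup fills in every client sharing this address.
            conn.execute(
                "UPDATE clients SET lat = ?, lng = ?, status = 'done' "
                "WHERE address = ? AND status = 'todo'",
                (lat, lng, address),
            )
        conn.commit()


def fake_geocode(address):
    """Hypothetical stand-in for a real geocoding call."""
    known = {"1 Main St": (48.85, 2.35)}
    return known.get(address)


conn = sqlite3.connect(":memory:")  # use a file path to keep state between runs
csv_text = "name,address\nAlice,1 Main St\nBob,1 Main St\nCarol,Nowhere\n"
load_clients(conn, io.StringIO(csv_text))
geocode_pending(conn, fake_geocode)
statuses = conn.execute("SELECT name, status FROM clients ORDER BY id").fetchall()
print(statuses)
```

With an on-disk database instead of ":memory:", fixing a bad row by hand is a single UPDATE, and a SELECT on status = 'invalid' gives the list of addresses left to check manually.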

Dataset is useful for small data that is constantly being worked on.

(This answer is from a Ruby POV, and the dataset I was working on had about 4K rows, which explains why a) some Python magic wasn't available to me, though it might have been perfect in the Python world, and b) I didn't want to play with streams on my files.)

Of course, I still need some automation to use my "DataMiner" (as I called it) to the fullest. I'll use Dataset's API as a basis to rewrite it correctly.

1 comment

I know very little about what's available in Ruby, but I would have used the Pandas library to accomplish this task in Python. Its in-memory data structure, the DataFrame, is more than capable of handling those operations.