by Natsu 1901 days ago
That's the ID number you just grabbed. The phone number is the second field :)

If that's literally all you want, yes, it's not that hard. But a non-trivial number of people decided to put commas or colons in their names and other nonsense like that, and there are lots of commas in the hometown or location fields, which makes parsing those a pain, etc.


Aha, we must be looking at different data then; possibly someone's already done many of the corrections on the version I'm looking at.
Possible, but there are also different files with different schemas, so it's hard to even say that.

The only ones that actually define the data are the 9 or so CSV files that have a header like:

id,phone,first_name,last_name,email,birthday,gender,locale,hometown,location,link

Those are what I looked at, and those are super annoying because several have commas in both the first & last name. I don't know why, but a handful of people listed their names as "some, guy, some, guy", which I assume should be split into first_name "some, guy" and last_name "some, guy". Then a lot of people have None for a birthday, some have something like "May 8", and others have something like "May 8, 1990". Both locale & hometown can be either None or have several commas in them.
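To make the name problem concrete, here is a rough Python sketch of one way to recover the first few fields from an unquoted line. It is not the code I actually used. It assumes id and phone never contain commas, that every row has an email containing "@" which can serve as an anchor, and that a comma-laden name splits evenly between first and last name (my guess from above). The function name split_leading_fields and the sample row are made up for illustration.

```python
def split_leading_fields(line):
    """Return (id, phone, first_name, last_name, email, remaining_tokens).

    Hypothetical helper: anchors on the first token containing "@".
    """
    tokens = [t.strip() for t in line.split(",")]
    if len(tokens) < 5:
        raise ValueError("too few fields")

    # Assumption: id and phone are always the first two tokens.
    record_id, phone = tokens[0], tokens[1]

    # Assumption: the email is the first token after the phone containing "@".
    email_idx = next(
        (i for i, t in enumerate(tokens) if i >= 2 and "@" in t), None
    )
    if email_idx is None:
        raise ValueError("no email-like token to anchor on")

    # Everything between phone and email belongs to first_name + last_name.
    name_tokens = tokens[2:email_idx]
    if len(name_tokens) < 2 or len(name_tokens) % 2 != 0:
        raise ValueError("cannot split name tokens evenly")

    # Assumption: the two names have the same number of comma-separated parts.
    half = len(name_tokens) // 2
    first_name = ", ".join(name_tokens[:half])
    last_name = ", ".join(name_tokens[half:])

    return (record_id, phone, first_name, last_name,
            tokens[email_idx], tokens[email_idx + 1:])


# Synthetic example row, not real data.
row = ("100,15550100,some, guy,some, guy,guy@example.com,"
       "May 8, 1990,male,None,None,None,https://example.com/guy")
fields = split_leading_fields(row)
```

Rows where the name tokens don't split evenly raise an error instead of guessing, so they can be set aside and checked by hand.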

I had to reformat all that data and validate that each field made sense to parse it. There are helpful "Location" and "link" markers in the CSV but it's still super annoying to parse this stuff.
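As an example of the per-field validation, here is a minimal sketch of a birthday check. It assumes you already have a list of stripped, comma-split tokens that starts at the birthday field, and that birthdays only come in the three shapes mentioned above (None, "May 8", or "May 8, 1990") with English month names. The name take_birthday is hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_DAY = re.compile(r"^([A-Za-z]+) (\d{1,2})$")
YEAR = re.compile(r"^\d{4}$")


def take_birthday(tokens):
    """Consume the birthday from the front of tokens.

    Returns ((month, day, year_or_None) or None, remaining_tokens).
    """
    if not tokens:
        raise ValueError("no tokens left for birthday")

    # A literal "None" means the birthday was not provided.
    if tokens[0] == "None":
        return None, tokens[1:]

    match = MONTH_DAY.match(tokens[0])
    if not match or match.group(1) not in MONTHS:
        raise ValueError("unrecognized birthday: %r" % tokens[0])

    month = MONTHS.index(match.group(1)) + 1
    day = int(match.group(2))
    if not 1 <= day <= 31:
        raise ValueError("day out of range: %d" % day)

    # "May 8, 1990" splits into two tokens, so peek for a 4-digit year.
    if len(tokens) > 1 and YEAR.match(tokens[1]):
        return (month, day, int(tokens[1])), tokens[2:]
    return (month, day, None), tokens[1:]
```

For instance, take_birthday(["May 8", "1990", "male"]) consumes two tokens and leaves ["male"], while take_birthday(["May 8", "male"]) consumes only one.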