by quickthrower2 1840 days ago
I’d probably knock one up in Node.js as follows:

Import a CSV reader library that can stream.

Read each line and apply a series of regexes to each value, each one assigning a type when it matches.

E.g.

    ^0\d{8}$
means string.
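
A minimal sketch of that one check, in Python rather than Node.js. The function name `looks_like_string` is made up for illustration. The last line shows one likely reason such a value should stay a string: parsing it as a number drops the leading zero.

```python
import re

# Pattern from the comment: a leading zero followed by exactly eight digits.
LEADING_ZERO = re.compile(r"^0\d{8}$")


def looks_like_string(value: str) -> bool:
    """Return True when the value matches the leading-zero pattern."""
    return LEADING_ZERO.match(value) is not None


print(looks_like_string("012345678"))  # True
print(looks_like_string("123456789"))  # False
print(int("012345678"))                # 12345678, the leading zero is gone
```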

Then have a reduction rule, e.g.:

If so far we think it’s a numeric column and we get a string then treat as string.

If so far we think it’s a numeric column and we get a number it is still a numeric column.
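
Those two rules could be written as a small function, sketched here in Python. The labels `"numeric"` and `"string"` and the name `reduce_type` are placeholders. The case where the column is already a string is not spelled out above; this sketch assumes it stays a string.

```python
def reduce_type(so_far: str, observed: str) -> str:
    """Combine the column type so far with the type of the next value."""
    # Assumption: once a column has seen a string, it stays a string.
    if so_far == "string" or observed == "string":
        return "string"
    # numeric so far + numeric value: still a numeric column.
    return "numeric"


print(reduce_type("numeric", "string"))   # string
print(reduce_type("numeric", "numeric"))  # numeric
```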

Then running the regex match and the reduction in a loop over every row gives you the final type for each column.
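
Putting it together, here is a rough sketch in Python using only the standard library. Everything beyond the `^0\d{8}$` pattern is an assumption for illustration: the numeric regex, the fallback to string for unmatched values, the presence of a header row, and the names `RULES`, `classify`, `merge` and `infer_column_types`.

```python
import csv
import io
import re

# (pattern, label) pairs, tried in order; the first match wins.
RULES = [
    (re.compile(r"^0\d{8}$"), "string"),          # pattern from the comment
    (re.compile(r"^-?\d+(\.\d+)?$"), "numeric"),  # assumed numeric pattern
]


def classify(value):
    """Classify a single cell by the first regex that matches."""
    for pattern, label in RULES:
        if pattern.match(value):
            return label
    return "string"  # assumption: anything unmatched is a string


def merge(so_far, observed):
    """Reduction rule: a column stays numeric only if every value is numeric."""
    if so_far is None:  # first value seen for this column
        return observed
    return "numeric" if so_far == observed == "numeric" else "string"


def infer_column_types(lines):
    """Stream rows from an iterable of CSV lines and reduce per column."""
    reader = csv.reader(lines)
    header = next(reader)  # assumes the first row holds column names
    types = [None] * len(header)
    for row in reader:
        for i, value in enumerate(row[:len(header)]):
            types[i] = merge(types[i], classify(value))
    return dict(zip(header, types))


sample = io.StringIO(
    "id,amount,code,note\n"
    "1,9.50,012345678,7\n"
    "2,12,098765432,n/a\n"
)
print(infer_column_types(sample))
# {'id': 'numeric', 'amount': 'numeric', 'code': 'string', 'note': 'string'}
```

For a real file, passing `open(path, newline="")` in place of the `StringIO` object keeps the read streaming, since `csv.reader` pulls one line at a time.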

Happy to knock up some example code if you wish.


Hey that's pretty awesome, I'm ok at the regex stuff, but not that familiar with NodeJS. If you don't mind throwing me a snippet to develop from, that would be much appreciated.
It doesn’t have to be Node.js; I think this would be just as easy in Python, Java, etc.

I’ll put something together, though, when I get some time.

Hey, I got it working in Python with pandas. Thanks so much for the suggestions.