Hacker News
by echochar 3821 days ago
"No external imports required..."

I convert HTML to CSV regularly (multiple times a day). But I do not use Python; I use C. Actually I use flex to make filters which I compile as static binaries that read from stdin. This is in fact how I read HN. The HTML is converted to CSV and then the CSV is imported into a database.
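To make the idea concrete, here is a rough sketch of what the core of such a filter could look like. It is hand-written plain C rather than flex, and it is not the filter described above, which is not shown in this thread. The function names `csv_field` and `emit_links` are made up for illustration. The sketch assumes each link sits on one line, that `href` is the first attribute and is double-quoted, and that the link text contains no nested tags.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write n bytes of s as one CSV field: wrapped in double quotes,
 * with any embedded double quote doubled. */
static void csv_field(FILE *out, const char *s, size_t n)
{
    fputc('"', out);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '"')
            fputc('"', out);
        fputc(s[i], out);
    }
    fputc('"', out);
}

/* Hypothetical helper: scan one line of HTML for <a href="URL">TEXT</a>
 * and write a CSV row "URL","TEXT" for each match.
 * Returns the number of rows written. */
int emit_links(const char *line, FILE *out)
{
    static const char open_tag[] = "<a href=\"";
    const size_t open_len = sizeof open_tag - 1;
    int rows = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);

    while ((p = strstr(p, open_tag)) != NULL) {
        const char *url = p + open_len;
        const char *url_end = strchr(url, '"');      /* closing quote of href */
        if (url_end == NULL)
            break;
        const char *text = strchr(url_end, '>');     /* end of the opening tag */
        if (text == NULL)
            break;
        text++;
        const char *text_end = strstr(text, "</a>"); /* start of the closing tag */
        if (text_end == NULL)
            break;

        csv_field(out, url, (size_t)(url_end - url));
        fputc(',', out);
        csv_field(out, text, (size_t)(text_end - text));
        fputc('\n', out);
        rows++;

        p = text_end + 4;                            /* skip past "</a>" */
    }
    return rows;
}
```

Wrapped in a `main` that reads stdin with `fgets` and calls `emit_links(line, stdout)` for each line, this becomes a stdin-to-stdout filter that can be compiled as a static binary. A flex version would presumably express the same patterns as rules instead of `strstr`/`strchr` calls.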

Prior to using flex I primarily used sed. For many sites I still do; it's faster than having to compile, test, recompile.

If anyone has a website they want in CSV and needs something faster than Python or Ruby, just post the URL. I like to think I am reasonably good at this, but I only do it for personal use on sites I'm interested in, so who knows. For me, HTML conversion to CSV and plain text is an art; I practice it every day.

2 comments

>This is in fact how I read HN.

Out of curiosity, why?

1. Practice with new programming language and database.

2. Turn unstructured, difficult-to-parse data adorned with HTML and other window dressing into structured data that is easier to parse.

I'd like to see HN, actually.