by frontiersummit 1611 days ago
I think it cuts both ways, as anyone who has needed to mine an existing data set for a new purpose can attest. Having the data sanitized can make your parsing job infinitely easier, while it can simultaneously destroy data that would have been extremely helpful to the new project.
1 comment

If it doesn't fit into a data standard you are enforcing, it shouldn't exist in the database. There is nothing wrong with capturing the original text in a field or separate table.
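
For what it's worth, here is a rough sketch of that "enforce the standard, but keep the original text in a separate table" idea, using Python's built-in sqlite3. Everything in it is made up for illustration: the table names (`raw_input`, `phone`), the `ingest` helper, and the "standard" itself (a 10-digit phone number) are assumptions, not anything from a real schema.

```python
import re
import sqlite3


def open_db():
    """Create an in-memory database with a raw table and a strict table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Original text, stored exactly as received.
        CREATE TABLE raw_input (
            id INTEGER PRIMARY KEY,
            original_text TEXT NOT NULL
        );
        -- Enforced standard (hypothetical): exactly 10 ASCII digits.
        CREATE TABLE phone (
            raw_id INTEGER NOT NULL REFERENCES raw_input(id),
            digits TEXT NOT NULL
                CHECK (length(digits) = 10 AND digits NOT GLOB '*[^0-9]*')
        );
    """)
    return conn


def ingest(conn, text):
    """Always keep the raw text; store a sanitized row only if it conforms."""
    cur = conn.execute(
        "INSERT INTO raw_input (original_text) VALUES (?)", (text,)
    )
    raw_id = cur.lastrowid
    digits = re.sub(r"[^0-9]", "", text)  # strip everything but ASCII digits
    if len(digits) == 10:
        conn.execute(
            "INSERT INTO phone (raw_id, digits) VALUES (?, ?)", (raw_id, digits)
        )
        return digits
    return None  # nothing conforming to store, but the original is preserved
```

With this layout, `ingest(conn, "(555) 123-4567")` puts a clean row in `phone`, while something like `ingest(conn, "call after 5pm")` only lands in `raw_input`, so a later project can still mine it.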