by screye 2172 days ago
It is only boring if you do it the boring way.

If the data cleaning follows standard patterns, you should already have scripts to offload that kind of work to. If not, then there are some incredibly interesting decisions hidden underneath. Take text, for example: Should character casing be preserved? What should the unit of representation be (word or character)? How should the data be filtered, given the quality vs. quantity trade-off?
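To make that concrete, here is a rough sketch of how those three choices could show up as explicit knobs in a cleaning script. Everything in it is an assumption for illustration: `CleaningConfig`, `tokenize`, `clean_corpus`, and the `min_tokens` length filter are hypothetical names, and a minimum token count is only one crude stand-in for a quality filter.

```python
from dataclasses import dataclass


@dataclass
class CleaningConfig:
    """Hypothetical knobs, one per decision mentioned above."""
    preserve_case: bool = False  # keep "Apple" distinct from "apple"?
    unit: str = "word"           # unit of representation: "word" or "char"
    min_tokens: int = 3          # crude quality filter: drop very short docs


def tokenize(text: str, config: CleaningConfig) -> list[str]:
    """Split one document into units according to the config."""
    if not config.preserve_case:
        text = text.lower()
    if config.unit == "word":
        return text.split()      # whitespace tokenization, nothing smarter
    if config.unit == "char":
        return list(text)
    raise ValueError(f"unknown unit: {config.unit!r}")


def clean_corpus(docs: list[str], config: CleaningConfig) -> list[list[str]]:
    """Tokenize every document and keep those that pass the length filter."""
    cleaned = []
    for doc in docs:
        tokens = tokenize(doc.strip(), config)
        if len(tokens) >= config.min_tokens:
            cleaned.append(tokens)
    return cleaned


docs = ["Hello World again", "Hi", "  "]
word_level = clean_corpus(docs, CleaningConfig())
char_level = clean_corpus(
    docs, CleaningConfig(preserve_case=True, unit="char", min_tokens=2)
)
```

With the defaults only the first document survives, lowercased and split into words. The character-level config also keeps "Hi", so the same corpus gives a different dataset depending on how each question is answered.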

All of those are non-trivial questions that take a lot of thought to reason through. You are correct that modelling is only a small part of a data scientist's day-to-day job.

But the rest of it is boring in the same way that coding is boring. It doesn't involve grand epiphanies or discoveries, but there is a joy in it similar to the daily grind of "code -> hit a bug / violate a constraint -> follow the trace / problem -> figure out a sensible solution" that a lot of software engineers love.