by hu3
196 days ago
Interesting. I'm not experienced in data cleaning.
About Python vs Excel:
Isn't manually cleaning data in Excel prone to permanent errors? Because:
- it's hard to version control/diff
- it's done by a human fat-fingering spreadsheet cells
- it's not reproducible. For example, if you need to redo the cleaning of all the dates, in a Python script you could just fix the data-parsing part and rerun the script on the source again. And you can easily track changes with git.

In practice I think the speed tradeoff could be worth the occasional mistake. But it would depend on the field, I guess.
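To make the reproducibility point concrete, here is a minimal sketch (standard library only). The accepted date formats and the sample values are hypothetical. "Fixing the parsing part" amounts to editing the format list and rerunning, while the raw source values are never modified.

```python
from datetime import datetime

# Hypothetical accepted input formats; fixing the cleaning means editing this tuple.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")

def parse_date(raw, formats=DATE_FORMATS):
    """Return the date as ISO 8601 text, or None if no format matches."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None

def clean_dates(raw_values, formats=DATE_FORMATS):
    """Rebuild the cleaned column from the untouched source values."""
    return [parse_date(v, formats) for v in raw_values]

# Hypothetical source data; the last value uses a format nobody anticipated.
source = ["2023-01-05", "05/01/2023", "Jan 5, 2023", "5.1.2023"]
print(clean_dates(source))  # the last value comes back as None

# The fix is one extra format plus a rerun over the same source.
fixed_formats = DATE_FORMATS + ("%d.%m.%Y",)
print(clean_dates(source, fixed_formats))
```

Because the fix is a one-line change to the script, it shows up as a small diff in git.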
> - it's done by a human fat fingering spreadsheet cells

No one is entering anything into the cells; please reread the message.

> - it's not reproducible. Like if you need to redo the cleaning of all the dates, in a Python script you could just fix the data parsing part and rerun the script to parse source again. And you can easily control changes with git

And that's what I said above: it takes longer. Why use git/Python when I can do it in a few clicks and see a visual representation at every step?

> In practice I think the speed tradeoff could be worth the occasional mistake. But it would depend on the field I guess.

Another sentence that shows, once again, that you haven't read what was written.