Ask HN: How Do You Clean and Structure Data at Work?
5 points by _xsbz 500 days ago
Curious to hear from different industries: what's the most frustrating or repetitive data task you deal with, and how are you solving it? I do software implementations, and we get customer data exports from legacy systems as CSV or XLSX. Cleaning and mapping them for import is always a pain. Is anyone else constantly structuring, formatting, or fixing data? How do you deal with it? Any good tools or workarounds?
Convert the XLSX to CSV. Almost every database system out there has a blazing fast CSV import.
As a SQL wizard, I prefer to use SQL to clean and re-shape data. So my first goal is to get the data into a SQL DB as quickly as possible: no cleaning, no re-shaping, just a raw dump. Now the data is in my house. I clean and re-shape it with batch UPDATE/INSERT statements. Finally, I batch insert into the target tables.
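For anyone who wants to see the shape of that workflow, here is a minimal sketch using Python's built-in sqlite3 and csv modules. Everything specific in it is an assumption made up for illustration: the sample rows, the `staging` and `customers` table names, the column names, and the cleaning rules (trim whitespace, lowercase emails, turn empty dates into NULL, skip rows with no name). A real import would use your own database's bulk loader and your own rules.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; in practice this would be a CSV file on disk.
RAW_CSV = """name,email,signup_date
  Alice Smith ,ALICE@EXAMPLE.COM,2021-03-04
Bob Jones,bob@example.com ,
,no-name@example.com,2020-01-01
"""


def load_and_clean(raw_csv: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")

    # Step 1: raw dump. Every column is TEXT and nothing is validated yet.
    conn.execute(
        "CREATE TABLE staging (name TEXT, email TEXT, signup_date TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO staging VALUES (:name, :email, :signup_date)", rows
    )

    # Step 2: clean in place with one batch UPDATE over the whole table.
    conn.execute(
        """
        UPDATE staging
        SET name = TRIM(name),
            email = LOWER(TRIM(email)),
            signup_date = NULLIF(TRIM(signup_date), '')
        """
    )

    # Step 3: batch insert into the (hypothetical) target table,
    # skipping rows that still fail the assumed rule "name is required".
    conn.execute(
        """
        CREATE TABLE customers (
            name TEXT NOT NULL,
            email TEXT NOT NULL UNIQUE,
            signup_date TEXT
        )
        """
    )
    conn.execute(
        """
        INSERT INTO customers (name, email, signup_date)
        SELECT name, email, signup_date
        FROM staging
        WHERE name <> ''
        """
    )
    conn.commit()

    result = conn.execute(
        "SELECT name, email, signup_date FROM customers ORDER BY email"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in load_and_clean(RAW_CSV):
        print(row)
```

The raw rows stay untouched in the staging table until the final insert, so you can re-run or adjust the cleaning statements without re-importing the file.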
> what’s the most frustrating
Every import job is a custom scenario. I feel specialized tools don't give you much. You have to understand both the source and destination data to clean and re-shape it, and tools don't have that understanding. AI is less than worthless. At the end of the day you have to roll up your sleeves and start shaping data.