| > Like, no human is going to read 50k rows, much less 10m rows. Well, it's 2AM, some dork has checked in code which breaks production, and it absolutely positively has to be fixed by 6:00am before the customer comes in. Your bleary eyes are scanning through log files and data files, trying to find the answer... Believe me, you will appreciate human-readable formats for both of those. You just want to cat out the entries in the db which the new code can't handle... the last thing you want to do is to have to invoke some other tool or write some other script to make the data human readable. And when you find the problem, you will want to just be able to edit a text file containing test cases to verify the fix. You don't want to write some script to generate and insert the data... at 2am, you are likely to write a buggy script which may keep you from realizing that you've already fixed the problem... or worse, indicate that you have fixed the problem when you haven't. Fewer moving parts is always better. |
This is a classic XY problem. The issue isn't the data format; it's that your organizational processes allow random code pushes at 2am that can break production.
Parquet, used by basically everyone, isn't human readable, and for good reason: it's built for big-data storage, retrieval, and processing. CSV is human readable, also for good reason: people open that data in Excel or other spreadsheet software.