by eth0up
667 days ago
Question: I started with a deliberately convoluted PDF which, after much effort, I filtered, sorted, reorganized, and transferred the 18,000 useful lines to a CSV. These lines are simple: dates, an indicator, and corresponding numbers. The purpose is to statistically analyze the numbers for anomalies or any signs of deviation from expected randomness. I do this all in Python 3 with various libraries. It seems to be working, but... what is a more efficient format than CSV for this kind of operation? Edit: I have also preserved all leading zeros by converting to strings. CSV readers don't care much for leading zeros and simply disappear them, but quotes fix that.
My rule of thumb is that anything that fits into Excel (roughly 1M rows) is "small data" and can be analysed in memory with pandas.
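For what it's worth, here is a minimal sketch of what that could look like for a file of this shape. The column names (`date`, `indicator`, `number`) and the sample rows are made up for illustration, since the real layout isn't shown. Passing `dtype` explicitly tells pandas to keep the numbers as text, so leading zeros survive whether or not the field is quoted.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real 18,000-line file;
# the column names and values are assumptions, not the poster's data.
SAMPLE_CSV = '''date,indicator,number
2022-01-03,A,"00417"
2022-01-04,B,"09120"
2022-01-05,A,"00033"
'''


def load_rows(source):
    """Load the whole CSV into memory, keeping 'number' as text."""
    return pd.read_csv(
        source,
        dtype={"indicator": str, "number": str},  # no numeric inference here
        parse_dates=["date"],
    )


df = load_rows(io.StringIO(SAMPLE_CSV))

# One simple randomness check: how often each digit appears overall.
digit_counts = pd.Series(list("".join(df["number"]))).value_counts().sort_index()
print(digit_counts)
```

With a real file you would pass its path to `load_rows` instead of the `StringIO` object. At 18,000 rows the load should be close to instant, so the file format is unlikely to be the bottleneck.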