by halfcat
586 days ago
Is it just the generic, non-descriptive naming, or what do you think is the root of your distaste for pandas? Like if we have a dataclass: obj.thing == value
Or SQL: SELECT * FROM table WHERE thing = 'value'
We don't know what the types are there either, without looking them up.

The fact that the dataframe often changes halfway through the program has, I think, more to do with the task at hand: pandas is often used for data transformation (the T in ETL), where some raw data is read in and the goal is literally to change its structure, cleaning and normalizing it so it can be ingested into a SQL table in a form consistent with other data points. But if transformation is not what you are doing, then yes, that might not be the right use of dataframes.
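To make the "T in ETL" point concrete, here is a minimal sketch of the kind of reshaping I mean. The column names and numbers are made up for illustration; the point is only that the output deliberately has a different structure from the input.

```python
import pandas as pd

# Hypothetical raw input: one column per year, inconsistent header casing.
raw = pd.DataFrame({
    "Store": ["north", "south"],
    "2023 sales": [100, 150],
    "2024 sales": [120, 130],
})

# Reshape from wide to long so each row is one (store, year, sales) fact,
# which is the shape a normalized SQL table would expect.
tidy = raw.melt(id_vars="Store", var_name="year", value_name="sales")

# Clean up: pull the year out of the old header text and normalize the names.
tidy["year"] = tidy["year"].str.slice(0, 4).astype(int)
tidy = tidy.rename(columns={"Store": "store"})

print(list(raw.columns))   # ['Store', '2023 sales', '2024 sales']
print(list(tidy.columns))  # ['store', 'year', 'sales']
```

Changing the structure is the whole job here, so a dataframe whose columns differ between the top and the bottom of the script is expected rather than a smell.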
|
With all of them you need to do some work, but I find the Pandas one more involved because you don't have an authoritative "reference", just an initial state followed by some transformations. With the Pandas example I have to run the program (in my head or for real). The program might need to pull in test data (hopefully some has been provided). The worst is when the structure of the DF is derived from the source data rather than stated in code (e.g. when reading in a CSV). It's much more work than looking at a class definition or a declarative database schema; there's a "sequence" to it: transformation steps are applied to the DF, and I need to keep track of them.
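A small sketch of what I mean by the structure being derived from the data. The `Reading` class and the two CSV strings are hypothetical, and this assumes default `read_csv` type inference with no `dtype` argument.

```python
import io
from dataclasses import dataclass

import pandas as pd


@dataclass
class Reading:
    # The schema is stated here, in code, before any data exists.
    sensor_id: int
    value: float


# Two hypothetical CSV payloads with the same header; only the data differs.
CSV_COMPLETE = "sensor_id,value\n1,0.5\n2,0.7\n"
CSV_WITH_GAP = "sensor_id,value\n1,0.5\n,0.7\n"

df_complete = pd.read_csv(io.StringIO(CSV_COMPLETE))
df_with_gap = pd.read_csv(io.StringIO(CSV_WITH_GAP))

# Same code, same header, different column type: the blank cell becomes NaN,
# which makes the inferred dtype float instead of int.
print(df_complete["sensor_id"].dtype)  # int64
print(df_with_gap["sensor_id"].dtype)  # float64
```

Nothing in the code tells you which of the two you will get; you have to look at the file. Passing an explicit `dtype` to `read_csv` is one way to put the "reference" back into the code.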