by devin-petersohn
1466 days ago
There are loads of things that are impossible or very cumbersome to write in SQL but that pandas and many other dataframe systems allow. Examples include dropping null values based on some threshold, one-hot encoding, covariance, and certain data cleaning operations. These are possible in SQL, but very cumbersome to write. There are also things related to metadata manipulation that are outright impossible in a relational database.

SQL is super expressive, but I think pandas gets a bad rap. At its core, the pandas data model and language can be more expressive than relational databases. I co-authored a paper that explains these differences with a theoretical foundation [1].

[1] https://arxiv.org/abs/2001.00888
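For a rough feel of the difference, here is a minimal pandas sketch of three of the operations mentioned above. The DataFrame, its column names, and its values are made up for illustration. In SQL, the threshold-based null drop would need an expression that counts non-nulls across every column listed by name. One-hot encoding would need one `CASE` per distinct value, known in advance.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical columns and values, purely illustrative).
df = pd.DataFrame({
    "color": ["red", "blue", None, "red"],
    "x": [1.0, 2.0, np.nan, 4.0],
    "y": [2.0, 4.0, np.nan, 8.5],
})

# Drop rows that have fewer than 2 non-null values (the third row has none).
cleaned = df.dropna(thresh=2)

# One-hot encode "color"; the new column names are derived from the data itself.
encoded = pd.get_dummies(cleaned, columns=["color"])

# Pairwise sample covariance of the numeric columns.
cov = cleaned[["x", "y"]].cov()

print(encoded)
print(cov)
```

Note that `get_dummies` produces one column per distinct value it finds at runtime. A relational schema has to fix those columns up front.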