by dmitrykoval
1872 days ago
Following similar observations, I wondered whether one could execute SQL queries inside a Python process, with access to native Python functions and NumPy as UDFs. Thanks to Apache Arrow, one can essentially combine the DataFrame API with SQL within data analysis workflows, without the need to copy the data or write operators in a mix of C++ and Python, all within the same Python process. So I implemented Vinum, which executes queries that may invoke NumPy or Python functions available to the interpreter as UDFs.
For example: "SELECT value, np.log(value) FROM t WHERE ..". https://github.com/dmitrykoval/vinum
Finally, DuckDB is making great progress integrating pandas DataFrames into its API, with UDF support coming soon. I would certainly recommend giving it a shot for OLAP workflows.
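To make the "Python function as a SQL UDF" idea concrete, here is a minimal sketch. It does not use Vinum's API. It swaps in the standard-library sqlite3 module, which also lets you register a Python callable and call it from SQL in the same process. The names run_log_query and py_log are made up for illustration, and the WHERE clause is just a stand-in for the elided one in the example query.

```python
import math
import sqlite3


def run_log_query(values):
    """Load values into an in-memory table and apply a Python UDF in SQL."""
    conn = sqlite3.connect(":memory:")
    # Register math.log as a one-argument SQL function named py_log
    # (hypothetical name chosen for this sketch).
    conn.create_function("py_log", 1, math.log)
    conn.execute("CREATE TABLE t (value REAL)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        "SELECT value, py_log(value) FROM t WHERE value < 5 ORDER BY value"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for value, logged in run_log_query([1.0, math.e, 10.0]):
        print(value, logged)
```

Note that sqlite3 calls the UDF once per row and the data has to be inserted into the table first, so this only shows the query shape, not the zero-copy, vectorized execution that the Arrow-based approach is aiming for.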