by jzwinck
2179 days ago
The subtitle is "How can you process more data quicker?" My answer: NumPy. It scores an A in Maturity and Popularity, and either an A or a B in Ease of Adoption, depending on which Pandas features you use (e.g. GroupBy). When NumPy is the main show rather than an implementation detail inside Pandas, it is easier to adopt Numba or Cython, and there are huge gains to be made there. Most Pandas workloads that run on small clusters of, say, 10 machines or fewer could be implemented on a single machine. Even simple operations on smallish data sets are often much faster in NumPy than in Pandas. You don't have to leave Pandas behind; just try NumPy and Numba for the hot parts of your code. Numba even lets you write Python code that runs with the GIL released, which can give a speedup linear in the number of cores, with much less work than multiprocessing and without the overhead of copying data to multiple processes.
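To make the GIL point concrete, here is a minimal sketch rather than a benchmark. The function names (`sum_of_squares`, `parallel_sum_of_squares`), the thread count, and the array size are all made up for illustration. It assumes Numba is installed; if it is not, a no-op stand-in for `njit` keeps the code runnable, but then nothing is compiled and the GIL is not released.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: fall back to a no-op decorator so the
    # sketch still runs without Numba (slowly, and with the GIL held).
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(nogil=True)
def sum_of_squares(values):
    # A plain loop over a 1-D float array. With Numba this is compiled to
    # machine code, and nogil=True releases the GIL while it runs.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


def parallel_sum_of_squares(values, n_threads=4):
    # Split the array into slices and hand one to each thread. The threads
    # share the same underlying buffer, so nothing is pickled or copied
    # between processes.
    chunks = np.array_split(values, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(sum_of_squares, chunks))


data = np.arange(200_000, dtype=np.float64)
result = parallel_sum_of_squares(data)
```

The hot loop stays ordinary Python syntax, and the parallelism is plain threads from the standard library. How close you get to linear scaling depends on the workload; memory-bound loops like this one will usually saturate before the core count does.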