|
|
|
|
|
by jpivarski
1646 days ago
|
|
It's a C++ library with a Python interface. How much C++ and how much Python is currently in flux (https://indico.cern.ch/event/855454/contributions/4605044/): we're moving toward having the majority of the library be in Python with only the speed-critical parts in C++ behind a C interface. The main library has no JIT, so no LLVM, but there is a secondary method of access through Numba, which has JIT and LLVM. The main method of access is to call array-at-a-time ("vectorized") functions on the arrays, like NumPy. The types of the data in the arrays can be defined at runtime, and are stored in a columnar way, like Apache Arrow. This format allows for operations to be performed on dynamically typed data at compile-time speeds without specialized compilation—it can be done with a suite of precompiled functions. The reason that's possible is because these data structures are never instantiated in the normal "record-oriented" way, they're sets of integer arrays that are only interpreted as data structures, and it's not hard to precompile a suite of functions that only operate on integer arrays. A 3.5 minute explanation of the columnar format and what a columnar operation looks like can be found here: https://youtu.be/2NxWpU7NArk?t=661 |
|