by jaskyle 1459 days ago
TorchArrow looks pretty cool:

TorchArrow is a machine learning preprocessing library over batch data, providing a performant, Pandas-style, easy-to-use API for model development. Currently it provides a Python DataFrame that allows extensible UDFs with Velox, with the following features:

- Seamless handoff with PyTorch or other model authoring, such as Tensor collation and easily plugging into PyTorch DataLoader and DataPipes
- Zero copy for external readers via Arrow in-memory columnar format
- Multiple execution runtimes support:
- High-performance C++ UDF support with vectorization
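
For anyone wondering what the "zero copy" bullet means in practice, here is a rough stdlib-only sketch. It does not use TorchArrow or Arrow at all: `array` stands in for a contiguous column buffer and `memoryview` stands in for an external reader, and the names `prices`, `view` and `tail` are made up for illustration.

```python
from array import array

# A "column" stored as one contiguous typed buffer (columnar layout).
# This is only a stand-in for an Arrow column, not the real format.
prices = array("d", [1.5, 2.0, 3.25])

# A memoryview exposes the same bytes to a reader without copying them.
view = memoryview(prices)

# A write through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
prices[0] = 9.0
first_via_view = view[0]

# Slicing a memoryview is also zero copy: the slice is a window
# onto the same buffer rather than a new list of values.
tail = view[1:]
tail_values = tail.tolist()

print(first_via_view, tail_values)
```

The point is just that the reader never gets its own copy of the column, so handing data from one component to another costs a pointer rather than a memcpy. Presumably the Arrow-based handoff works on the same principle, with a standardized buffer layout so different libraries can agree on it.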