by jaskyle
1459 days ago
TorchArrow looks pretty cool:

TorchArrow is a machine learning preprocessing library over batch data, providing performant and Pandas-style easy-to-use API for model development. Currently it provides a Python DataFrame that allows extensible UDFs with Velox, with the following features:
- Seamless handoff with PyTorch or other model authoring, such as Tensor collation and easily plugging into PyTorch DataLoader and DataPipes
- Zero copy for external readers via Arrow in-memory columnar format
- Multiple execution runtimes support:
- High-performance C++ UDF support with vectorization