by areddyyt
643 days ago
We've spent a lot of time thinking about these things, in particular the 3Ps. Part of making the one line of code work is addressing programmability. If you're on a Jetson, we should load the CUDA kernels built for Jetson. If you're on a CPU, we should load the CPU kernels. If that CPU supports AVX-512, we should load the kernels that use AVX-512 instructions, and so on. The end goal is that when we introduce our custom silicon, that one line of code makes it far easier to bring customers over from Jetson or any other platform, because we handle loading the correct backend for them. We know this will be bordering on impossible, but it's critical that we take on that burden rather than shifting it to the ML engineer.
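To make the backend-selection idea concrete, here is a minimal Python sketch of that kind of dispatch. It is not our actual loader: the backend names (`cuda_jetson`, `cpu_avx512`, `cpu_generic`) and every function name are hypothetical, and the detection is Linux-only and heuristic. It assumes a Jetson reports "Jetson" in `/proc/device-tree/model` and that AVX-512 shows up as the `avx512f` flag in `/proc/cpuinfo`.

```python
from pathlib import Path


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags listed by Linux, or an empty set if unavailable."""
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        # x86 kernels label this line "flags"; ARM kernels label it "Features".
        if line.lower().startswith(("flags", "features")):
            return set(line.split(":", 1)[1].split())
    return set()


def is_jetson(model_path="/proc/device-tree/model"):
    """Heuristic (an assumption): the device-tree model string mentions 'Jetson'."""
    try:
        return "jetson" in Path(model_path).read_text(errors="ignore").lower()
    except OSError:
        return False


def select_backend(jetson, cpu_flags):
    """Pick a (hypothetical) backend name, most specific match first."""
    if jetson:
        return "cuda_jetson"
    if "avx512f" in cpu_flags:
        return "cpu_avx512"
    return "cpu_generic"


def load_backend():
    """The 'one line' entry point: detect the platform, then choose a backend."""
    return select_backend(is_jetson(), read_cpu_flags())
```

Keeping `select_backend` separate from the detection code means a new target, custom silicon included, would only need one more detection function and one more branch, while the caller's `load_backend()` line stays the same.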