Hacker News
by cs702 645 days ago
Watching the video demo was key for me. I highly recommend that everyone else here watch it.[a]

From a software development standpoint, usability looks great, requiring only one import,

  import deepsilicon as ds
and then, later on, a single line of Python,

  model = ds.convert(model)
which takes care of converting all possible layers (e.g., nn.Linear layers) in the model to use ternary values. Very nice!
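To make "ternary values" concrete, here is a minimal sketch of one common ternarization scheme. It assumes the absmean approach popularized by BitNet b1.58: scale each weight matrix by its mean absolute weight, round, and clip to {-1, 0, +1}. This is not deepsilicon's implementation, which the demo doesn't show. `ternarize` and `ternary_matvec` are hypothetical names, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def ternarize(weights: Matrix, eps: float = 1e-8) -> Tuple[List[List[int]], float]:
    """Absmean ternarization (BitNet b1.58 style, assumed here for illustration).

    Returns integer weights in {-1, 0, 1} and one float scale such that
    weights ~= scale * q. Assumes a non-empty matrix.
    """
    flat = [abs(w) for row in weights for w in row]
    # Per-matrix scale: the mean absolute weight; eps avoids dividing by zero.
    scale = sum(flat) / len(flat) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


def ternary_matvec(q: List[List[int]], scale: float, x: List[float]) -> List[float]:
    """Compute scale * (q @ x) using only additions and subtractions per weight."""
    out = []
    for row in q:
        acc = 0.0
        for qi, xi in zip(row, x):
            if qi == 1:
                acc += xi
            elif qi == -1:
                acc -= xi
            # qi == 0: the weight is skipped entirely.
        out.append(scale * acc)
    return out


W = [[0.9, -0.05, -1.2], [0.3, 0.6, -0.4]]
q, s = ternarize(W)
print(q)  # [[1, 0, -1], [1, 1, -1]]
print(ternary_matvec(q, s, [1.0, 2.0, 3.0]))
```

Because every weight is -1, 0, or +1, the product needs only additions and subtractions, plus one multiply by the scale per output. That is presumably the kind of property ternary-specific hardware would exploit.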

The question for which I don't have a good answer is whether the improvement in real-world performance on your hardware will be large enough to entice developers to leave the comfortable walled garden of CUDA and Nvidia, given that Nvidia keeps improving the performance of its own hardware.

I, for one, hope you guys are hugely successful.

---

[a] At the moment, the YouTube video demo has some cropping issues, but those are easy to fix.


Thank you!

CUDA and Nvidia are practically impenetrable on the server side. To be very concrete, we trained our models on AWS using AWS ParallelCluster. We used P5 instances (8xH100) scheduled with SLURM. One problem we ran into, however, was that our training jobs were containerized. Thankfully, pyxis and enroot exist to run containerized jobs on SLURM. And who else but Nvidia develops and maintains those plugins? For practically any weird niche use case, Nvidia seems to have a software solution, but only on x86.

Jetson is a whole other beast. There is no guarantee that any pip package you install has an aarch64/arm64 wheel. For example, we could not use torch_tensorrt to compile to TensorRT via Torch Inductor. Why? Because its Bazel build was only configured for JetPack 4.6 or JetPack 5.1, and we were using JetPack 6. And while Nvidia provides Docker images for x86 systems that come with torch_tensorrt preinstalled, their L4T (Linux for Tegra) images do not. Instead, we had to write a new workspace file by hand and compile for JetPack 6 ourselves to get TensorRT compilation support.

tl;dr: Nvidia and CUDA have a great walled garden on x86, but not so much on their edge computing devices.

My understanding is that, so far, most deployments of AI on edge devices are on mass-market mobile and entertainment devices relying on software and hardware tightly controlled by a handful of mega-corporations, such as Apple (iOS), Google (Android), Samsung (phones, TVs, etc.), and Tesla (proprietary in-car chips for FSD). Aren't those mega-corporations, not Nvidia, the ones with the actual walled gardens in AI edge computing?

Do you think otherwise?

You're absolutely right about mobile devices (Apple, Google, etc.). However, most companies, with the exception of Tesla, do use Nvidia for edge computing. We know for a fact that most of the automotive industry uses automotive-rated Orins (the 32GB unified RAM SKU) [1], and Anduril also uses Orins. Our primary GTM is with robotics companies, and we have not met a single robotics company not using Jetson, I'm not exaggerating.

[1] Particularly vehicles with advanced self-driving capabilities. Qualcomm is another large vendor of hardware for vehicles (though they have even worse support).

> Our primary GTM is with robotics companies, and we have not met a single robotics company not using Jetson, I'm not exaggerating.

Huh. That's a really good sign. I'm rooting for you!

The video cropping issues should now be fixed!