Hacker News new | ask | show | jobs
by dale_glass 886 days ago
Sounds excellent! What are the requirements to run this regarding hardware? How much VRAM? Does it work on AMD or Intel Arc?
1 comments

Both models are using around 3GB right now (converted into FP16 for speed). But I checked that the (slower) FP32 version uses 2.3GB so we are probably doing something suboptimal here.

We support CUDA right now although it should not be too hard to port it to whisper/llama.cpp or Apple's MLX. It's a pretty straightforward transformer architecture.