by aftbit 1098 days ago
I am in fact running my own instance of the Willow Inference Server (née air-infer-api) on a Tesla P4 8GB gifted to me by our mutual friend Richard. It works wonderfully for up to, IIRC, 3 chunks of audio. We really need to implement streaming so I can use it to generate closed captions for videos that lack subtitles.

For others in this thread: if you haven't tried Willow yet, check it out. It is an amazing leap forward and can actually run on some pretty small GPUs. LLMs are hogging the AI spotlight, but you will struggle to run them on consumer hardware. Image and audio processing models are generally much smaller and more approachable.