
	by exabrial 53 days ago
	A 2.4 GB ONNX? That is wild. This format continues to impress me. ONNX uses 32-bit single-precision floats, I believe, so that's something like ~644M float params/constants. I recently dove deep into the 'traditional ML' side of the ONNX serialization format for the purposes of writing a JVM ML compiler for trees and regressions. ONNX is actually quite clever in the way it serializes trees into parallel arrays (which are then serialized using protobuf). My trees have capped out at < 32 MB. I haven't dug into the neural net side of things yet, mainly because I don't have any models to run in prod. (https://github.com/exabrial/petrify if anyone is interested.)
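
For readers unfamiliar with the parallel-array layout mentioned above, here is a minimal sketch in Python. The attribute names are loosely modelled on ONNX-ML's TreeEnsembleRegressor (nodes_featureids, nodes_values, nodes_truenodeids, and so on), but the layout is simplified: a single tree, one target, no tree ids and no missing-value handling. The `predict` helper and the example tree are hypothetical, not taken from petrify or from any ONNX runtime.

```python
# Simplified parallel arrays: index i in every list describes node i.
# Example tree (hypothetical):
#   node 0: x[0] <= 2.5 ? node 1 : node 2
#   node 2: x[1] <= 1.0 ? node 3 : node 4
#   nodes 1, 3, 4 are leaves
nodes_modes        = ["BRANCH_LEQ", "LEAF", "BRANCH_LEQ", "LEAF", "LEAF"]
nodes_featureids   = [0, 0, 1, 0, 0]
nodes_values       = [2.5, 0.0, 1.0, 0.0, 0.0]
nodes_truenodeids  = [1, 0, 3, 0, 0]
nodes_falsenodeids = [2, 0, 4, 0, 0]

# Leaf outputs live in their own pair of parallel arrays.
target_nodeids = [1, 3, 4]
target_weights = [10.0, 20.0, 30.0]
leaf_weight = dict(zip(target_nodeids, target_weights))


def predict(x):
    """Walk the tree from the root (node 0) until a leaf is reached."""
    node = 0
    while nodes_modes[node] != "LEAF":
        if x[nodes_featureids[node]] <= nodes_values[node]:
            node = nodes_truenodeids[node]
        else:
            node = nodes_falsenodeids[node]
    return leaf_weight[node]


print(predict([1.0, 0.0]))  # takes the true branch at the root
print(predict([3.0, 2.0]))  # takes the false branch twice
```

Because every field is a flat array of one primitive type, the whole tree maps onto a handful of repeated protobuf fields, with no nested node objects to allocate or chase.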

3 comments

vunderba 53 days ago

Same, I really like the ONNX format. I only wish ONNX models weren't so frustratingly difficult to use on Apple iOS. Their browser engine, WebKit, has become annoyingly restrictive over the years in terms of the working memory cap.

I ran into quite a few out-of-memory issues in iOS Safari when I was building continuous voice recognition for my blind chess game, so people could play while on the go.


bring-shrubbery 53 days ago

Interesting, what use cases are you using ONNX for, btw?


vunderba 53 days ago

So I use a VAD ONNX model (Silero [1]) to automatically detect when someone is talking, and the detected audio is then sent to one of the voice recognition libraries.

I originally tried to get away with just Whisper Tiny in the chess game [2], but it performs worse on the kinds of short phrases (knight E4, c takes d5, etc.) used to dictate chess notation. Even with hotword-based phrasing and corrections, I found its accuracy on brief inputs noticeably poorer. So I switched over to Sherpa [3] trained on GigaSpeech. It’s significantly more accurate, but it also comes with a correspondingly larger memory footprint.

Ideally, I would have used just one engine, but I needed a fallback for iOS devices (especially older ones) which can easily OOM.

[1] - https://github.com/snakers4/silero-vad

[2] - https://shahkur.specr.net

[3] - https://github.com/k2-fsa/sherpa-onnx
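
The VAD-then-recognizer gating described above can be sketched roughly as follows. This is not Silero's or Sherpa's API: it assumes some VAD has already produced one speech probability per audio frame, and the `speech_segments` function, its thresholds, and the sample probabilities are all hypothetical. It uses simple hysteresis (a higher threshold to start a segment than to end it) so that brief dips do not split an utterance.

```python
def speech_segments(probs, on=0.5, off=0.35, min_frames=2):
    """Return (start, end) frame ranges, end exclusive, judged to be speech.

    probs      -- per-frame speech probabilities from any VAD (assumed input)
    on / off   -- hysteresis thresholds (illustrative values)
    min_frames -- segments shorter than this are discarded as noise
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= on:
            start = i                      # speech begins
        elif start is not None and p < off:
            if i - start >= min_frames:    # keep only long-enough segments
                segments.append((start, i))
            start = None                   # speech ends
    # Close a segment that is still open at the end of the input.
    if start is not None and len(probs) - start >= min_frames:
        segments.append((start, len(probs)))
    return segments


probs = [0.1, 0.2, 0.8, 0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.1]
for start, end in speech_segments(probs):
    # In a real pipeline, only these frames would be handed to the recognizer.
    print(f"speech from frame {start} to {end}")
```

Only the frames inside a returned range would be forwarded to the speech recognizer, which keeps the heavier model idle during silence.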


Tsarp 53 days ago

RNNoise has a built-in VAD that works much better than Silero.

https://github.com/xiph/rnnoise


bring-shrubbery 53 days ago

Yeah, it's pretty cool what a 2 GB NN can do from a single image.


ollin 53 days ago

Most ONNX files are fp32, but the ONNX format actually allows fp16, int8, etc. as well (see onnx.proto for the full list of dtypes [1] - they even have fp8/fp4 these days!). I ended up switching over to fp16 ONNX models for my own web-based inference project since the quality is ~identical and page loads get 2x faster.

[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605
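
A rough illustration of the fp32 vs fp16 trade-off, using only Python's standard `struct` module (format `e` is IEEE 754 half precision). The sample weights are made up, and the parameter-count estimate assumes the 2.4 GB figure upthread means GiB and that the file is essentially all fp32 weights; neither is confirmed in the thread.

```python
import struct

# Estimated parameter count for a 2.4 GiB file of 4-byte floats
# (assumption: the file is almost entirely fp32 weights).
params = int(2.4 * 2**30 / 4)
print(f"~{params / 1e6:.0f}M parameters")
print(f"fp16 size: ~{params * 2 / 2**30:.1f} GiB")


def roundtrip_fp16(x):
    """Store x as a 16-bit float and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical weights, packed in both precisions.
weights = [0.1, -1.5, 3.14159, 0.0009765625]
fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)
fp16_bytes = struct.pack(f"<{len(weights)}e", *weights)

size_ratio = len(fp32_bytes) / len(fp16_bytes)
max_rel_err = max(abs(roundtrip_fp16(w) - w) / abs(w) for w in weights)

print(f"fp32 is {size_ratio:.0f}x the size of fp16")
print(f"worst relative rounding error: {max_rel_err:.2e}")
```

The halved payload is where the faster page loads come from. Whether quality stays close to identical depends on the model, since values outside fp16's range or very small values lose more than this toy example shows.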


exabrial 53 days ago

Thanks for the pointer actually. I need to take a look at this version of the spec.
