by noosphr 20 days ago
Racket is an amazing language for prototyping ideas that you don't understand yet.

At $dayjob I'm using it to test what novel geometries of deep learning models would look like. Being able to redefine any part of the stack for any reason is a superpower you don't know you need until you do.

A great place to start is The Little Learner, which holds your hand until you get opinionated about what the underlying primitives should look like. E.g. what if we used a sparse tensor representation?


You might like having a go at Lush. It has fallen out of favor of late but is a very interesting language/system.

https://scottlocklin.wordpress.com/2024/11/19/lush-my-favori...

Sounds interesting, but I'm using very sparse, very high-rank tensors, e.g. rank 3 neuron equivalents.

As such pretty much all numerical optimisations are useless for my work. Racket however chugs along happily, if slowly.
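
For readers who want a concrete picture of a sparse, high-rank tensor, here is a minimal Python sketch that stores only the nonzero entries in a dictionary keyed by index tuple. The class `SparseTensor` and its method `contract_last` are hypothetical names made up for illustration; this is not the commenter's Racket code, and the commenter does not say which representation they use.

```python
class SparseTensor:
    """Hypothetical coordinate-keyed sparse tensor (illustration only)."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.data = {}  # maps index tuple -> nonzero value

    def _check(self, idx):
        if len(idx) != len(self.shape) or any(
            not 0 <= i < n for i, n in zip(idx, self.shape)
        ):
            raise IndexError(f"index {idx} out of range for shape {self.shape}")

    def __getitem__(self, idx):
        self._check(idx)
        return self.data.get(idx, 0)  # missing entries read as zero

    def __setitem__(self, idx, value):
        self._check(idx)
        if value == 0:
            self.data.pop(idx, None)  # never store explicit zeros
        else:
            self.data[idx] = value

    def __add__(self, other):
        # Elementwise addition touches only the stored entries of either operand.
        if self.shape != other.shape:
            raise ValueError("shape mismatch")
        out = SparseTensor(self.shape)
        for idx in self.data.keys() | other.data.keys():
            out[idx] = self[idx] + other[idx]
        return out

    def contract_last(self, vec):
        # Contract the last axis with a dense vector; the result has one rank less.
        if len(vec) != self.shape[-1]:
            raise ValueError("vector length must match the last axis")
        out = SparseTensor(self.shape[:-1])
        for idx, value in self.data.items():
            head, k = idx[:-1], idx[-1]
            out[head] = out[head] + value * vec[k]
        return out


# A rank-3 tensor with two nonzero entries out of 24.
t = SparseTensor((2, 3, 4))
t[0, 1, 2] = 2.0
t[1, 2, 3] = -1.0
r = t.contract_last([1.0, 2.0, 3.0, 4.0])
```

The cost of each operation scales with the number of stored entries rather than with the full shape, which is the trade-off that makes dense numerical kernels a poor fit for this kind of data.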

That sounds kind of amazing. But you're not actually doing the machine learning in Racket, are you? Is your Racket code generating other code like PyTorch?

I'm doing the learning in Racket because the bottleneck is human understanding.

That MNIST takes 30 minutes per epoch isn't a worry when I don't even know what vector addition should look like.

> I don't even know what vector addition should look like.

I think you're trying to imply you're inventing something new and Racket enables you to explore... But what I read (as someone with a PhD in deep learning that has worked on sparsity) is you actually don't know the prior art and you're using Racket as an excuse to reinvent a whole bunch of stuff that already exists in plenty of mature libraries in more mundane languages (including Python/PyTorch). Which is of course fine for personal growth, but please don't oversell Racket as a "superpower" - to wit, I can manipulate any part of my stack too because it's all written in cpp.

I once replaced IEEE 754 floating point numbers in a model by balanced ternary floating point numbers.

It took me 20 minutes.

Tell me how you'd do that in cpp?

lol the same way we implement all of the reduced precision fp8, fp4 types today: by storing them in the corresponding uint:

https://github.com/ggml-org/llama.cpp/discussions/15095
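
To illustrate what "storing them in the corresponding uint" can look like, here is a minimal Python sketch that decodes one byte as an 8-bit float. It assumes the OCP FP8 E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities, a single NaN bit pattern per sign); that choice of format is an assumption for the example and is not taken from the linked discussion. `decode_e4m3` is a hypothetical helper name.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Interpret an integer in 0..255 as an FP8 E4M3 value (assumed layout)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN encoding; this variant has no infinities
    if exponent == 0:
        # Subnormal: no implicit leading one, fixed exponent of 1 - bias = -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading one, exponent stored with a bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)
```

The arithmetic itself still happens in a wider type after decoding; the uint is only the storage container.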

Balanced ternary fp is not a reduced precision type of binary fp: https://arxiv.org/abs/2512.10964

> Unlike their binary counterparts, posits and takums, tekums simultaneously accommodate both ∞ and NaR, while retaining the simplicity of negation by flipping the underlying trit string. Perhaps most strikingly, tekums enable rounding by truncation, a property that eradicates at a stroke some notorious problems of rounding in binary arithmetic: double rounding errors, cascading carries in hardware, and the attendant inefficiencies.
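
The two properties in that quote, negation by flipping trits and rounding by truncation, can be seen on plain balanced-ternary integers. The Python sketch below is not the tekum format from the paper and not anyone's model code; the function names are hypothetical and the example only covers integers with digits in {-1, 0, 1}.

```python
def to_trits(n: int) -> list[int]:
    """Balanced-ternary digits of n, most significant first, each in {-1, 0, 1}."""
    if n == 0:
        return [0]
    trits = []
    while n != 0:
        r = n % 3  # Python's % yields 0, 1, or 2, also for negative n
        if r == 2:
            r = -1  # use digit -1 and carry one into the next position
        trits.append(r)
        n = (n - r) // 3
    return trits[::-1]


def from_trits(trits: list[int]) -> int:
    """Inverse of to_trits."""
    value = 0
    for t in trits:
        value = 3 * value + t
    return value


def negate(trits: list[int]) -> list[int]:
    """Negation flips every trit; no carries or special cases are needed."""
    return [-t for t in trits]


def truncate(trits: list[int], k: int) -> list[int]:
    """Drop the k least significant trits."""
    kept = trits[: max(len(trits) - k, 0)]
    return kept or [0]


five = to_trits(5)  # 9 - 3 - 1
```

Truncation rounds to nearest here because the dropped k trits can represent at most (3^k - 1)/2 in magnitude, which is always less than half of 3^k, so there are no ties and no carry into the kept digits.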

This is a complete tangent, but since you mentioned MNIST: I accidentally discovered Tsetlin machines this week when someone on r/Julia asked if anyone with an AMD GPU could run the benchmark in their package called Tsetlin.jl. I've got an AMD GPU so I was happy to oblige. Then I looked at what the benchmark was doing: it was training an MNIST classifier to 98% accuracy in 9 seconds - that seemed like a couple of orders of magnitude too fast. I was flabbergasted and wondered what the heck this thing was, and that's when I learned about Tsetlin machines. I went on (with the help of Claude) to implement one in an FPGA and again was flabbergasted when it only took 2k LUTs to implement a Tsetlin machine for MNIST classification in hardware.

Well yes, you have to use one of the newer MNIST variants these days if you want to get anything meaningful. A linear classifier gets something like 87% on the original one.