by nathan-barry 216 days ago
Actually, NVIDIA made one earlier this year; check out their Fast-dLLM paper.

Thanks I’ll check it out!
Did I miss something? https://github.com/NVlabs/Fast-dLLM/blob/main/llada/chat.py

That’s inference code, but where is the high perf web server?