Hacker News
by tipoffdosage904 35 days ago
I tested a few local and cloud models by asking them to build the same visual HTML animations: cherry blossom tree, solar system, ocean sunset, and wildflower meadow.

This is not meant to be a rigorous benchmark, nor is it a new concept. But it is a genuinely hard coding challenge that tests a model's capability, and the results are easy for a human to assess at a glance.

Qwen 3.6 27B (base + MTP) produced the best quality, but both are very slow on my 48GB M4 Pro MacBook. Qwen 3.6 35B A3B absolutely hits the sweet spot between speed and quality. I used the Pi coding agent as a minimal harness and llama.cpp as the most efficient inference backend.

The post includes the prompts, comparison videos, a small benchmark workbench, and a live gallery of the outputs.