by arkonrad 307 days ago
On cookies: we use an HTTP cookie (ark_session_id) purely as an opaque session identifier. The cookie is how the client ties subsequent requests to the same pinned session/worker/GPUs on the provider side, so the provider can keep the model activations/state in GPU memory between calls. It isn't magic for the model; it's a routing key that enables true session affinity.
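
To make the routing-key idea concrete, here is a minimal sketch of cookie-based session affinity. Only the cookie name ark_session_id comes from the description above; the AffinityRouter class, the worker names, and the round-robin pinning are assumptions for illustration, not how our production router is built.

```python
from http.cookies import SimpleCookie
import uuid


class AffinityRouter:
    """Toy router (hypothetical): pins each session id to one worker."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.pinned = {}  # session id -> worker holding that session's state

    def route(self, cookie_header):
        """Return (session_id, worker) for a raw Cookie header value."""
        cookie = SimpleCookie()
        cookie.load(cookie_header or "")
        morsel = cookie.get("ark_session_id")
        # No cookie yet: mint a new opaque id the client will send back.
        session_id = morsel.value if morsel else uuid.uuid4().hex
        if session_id not in self.pinned:
            # New session: choose a worker round-robin and pin it.
            index = len(self.pinned) % len(self.workers)
            self.pinned[session_id] = self.workers[index]
        return session_id, self.pinned[session_id]


router = AffinityRouter(["gpu-worker-0", "gpu-worker-1"])
sid, first_worker = router.route(None)                      # first call, no cookie
_, second_worker = router.route(f"ark_session_id={sid}")    # follow-up call
print(first_worker == second_worker)
```

The id carries no meaning of its own; all it does is let the follow-up request land on the worker that already holds the state.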

On “thinking steps” and contamination: good point. Naively persisting raw chain-of-thought tokens can degrade outputs. The ARKLABS Stateful approach is not a blanket “store everything” policy.

My criticism targets higher-level provider practices: response caching, aggressive prompt-matching/deduplication heuristics, or systems that return previously generated outputs when a new prompt is “similar enough.” Those high-level caches can absolutely produce the behaviour I described: a subtle prompt change that nevertheless gets routed to a cached reply.
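
A toy illustration of that failure mode, assuming a cache that matches prompts by string similarity. The SimilarityCache class, the 0.9 threshold, and the use of difflib are all hypothetical choices for the sketch; I'm not claiming any particular provider implements it this way.

```python
from difflib import SequenceMatcher


class SimilarityCache:
    """Hypothetical response cache that treats near-identical prompts as hits."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cutoff for "similar enough"
        self.entries = []           # list of (prompt, reply) pairs

    def get(self, prompt):
        """Return a cached reply if any stored prompt is similar enough."""
        for cached_prompt, reply in self.entries:
            ratio = SequenceMatcher(None, cached_prompt, prompt).ratio()
            if ratio >= self.threshold:
                return reply
        return None

    def put(self, prompt, reply):
        self.entries.append((prompt, reply))


cache = SimilarityCache()
cache.put("Summarize this report in 3 bullet points", "reply with 3 bullets")

# One character changed, but the request now asks for something different.
stale = cache.get("Summarize this report in 5 bullet points")
print(stale)  # the old 3-bullet reply is served for the 5-bullet request
```

The prompt changed in a way that matters, yet the cache hands back the earlier answer because the two strings look almost the same.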

The platform has launched. We're still collecting data, but early results are very promising: we're seeing linear complexity, lower latency, and ~80% input-token savings. We'd also love to hear more feedback on whether this approach could be useful in real-world projects.

And about going against the grain, as you mentioned at the end: if startups didn't think differently from everyone else, what would be the point of being a startup?