| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Alifatisk 141 days ago

Pricing for step-3.5-flash:

- Input: $0.1/1M tokens

- Output: $0.3/1M tokens

Its an MoE 196B param model with 11b active params. It handles a context window of 256k tokens.

> by employing a 3:1 Sliding Window Attention (SWA) ratio—integrating three SWA layers for every one full-attention layer. This hybrid approach ensures consistent performance across massive datasets or long codebases while significantly reducing the computational overhead typical of standard long-context models.

I wonder how this models performs in the needle in a haystack bench.

Another interesting bit is the following:

> Step 3.5 Flash brings elite level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance

The chinese is pushing hard to democratize llms!