|
|
|
|
|
by Alifatisk
141 days ago
|
|
Pricing for step-3.5-flash: - Input: $0.1/1M tokens - Output: $0.3/1M tokens Its an MoE 196B param model with 11b active params. It handles a context window of 256k tokens. > by employing a 3:1 Sliding Window Attention (SWA) ratio—integrating three SWA layers for every one full-attention layer. This hybrid approach ensures consistent performance across massive datasets or long codebases while significantly reducing the computational overhead typical of standard long-context models. I wonder how this models performs in the needle in a haystack bench. Another interesting bit is the following: > Step 3.5 Flash brings elite level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance The chinese is pushing hard to democratize llms! |
|