by nicohayes
263 days ago
Could you clarify whether the "2B active parameters" figure refers to per-token inference, and how this scales with context length? Specifically, how does MoE affect activation during inference, and are there any practical implications for latency?