by maxrmk
360 days ago
> While the specific internal workings of DeepSeek LLM are still being elucidated, it appears to maintain or approximate the self-attention paradigm to some extent.

Totally nonsensical. DeepSeek's architecture is well documented, and multiple implementations are available online.