|
|
|
|
|
by randomifcpfan
502 days ago
|
|
The DeepSeek R1 paper explains how they trained their model in enough detail that people can replicate the process. Many people around the world are doing so, using various sizes of models and training data. Expect to see many posts like this over the next three months. The attempts that use small models will get done first. The larger models take much longer. Small r1 style models are pretty limited, so this is interesting primarily from an “I reproduced the results” point of view, not a “here is a new model that’s useful” pov. |
|