by ursAxZA
172 days ago
I wasn’t talking about human reinforcement. The discussion has been about CoT in LLMs, so from the start I’ve been referring to the model in isolation. Here’s how I currently understand the structure of the thread (apologies if I’ve misread anything):
“Is CoT actually thinking?” (my earlier comment)
→ “Yes, it is thinking.”
→ “It might be thinking.”
→ “Under that analogy, self-training on its own CoT should work, but empirically it doesn’t.”
→ “Maybe it would work if you add external memory with human or automated filtering?”
Regarding external memory: without an external supervisor, whatever gets written into that memory is still the model’s own self-generated output, which brings us back to the original problem.