Dude, you literally wrote the exact motivation paragraph for Roampal right around the same time I posted this.
Thorndike's Law of Effect is the entire reason I built the outcome scoring (+0.2 for worked, −0.3 for failed) and shifted the weighting toward proven memories. You're not half-baked; you're 100% right. I just happened to ship the PoC first.
Would love to hear your take on the cold-start problem and whether those reward magnitudes feel right in practice. Shooting you a connection request on LinkedIn if you want to swap notes.
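For anyone following along, here is a minimal sketch of the kind of outcome scoring described above. It is not Roampal's actual code: only the +0.2 / −0.3 magnitudes come from the post, while the `Memory` class, the 0.5 starting score, the clamping to [0, 1], and the function names are assumptions made up for illustration.

```python
from dataclasses import dataclass

REWARD_WORKED = 0.2    # magnitude quoted in the post
PENALTY_FAILED = 0.3   # magnitude quoted in the post


@dataclass
class Memory:
    """Hypothetical memory record; not Roampal's real data model."""
    text: str
    score: float = 0.5  # assumed neutral prior for a brand-new memory


def record_outcome(memory: Memory, worked: bool) -> None:
    """Nudge the score up on success, down on failure (Law of Effect)."""
    delta = REWARD_WORKED if worked else -PENALTY_FAILED
    # Clamping to [0, 1] is an assumption; the post does not mention bounds.
    memory.score = min(1.0, max(0.0, memory.score + delta))


def rank(memories: list[Memory]) -> list[Memory]:
    """Order memories so that proven ones come first."""
    return sorted(memories, key=lambda m: m.score, reverse=True)
```

Under these assumptions the asymmetry is easy to see: a fresh memory that fails once drops from 0.5 to 0.2 and needs two successes to climb back above its starting point, which is one way to frame the cold-start and reward-magnitude questions.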
Thanks for the connection, mate. Would you mind if I take the opportunity to run some academic memory benchmarks on Roampal locally to see whether your idea can beat the other RL-based methods?