| HN Mirror



	by spion 1063 days ago
	It's greedy and random :) Instead of a paper, I would recommend reading the sampling code of most LLM implementations (rwkv.cpp has a relatively clean implementation in Python: https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/samp...)
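
To make "greedy and random" concrete, here is a minimal sketch of a single sampling step in plain Python. It is not the rwkv.cpp code; the names `softmax` and `sample_token` and the `temperature` / `top_p` parameters are assumptions chosen for illustration, though many implementations expose similar knobs. A temperature of 0 is treated as greedy (argmax); otherwise a token is drawn at random from the most likely tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token id from raw scores (hypothetical helper, not a library API)."""
    # Greedy: always take the single highest-scoring token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    probs = softmax(logits, temperature)

    # Top-p (nucleus): keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Random: draw from the kept tokens, weighted by their probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Calling `sample_token(logits, temperature=0)` gives the greedy choice, while something like `sample_token(logits, temperature=0.8, top_p=0.9)` mixes in randomness.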

1 comments

painted-now 1063 days ago

I guess I need to sit down and study this stuff in more detail, but do I understand correctly that the code you shared makes the decisions for each position independently? I am just astonished that this produces any coherent output. Also it is not clear to me how the length of the output sequence is determined.


pizza 1063 days ago

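Generation ends once the stop token is picked: with greedy decoding that is when it becomes the likeliest token, with random sampling it is when it happens to be drawn.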
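
A rough sketch of the loop, in case it helps. In a typical autoregressive setup the positions are not decided independently: each new token is chosen from a distribution computed from all the tokens so far, then appended before the next step. Everything here is made up for illustration: `toy_next_token_probs` is a stand-in for a real model, and `EOS = 0` is an assumed stop-token id.

```python
import random

EOS = 0  # assumed stop-token id, for illustration only

def toy_next_token_probs(tokens):
    """Stand-in for a model: a distribution over 4 tokens given the prefix."""
    # The stop token gets more likely as the prefix grows.
    p_stop = min(1.0, 0.1 * len(tokens))
    rest = (1.0 - p_stop) / 3
    return [p_stop, rest, rest, rest]

def generate(prompt, max_new_tokens=50, greedy=False, rng=random):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The distribution depends on everything generated so far.
        probs = toy_next_token_probs(tokens)
        if greedy:
            nxt = max(range(len(probs)), key=lambda i: probs[i])
        else:
            nxt = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        # Output length is decided here: stop when the stop token is chosen.
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens
```

The `max_new_tokens` cap is the usual fallback when the stop token never shows up.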
