

by nwnwhwje 648 days ago

Silly question time.

Is this a fine-tuned LLM, for example a drop-in replacement for Llama, etc.?

Or is it some algorithm on top of an LLM, doing some chain of reasoning?

1 comment

peakji 648 days ago

It is an LLM fine-tuned using a new type of dataset and RL reward. It's good at reasoning, but I would not recommend it as a replacement for Llama for general tasks.
