Hacker News new | ask | show | jobs
by kiraaa 979 days ago
the paper does not live up to the quality of model lol
2 comments

Maybe these models should start writing themselves up.

Provide the model with an outline of a 20-or-so page research paper about itself and have it fill in the blanks. The researchers might have to provide textual description of the figures in the “experiments” section.

Is it better than llama 2?
It is better than llama 2 7b and 13b. I tried the OpenOrca fine tune and it is very good, even when 4-bit quantized
What does OpenOrca do? It’s just instruction tuning it?
Yes, it is a instruction tune dataset: https://huggingface.co/datasets/Open-Orca/OpenOrca

It felt different from the official Mistral7B-Instruct. One of the highlights with the OpenOrca version is that you can steer the model with a system prompt (eg "You are a 5 year old")

For its size, yes. In absolute terms it is obviously less capable than llama-2-70B
For now. Huggingface[0] mentioned a DPO-fine-tuned version, Zephyr 7B, which it claims is competitive with Llama2-70B[1].

[0]: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat

[1]: https://twitter.com/huggingface/status/1711780979574976661

Oh, they uploaded the weights. I missed this one, cheers!
I found llama-2-70B to be a bit worse than GPT-4. (So, pretty good!) But I did not compare with GPT-3.

How do llama-2-70B and Mistral 7B compare with GPT-3?

Yes