

by osanseviero 801 days ago

Zephyr 141B is a Mixtral 8x22B fine-tune. Here are some interesting details:

- Base model: Mixtral 8x22B, 8 experts, 141B total params, 35B activated params

- Fine-tuned with ORPO, a new alignment algorithm with no separate SFT step (hence much faster than DPO/PPO)

- Trained on 7K open data instances: high-quality, synthetic, multi-turn

- Apache 2

Everything is open:

- Final Model: https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v...

- Base Model: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1

- Fine-tune data: https://huggingface.co/datasets/argilla/distilabel-capybara-...

- Recipe/code to train the model: https://huggingface.co/datasets/argilla/distilabel-capybara-...

- Open-source inference engine: https://github.com/huggingface/text-generation-inference

- Open-source UI code: https://github.com/huggingface/chat-ui

Have fun!

3 comments

loudmax 801 days ago

I like that they say the model was trained for 1.3 hours on 4 nodes of 8 x H100s. By my rough calculation, that should have cost under $100 (at $2 per GPU-hour x 8 GPUs x 4 nodes x 1.3 hours, roughly $83). Not free, but pretty cheap in the scheme of things. At least, once you know what you're doing.
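For anyone who wants to redo the back-of-the-envelope estimate with their own prices, here is a minimal sketch. The $2 per GPU-hour rate is the guess used in the comment above, not a quoted price, and the function name `training_cost_usd` is made up for illustration.

```python
def training_cost_usd(hours: float, nodes: int, gpus_per_node: int,
                      usd_per_gpu_hour: float) -> float:
    """Rough rental cost: wall-clock hours x total GPUs x hourly rate per GPU."""
    gpu_hours = hours * nodes * gpus_per_node
    return gpu_hours * usd_per_gpu_hour


# Figures from the comment: 1.3 hours on 4 nodes of 8 x H100.
# The $2/GPU-hour rate is an assumption; real rates vary by provider.
cost = training_cost_usd(hours=1.3, nodes=4, gpus_per_node=8, usd_per_gpu_hour=2.0)
print(f"{1.3 * 4 * 8:.1f} GPU-hours, about ${cost:.0f}")
# prints "41.6 GPU-hours, about $83"
```

This covers only the final training run, not failed attempts, data generation, or evaluation.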


dloss 801 days ago

I wanted to write that the TGI inference engine is not open source anymore, but they have reverted the license back to Apache 2.0 for the new version, TGI v2.0: https://github.com/huggingface/text-generation-inference/rel...

Good news!


leblancfg 801 days ago

What does ORPO stand for? Can't seem to find related links.


cateye 801 days ago

Odds Ratio Preference Optimization (ORPO): https://arxiv.org/abs/2403.07691
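For a rough idea of what the objective looks like, here is a minimal sketch of the per-example loss as I understand it from the paper: the usual NLL (SFT) term on the chosen response, plus a weighted penalty based on the log odds ratio between the chosen and rejected responses. The function names and the `lam` default are illustrative assumptions, not the settings used for Zephyr 141B, and the inputs are assumed to be length-normalized (average per-token) log-likelihoods, so they must be strictly negative.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    logp_chosen / logp_rejected: average per-token log-likelihood of each
    response under the model being trained. lam is a hypothetical weight.
    """
    nll = -logp_chosen  # SFT term on the chosen response
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * odds_ratio_term


# The penalty shrinks as the chosen response becomes more likely than the rejected one.
print(orpo_loss(-0.2, -1.5), orpo_loss(-0.2, -0.3))
```

Because the preference term sits on top of the ordinary SFT loss, there is no separate reference model or SFT stage, which is where the speed-up mentioned in the post comes from.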
