	by falaki 1205 days ago
	This is the blog post with more details and background: https://www.databricks.com/blog/2023/04/12/dolly-first-open-... Disclosure: I work at Databricks.

falaki 1205 days ago

We also open-sourced the Dolly model itself, with a license that allows commercial use.

choppaface 1205 days ago

Can you compare your Dolly offering with https://github.com/microsoft/DeepSpeedExamples/blob/master/a...?

oidar 1205 days ago

How hard would it be to get dolly running on llama.cpp?

ankitmathur 1205 days ago

Hey there! I worked on Dolly, and I work on Model Serving at Databricks. DollyV1 is GPT-J-based, so it'll run easily on llama.cpp. DollyV2 is Pythia-based, and Pythia was built with the GPT-NeoX library.

GPT-NeoX is not that different from GPT-J (it also uses rotary embeddings, which llama.cpp supports for GPT-J). I would imagine it's not too heavy a lift to add NeoX architecture support.
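
To illustrate what "rotary embeddings" means and why a GPT-J port doesn't automatically cover NeoX, here is a minimal pure-Python sketch. It assumes the standard RoPE formulation (angle = position / 10000^(2i/d)) and, as far as I know, the two codebases pair dimensions differently: GPT-J rotates adjacent pairs, while GPT-NeoX rotates the first half of the vector against the second half. The math is the same; only the memory layout differs. All function names are hypothetical, and the real models apply rotary to only part of each attention head (`rotary_dim` / `rotary_pct` in the Hugging Face configs), which this sketch ignores.

```python
import math

# Hypothetical RoPE sketch. Assumes theta_i = position / base**(2i / dim)
# and an even-length vector; partial rotary (rotary_dim/rotary_pct) is ignored.

def rope_angles(position, dim, base=10000.0):
    # One rotation angle per pair of dimensions
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]

def rotate_pair(a, b, theta):
    # Rotate the 2-D point (a, b) by theta radians
    c, s = math.cos(theta), math.sin(theta)
    return a * c - b * s, a * s + b * c

def rope_interleaved(x, position, base=10000.0):
    # GPT-J-style layout: rotate adjacent pairs (x[0], x[1]), (x[2], x[3]), ...
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x), base)):
        out[2 * i], out[2 * i + 1] = rotate_pair(x[2 * i], x[2 * i + 1], theta)
    return out

def rope_half_split(x, position, base=10000.0):
    # GPT-NeoX-style layout: rotate x[i] against x[i + dim // 2]
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x), base)):
        out[i], out[i + half] = rotate_pair(x[i], x[i + half], theta)
    return out

def interleaved_to_half(x):
    # Permute [a0, b0, a1, b1, ...] into [a0, a1, ..., b0, b1, ...]
    return x[0::2] + x[1::2]

q = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1]
# Same rotation, different layout: permuting before or after gives identical numbers
a = interleaved_to_half(rope_interleaved(q, position=7))
b = rope_half_split(interleaved_to_half(q), position=7)
print(max(abs(u - v) for u, v in zip(a, b)))  # prints 0.0
```

In other words, a port that already handles GPT-J's rotary step mostly needs to change which dimensions get paired, not the rotation itself.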

samstave 1205 days ago

Because the firehose of AI/GPT news is a lot to take in, could you please ELI5: unpack this comment and give more definitions?

Thank you.

Just so I'm clear: does "parameters" refer to the total number of node-relation connections between a single node and its neighbors for that prompt/label? Or how would you explain this ELI5-style?

ankitmathur 1205 days ago

Sure! I'll try to summarize briefly, though I'll almost certainly oversimplify. EleutherAI has trained a couple of open-source language models. One of them is GPT-J, which used some newer model architecture concepts, such as rotary position embeddings. Subsequently, they released a model architected in the likeness of GPT-3, called GPT-NeoX-20B. Architecturally it is quite similar to GPT-J, just with more parameters. Pythia is a suite of models that share that architecture and training dataset but come in different parameter sizes, to study scaling laws.

DollyV2 is a Pythia model fine-tuned on the Databricks 15K dataset (databricks-dolly-15k).
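
To make "the same architecture, just with more parameters" concrete: in this family of decoder-only transformers, most parameters live in per-layer weight matrices, so the total is driven mainly by the number of layers and the hidden width. Below is a rough sketch using the common approximation of 12 · n_layers · d_model² plus embedding matrices. `approx_params` is a hypothetical helper, and the layer/width/vocab values are recalled from the published model configs, so treat them as assumptions rather than authoritative numbers.

```python
# Rough parameter count for a GPT-J / GPT-NeoX-style decoder-only transformer.
# Assumption: ~12 * d_model**2 weights per layer (4*d^2 for attention Q/K/V/output,
# 8*d^2 for a 4x-wide MLP), plus separate input-embedding and output-projection
# matrices of size vocab x d_model. Biases and layer norms are ignored.

def approx_params(n_layers, d_model, vocab):
    per_layer = 12 * d_model ** 2
    embeddings = 2 * vocab * d_model  # untied input embedding + output head
    return n_layers * per_layer + embeddings

# (layers, width, vocab) as recalled from the published configs; assumptions only
configs = {
    "GPT-J-6B": (28, 4096, 50400),
    "GPT-NeoX-20B": (44, 6144, 50432),
}

for name, (layers, width, vocab) in configs.items():
    print(f"{name}: ~{approx_params(layers, width, vocab) / 1e9:.2f}B parameters")
```

With these inputs the sketch gives about 6.05B and 20.55B: the same formula, with more layers and a wider hidden size, yields roughly 3x the parameters. A scaling suite like Pythia applies the same idea by varying depth and width while keeping everything else fixed.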

ankitmathur 1204 days ago

Augmenting the answer to address your follow-up: parameters are the trainable variables in a model's definition. Model training is a process where you tweak the parameters in your model and then re-evaluate the model on a metric judging its quality. A lot of models consist largely of matrix multiplications, so if you multiply a 2x2 matrix A by a 2x2 matrix B and both matrices can be tweaked, you've got 8 parameters, since there are 8 numbers that can be tweaked.
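
A minimal pure-Python sketch of the 2x2 example above: it counts the 8 trainable numbers, then repeatedly tweaks them and re-evaluates a quality metric. The metric (squared error against an identity target), the starting values, the learning rate, and the helper names are all assumptions for illustration. To stay close to "tweak and re-evaluate", it estimates each parameter's effect by nudging it up and down (finite differences); real frameworks compute the same gradients far more efficiently with backpropagation.

```python
import copy

def matmul(X, Y):
    # Plain matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def count_params(*matrices):
    # Every entry of every trainable matrix is one parameter
    return sum(len(row) for m in matrices for row in m)

def loss(params, target):
    # Quality metric (assumed): squared error between A @ B and the target
    A, B = params
    P = matmul(A, B)
    return sum((P[i][j] - target[i][j]) ** 2
               for i in range(len(P)) for j in range(len(P[0])))

def train_step(params, target, lr=0.1, eps=1e-6):
    # Nudge each parameter up and down, re-evaluate the metric, and record
    # which direction lowers the loss (central finite differences).
    grads = []
    for m, M in enumerate(params):
        for i in range(len(M)):
            for j in range(len(M[0])):
                up = copy.deepcopy(params)
                up[m][i][j] += eps
                down = copy.deepcopy(params)
                down[m][i][j] -= eps
                g = (loss(up, target) - loss(down, target)) / (2 * eps)
                grads.append((m, i, j, g))
    # Move every parameter a small step against its estimated gradient
    new = copy.deepcopy(params)
    for m, i, j, g in grads:
        new[m][i][j] -= lr * g
    return new

A = [[0.5, -0.2], [0.1, 0.3]]
B = [[0.2, 0.4], [-0.3, 0.1]]
target = [[1.0, 0.0], [0.0, 1.0]]

print(count_params(A, B))  # 8
params = [A, B]
before = loss(params, target)
for _ in range(200):
    params = train_step(params, target)
after = loss(params, target)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

A model with billions of parameters is the same idea at scale: every entry of every weight matrix is one of those tweakable numbers.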

anentropic 1205 days ago

It's probably simple for Dolly v1 (?), since it was a fine-tuned version of GPT-J:

https://github.com/ggerganov/ggml/tree/master/examples/gpt-j

AFAIK there is no .cpp version of Pythia-12B yet.

thewataccount 1205 days ago

Would you consider adding Pythia-12B, LLaMA, and Alpaca, since those are what you're directly compared against or based on?

GPT-3.5/GPT-4 is what everyone would love to see too, but I understand your performance is in line with GPT-NeoX.

Vicuna/GPT4All would be interesting, but IMO they are less important.

RWKV would be interesting because it's a completely different architecture from transformers.

EDIT: Also, thanks for the open-source contributions! Highly appreciated!

brianjking 1205 days ago

Thank you, and congrats to you and the team. This is fantastic!

ingenieroariel 1205 days ago

Thank you, thank you, thank you!

If possible, could you share how Dolly v2 compares to RWKV-4 14B ctx 8192?
