by falaki 1158 days ago
This is the blog post with more details and background: https://www.databricks.com/blog/2023/04/12/dolly-first-open-...

Disclosure: I work at Databricks.

4 comments

We also open sourced the Dolly model itself with a license that allows commercial use.
How hard would it be to get dolly running on llama.cpp?
Hey there! I worked on Dolly, and I work on Model Serving at Databricks. DollyV1 is GPT-J-based, so it'll run easily on llama.cpp. DollyV2 is Pythia-based, and Pythia is built with the GPT-NeoX library.

GPT-NeoX is not that different from GPT-J (it also has the rotary embeddings, which llama.cpp supports for GPT-J). I would imagine it's not too heavy of a lift to add NeoX architecture support.

Because the firehose of AI/GPT news is a lot to try to take in, could you please unpack this comment ELI5-style and define the terms?


Thank you.

Just so I am clear, "parameters" refers to the total number of node-relation-connections between a single node and its neighbors for that Prompt/Label? Or how would you explain this ELI5 style?

Sure! I'll try to briefly summarize, though I will almost certainly oversimplify. There are a couple of open source language models trained by EleutherAI. The first one was called GPT-J, and it used some newer model architecture concepts. Subsequently, they released a model architected in the likeness of GPT-3, called GPT-NeoX-20B. Architecturally, it was quite similar to GPT-J, just with more parameters. Pythia is a family of models with the same architecture and the same dataset but with different parameter counts, built to test scaling laws.

DollyV2 is a Pythia model fine-tuned on the Databricks 15K dataset.

Augmenting the answer to address your followup: a parameter is any trainable variable in a model's definition. Model training is a process where you basically tweak the parameters in your model and then re-evaluate the model on a metric judging its quality. A lot of models consist of matrix multiplications, so if you are multiplying matrix A of size 2x2 with matrix B of size 2x2 and both matrices can be tweaked, then you've got 8 parameters, since you've got 8 numbers that can be tweaked.
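
To make that concrete, here is a toy sketch in plain Python. It counts the tweakable numbers in two 2x2 matrices and then runs a naive "tweak one number, re-evaluate, keep it if the score improved" loop. The names (`count_parameters`, `tweak_and_evaluate`), the target matrix, and the squared-error metric are all made up for illustration. Real models like Dolly are trained with gradient-based optimizers, not random tweaks like this, so treat it as an illustration of the idea only.

```python
import random

def count_parameters(matrices):
    """Count every tweakable number across a list of matrices (lists of rows)."""
    return sum(len(row) for m in matrices for row in m)

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def loss(a, b, target):
    """Quality metric (lower is better): squared error between A*B and a target."""
    prod = matmul(a, b)
    return sum((prod[i][j] - target[i][j]) ** 2
               for i in range(len(target))
               for j in range(len(target[0])))

def tweak_and_evaluate(a, b, target, steps=2000, scale=0.1, seed=0):
    """Nudge one random parameter at a time; keep the nudge only if the loss drops."""
    rng = random.Random(seed)
    best = loss(a, b, target)
    for _ in range(steps):
        m = rng.choice([a, b])            # pick which matrix to tweak
        i = rng.randrange(len(m))
        j = rng.randrange(len(m[0]))
        old = m[i][j]
        m[i][j] = old + rng.uniform(-scale, scale)
        new = loss(a, b, target)
        if new < best:
            best = new                    # the tweak helped, keep it
        else:
            m[i][j] = old                 # the tweak did not help, undo it
    return best

# Hypothetical starting values and target, chosen only for the demo.
A = [[0.5, -0.2], [0.1, 0.3]]
B = [[0.4, 0.0], [-0.3, 0.2]]
TARGET = [[1.0, 2.0], [3.0, 4.0]]

print("parameters:", count_parameters([A, B]))  # 4 numbers in A + 4 in B = 8
print("loss before:", loss(A, B, TARGET))
print("loss after: ", tweak_and_evaluate(A, B, TARGET))
```

The 8 here is the same 8 as in the 2x2 example above. A 12B model is the same idea with roughly twelve billion such numbers.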
it's probably simple for Dolly v1 (?) since it was a fine-tuned version of GPT-J

https://github.com/ggerganov/ggml/tree/master/examples/gpt-j

AFAIK there is no .cpp version of Pythia-12B yet

Would you consider adding Pythia12B, LLaMa and Alpaca since that's what you're directly compared against/based on?

GPT3.5/GPT4 is what everyone would also love to see, but I understand your performance is in line with GPT-NeoX.

Vicuna/GPT4all would be interesting but IMO are less important.

RWKV would be interesting because it's a completely different model from the transformers.

EDIT: Also thanks for the open source contributions! Highly appreciated!

Thank you and congrats to you and the team. This is fantastic
Thank you, thank you, thank you!

If possible, could you share how Dolly v2 compares to RWKV-4 14B ctx 8019?