

by kir-gadjello 1186 days ago

Impressive model, thank you for releasing it under a business-friendly license!

Have you considered using Google's sparse "scaling transformer" architecture as the base? Even at 3B scale it can generate 3-4x more tokens per FLOP while staying competitive on perplexity with a dense transformer. I think OpenAI uses a variant of it in their ChatGPT-3.5-Turbo product.

Here is the paper https://arxiv.org/abs/2111.12763 and the implementation https://github.com/google/trax/blob/master/trax/models/resea... if you are interested.

Hope you get to look into this!

3 comments

b33j0r 1186 days ago

Thank you for releasing the weights along with the announcement. Too many posts have made great headlines, only to say “weights are on their way!”

Like why did we even get excited? This? Great work.


swyx 1186 days ago

> I think OpenAI uses a variant of it in their ChatGPT-3.5-Turbo product.

is that a guess or is there a source? im curious to read more


kir-gadjello 1186 days ago

It is a guess informed by some familiarity with the literature and by going over the papers authored by researchers credited on OpenAI's "GPT-4 contributors" web page.

I have an expanded list of foundational research that is likely to serve as the basis for GPT-4 on my blog: https://kir-gadjello.github.io/posts/gpt4-some-technical-hyp...

Hope it helps!


blueblimp 1186 days ago

Interesting resource. I had been wondering whether anyone had tried to compile such a list.


swyx 1185 days ago

thank you! glad i asked


chaxor 1186 days ago

I don't think it's a business friendly license?


kir-gadjello 1186 days ago

It allows for modifications and commercial use: https://creativecommons.org/licenses/by-sa/4.0/

>You are free to:

>Share — copy and redistribute the material in any medium or format

>Adapt — remix, transform, and build upon the material

>for any purpose, even commercially.

Compare this to "IF", the latest release from Stability AI's DeepFloyd lab, which in addition to various restrictive clauses strictly prohibits commercial use: https://github.com/deep-floyd/IF/blob/develop/LICENSE-MODEL

Repl.it's release is as open as it gets these days, in my book.


LukeShu 1185 days ago

It's a copyleft license, and lots of folks on HN seem to think that copyleft, while open, isn't business friendly.


cosmojg 1183 days ago

Wow! I sincerely wonder how all those folks manage to do business in the tech industry without ever touching Linux, Git, Bash, GCC, glibc, WordPress, Ansible, Grafana, MongoDB, 7-Zip, Vim, Emacs, Firefox, Thunderbird, StackOverflow, Wikipedia, most web fonts, most ad blockers, and all the rest!
