Hacker News
by kir-gadjello 1138 days ago
Impressive model, thank you for releasing it under a business-friendly license!

Have you considered using Google's sparse "scaling transformer" architecture as the base? Even at 3B scale it can generate 3-4x more tokens per FLOP while staying competitive with a dense transformer on perplexity. I think OpenAI uses a variant of it in their ChatGPT-3.5-Turbo product.

Here is the paper https://arxiv.org/abs/2111.12763 and the implementation https://github.com/google/trax/blob/master/trax/models/resea... if you are interested.

Hope you get to look into this!

3 comments

Thank you for releasing the weights along with the announcement. Too many posts have made great headlines, only to add “weights are on their way!”

Like why did we even get excited? This? Great work.

> I think OpenAI uses a variant of it in their ChatGPT-3.5-Turbo product.

Is that a guess, or is there a source? I'm curious to read more.

It is a guess informed by some familiarity with the literature and by going over the papers authored by researchers credited on OpenAI's "GPT-4 contributors" web page.

I have an expanded list of foundational research that is likely to serve as the basis for GPT-4 here on my blog: https://kir-gadjello.github.io/posts/gpt4-some-technical-hyp...

Hope it helps!

Interesting resource. I had been wondering whether anyone had tried to compile such a list.

thank you! glad i asked

I don't think it's a business friendly license?

It allows for modifications and commercial use: https://creativecommons.org/licenses/by-sa/4.0/

>You are free to:

>Share — copy and redistribute the material in any medium or format

>Adapt — remix, transform, and build upon the material

>for any purpose, even commercially.

Compare this to "IF", the latest release from Stability AI's DeepFloyd lab, which, in addition to various restrictive clauses, strictly prohibits commercial use: https://github.com/deep-floyd/IF/blob/develop/LICENSE-MODEL

Repl.it's release is as open as it gets these days, in my book.

It's a copyleft license, and lots of folks on HN seem to think that copyleft, while being open, isn't business friendly.

Wow! I sincerely wonder how all those folks manage to do business in the tech industry without ever touching Linux, Git, Bash, GCC, glibc, WordPress, Ansible, Grafana, MongoDB, 7-Zip, Vim, Emacs, Firefox, Thunderbird, StackOverflow, Wikipedia, most web fonts, most ad blockers, and all the rest!