by patrick-fitz 816 days ago
Looking at the license restrictions: https://github.com/databricks/dbrx/blob/main/LICENSE

"If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights."

I'm glad to see they aren't calling it open source, unlike some LLM projects. Looking at you, Llama 2.

6 comments

Well, it does still claim "Open" in the title, for which certain other vendors might potentially get flak around here, in a comparably not-open-in-the-way-we-demand-it-to-be kinda setup.
It's literally described as open source all over.

https://www.databricks.com/blog/announcing-dbrx-new-standard...

It's even implied in comparisons everywhere:

> Figure 1: DBRX outperforms established open source models on language understanding (MMLU), Programming (HumanEval), and Math (GSM8K).

> The aforementioned three reasons lead us to believe that open source LLMs will continue gaining momentum. In particular, we think they provide an exciting opportunity for organizations to customize open source LLMs that can become their IP, which they use to be competitive in their industry.

Just search "open source".

Yes, they are using different wording in different articles:

https://www.databricks.com/blog/introducing-dbrx-new-state-a...

The only mention of open source is:

> DBRX outperforms established open source models

https://www.databricks.com/blog/announcing-dbrx-new-standard...

Open source is mentioned 10+ times

> Databricks is the only end-to-end platform to build high quality AI applications, and the release today of DBRX, the highest quality open source model to date, is an expression of that capability

https://github.com/databricks/dbrx

On GitHub it's described as an open license, not an open source license:

> DBRX is a large language model trained by Databricks, and made available under an open license.

The release notes in the Databricks console definitely say open source. If you click the gift box you will see: "Try DBRX, our state-of-the-art open source LLM!"
Ironically, the LLaMA license text [1] this is lifted verbatim from is itself probably copyrighted [2] and doesn't grant you the permission to copy it or make changes like s/meta/dbrx/g lol.

[1] https://github.com/meta-llama/llama/blob/main/LICENSE#L65

[2] https://opensource.stackexchange.com/q/4543

I do wonder what value those companies who have >700 million users might get from this?

Pretty much all of the companies with >700 million users could easily reproduce this work in a matter of weeks if they wanted to - and they probably do want to, if only so they can tweak and improve the design before they build products on it.

Given that, it seems silly to lose the "open source" label just for a license clause that doesn't really have much impact.

The point of the 700 million user restriction is so that Amazon, Google Cloud, or Microsoft Azure cannot set up an offering where they host and sell access to the model without an agreement with Databricks.

This clause is probably inspired by the open source software vendors that have switched licenses in response to competition from the big cloud vendors.

They also aren't claiming to be the best LLM out there when they clearly aren't, as Inflection did. Overall, solid.