Hacker News
by AndrewKemendo 1093 days ago
First hit is always free.

Don’t forget what you’re dealing with here: the faceless, amoral, infinitely ravenous maw of the most efficient personal data succubus in history. Make no mistake, this is something like “goodwill capture” instead of “regulatory capture.”

I see no way that this diminishes Meta’s power - arguably it strengthens it by making it easier to choose a Meta architecture instead of creating a competing FOSS architecture.

So arguably all this does is raise the technical bar for FOSS and further entrench Meta - AND, most importantly, it has thousands of developers priming their data architectures for Meta models, to eventually be served from a Meta account.

And once it’s widespread enough to lock you in, those commercial terms? Whoops, they changed!


As opposed to simply being locked into OpenAI's APIs as the only option?
A false dilemma, also referred to as false dichotomy or false binary, is an informal fallacy based on a premise that erroneously limits what options are available.[1]

[1]https://en.wikipedia.org/wiki/False_dilemma

These models cost millions to train. The only reason open-source LLMs have a heartbeat is they’re standing on Meta’s weights. The only third path is a public option.
> The only reason open-source LLMs have a heartbeat is they’re standing on Meta’s weights.

Not necessarily.

RWKV, for example, is a different architecture that wasn't based on Facebook's weights whatsoever. I don't know where BlinkDL (the author) got the training data, but they seem to have done everything mostly independently otherwise.

https://github.com/BlinkDL/RWKV-LM

Disclaimer: I've been doing a lot of work lately on an implementation of CPU inference for this model, so I'm obviously somewhat biased, since this is the model I have the most experience with.

My personal bet is specialised models have a niche. Do you think one of these could compete with GPT if e.g. trained on a law firm’s correspondence and contracts?
Probably not, honestly. Because it's an RNN, old information gradually deteriorates as new information is fed into the model. That is undesirable compared to e.g. transformers, which can reference any part of the context without degradation but have a hard limit on context size. RWKV can ingest a theoretically infinite number of tokens, but after around 16k it will start to degrade into madness until restarted, so in practice it does sort of have a limit.

(It degrades because a single internal state is updated in place per token, and the current models have only been trained with up to 8192 tokens of context. Once you get to roughly double that, the state starts to diverge from "sanity", with no known way to correct this. Priming a new instance of the model with 8192 tokens or so of the new context then takes a really long time, because you can't process the next token of an RNN until you have also processed the previous one!)
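To make the "single internal state updated per token" point concrete, here is a minimal toy sketch in Python. It is not RWKV's actual update rule: the `step` function, the `STATE_SIZE` value, and the 0.9/0.1 mixing constants are all made up for illustration. It only shows the general RNN shape described above: the state has a fixed size however many tokens go in, priming is strictly sequential, and the influence of an early token fades as later tokens are folded in. It does not reproduce the divergence past the trained context length.

```python
import math

STATE_SIZE = 4  # hypothetical; real models use far larger states


def step(state, token_id):
    """Fold one token into the fixed-size state (toy rule, not RWKV's)."""
    return [
        math.tanh(0.9 * s + 0.1 * math.sin(token_id * (i + 1)))
        for i, s in enumerate(state)
    ]


def prime(token_ids):
    """Feed a context token by token; each step needs the previous state."""
    state = [0.0] * STATE_SIZE
    for token_id in token_ids:
        state = step(state, token_id)
    return state


def max_gap(a, b):
    """Largest per-element difference between two states."""
    return max(abs(x - y) for x, y in zip(a, b))


# The state never grows, no matter how many tokens were consumed.
short_state = prime(range(10))
long_state = prime(range(10_000))
print(len(short_state), len(long_state))

# Two contexts that differ only in their first token: the difference is
# visible right away but is nearly washed out after 200 shared tokens.
shared = list(range(200))
gap_early = max_gap(prime([1]), prime([2]))
gap_late = max_gap(prime([1] + shared), prime([2] + shared))
print(gap_early, gap_late)
```

The loop in `prime` is also why priming is slow: step N cannot start until step N-1 has produced its state, whereas a transformer can process all positions of a prompt in parallel.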

With some fine-tuning (which, even that is ... still out of reach for most people unfortunately, but I digress) it can become a pretty good chat model, generate story completions, generate boilerplate code, etc. The base model is reasonably okay at most of these things already.

I think it's definitely a competitor in some areas, though I don't remember if there have already been benchmarks putting it up against the other models. I do know that it's better than the majority of other open-source models, including transformer-based ones, but this probably owes more to training data than to architecture.

Didn't the whole "we have no moat" paper show how this is actually not the case and that the future is far brighter for open-source LLMs?