by lumost 1521 days ago
OpenAI would be the best example. However, these large language models also have limited business value today, which makes a startup a speculative bet that the team will both beat Google/FB/AI/Academics at making a language model and find a viable business model for the resulting model.

I'd take one of those bets or the other; pulling off both is tough. Considering that the first task of such a startup would be to hand ~100-500MM to a hardware or cloud vendor, I'd be hesitant to invest.

3 comments

It costs less than 10M to train. Why hand so much to hardware or cloud? Soon enough there will be open-source GPT-3s; at least two are in training as we speak (BigScience and EleutherAI).

> these large language models also have limited business value today

The Instruct version of GPT-3 has become very easy to steer with just a task description. It can do so many tasks so well it's crazy. Try some interactions with the beta API.

I believe GPT-3 is already above average human level at cognitive tasks that fit in a 4000 token window. In 2-3 years I think all developers will have to adapt to the new status quo.

> I believe GPT-3 is already above average human level at cognitive tasks that fit in a 4000 token window.

How can you possibly make a claim like this without like 80 links justifying it? The claim is fuzzy and absurd, my least favorite combo.

Gut feeling based on playing with it. Here's an example:

> Colorless green ideas sleep furiously, and other grammatical nonsense by Noam Chomsky

He was a man without a country, A linguist without a language, A mind without a thought, A dream without a dreamer. He was lost in a world of words, A world where ideas slept furiously, And grammar was a never-ending nightmare.

But he persevered, For he knew that language was the key to understanding the world. And so he continued to study, To learn all that he could, In the hopes that one day, He would find his way home.

> Gut feeling based on playing with it

You should check out the post we're commenting on, it has graphs for this exact metric.

Spoiler: Google's model with 3x the parameters does pass average human in a couple categories, but not all. I don't think GPT-3 does in any.

It's doubly puzzling to me because you have access and are asserting it feels like an average human to you. It's awesome and it does magical stuff; I use it daily both for code and prose. It also majorly screws up sometimes. It's only at an average human level if we play word games with things like "well, the average human wouldn't know the Dart implementation of the 1D Gaussian function. Therefore it's better than the average human."

> Gut feeling based on playing with it.

OK, on my first reading your phrasing made it sound like some article or material had convinced you of this opinion. Now I understand.

This is kind of my point about 80 links though - you're using a definition of "cognitive tasks" that more closely resembles knowledge, and then you're letting your personal feelings about profundity guide your conclusions on said cognition.

I don't deny that the machine can output pretty words and has a breadth of knowledge to put us each to shame on some simple queries, but "cognition in a 4000 token window" is an incredibly large place and I don't even understand how you would be able to claim a machine has above-human-average cognition based solely on your own interactions... That's a pretty crazy leap.

PS: I saw the downvotes. I was downvoted for questioning the validity of information that was actually just pure conjecture. Be better with your votes.

I agree 100%, but I think viable businesses will begin to emerge, especially as these large models move from text to images (and eventually to video and 3D models). If the examples shown of DALL-E 2 are indicative of its quality, then a large number of creative jobs could be replaced with a single "creative director" using the model. But the high entry cost just to attempt to train such a model will likely remain a hurdle until more business value is proven.

Aye, I suspect the other concern is that the high entry costs can quickly lead to a "second mover" advantage. The first team spends all the money doing the hard R&D and the second team implements a slightly better version for a fraction of the money.

If nothing else, they'd enjoy slacking gain[1] by starting their computation with more advanced hardware.

[1] https://arxiv.org/pdf/astro-ph/9912202.pdf
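
For anyone who doesn't want to open the PDF, here is a rough sketch of the "slacking" arithmetic. It assumes hardware speed doubles every 18 months (that figure is an assumption of this sketch; swap in whatever you believe), and the function names are made up for illustration. If a job needs `work_months` on today's hardware and you idle for `wait_months` before buying hardware and starting, the finish time is the wait plus the shrunken compute time.

```python
import math

# Assumed hardware doubling period in months (an assumption of this sketch).
DOUBLING_MONTHS = 18.0


def finish_time(work_months: float, wait_months: float) -> float:
    """Months until the job is done if you idle first, then compute on newer hardware."""
    speedup = 2.0 ** (wait_months / DOUBLING_MONTHS)
    return wait_months + work_months / speedup


def best_wait(work_months: float) -> float:
    """Wait that minimizes finish_time; 0.0 means starting today is optimal."""
    # Setting d/ds [s + T * 2^(-s/D)] = 0 gives 2^(s/D) = T * ln(2) / D.
    threshold = DOUBLING_MONTHS / math.log(2)
    if work_months <= threshold:
        return 0.0
    return DOUBLING_MONTHS * math.log2(work_months / threshold)


if __name__ == "__main__":
    for job in (12.0, 48.0):
        wait = best_wait(job)
        print(job, wait, finish_time(job, wait))
```

Under that doubling assumption, waiting only pays off for jobs longer than about 26 months on today's hardware (18 / ln 2), so the "second mover" gain from hardware alone only matters for very long training runs.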

I'd just solve some existing problem with the most basic language model you can get your hands on and then move up from there. Sell it first.