
by terran57 1225 days ago

From the article:

"Of course, you need a sufficiently large model to be able to learn from all this data, which is why GPT-3 is 175 billion parameters and probably cost between $1m-10m in compute cost to train.[2]"

So perhaps a better title would be "GPT in 60 Lines of Numpy (and $1m-$10m)"
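A rough sanity check on that range, as a sketch only. It uses the common rule of thumb of about 6 FLOPs per parameter per training token, and the 300 billion training tokens reported in the GPT-3 paper. The sustained per-GPU throughput and the price per GPU-hour are assumptions picked for illustration, not figures from the article.

```python
# Back-of-the-envelope training cost for GPT-3 (illustrative sketch).
PARAMS = 175e9             # parameter count quoted from the article
TOKENS = 300e9             # training tokens reported in the GPT-3 paper
FLOPS_PER_PARAM_TOKEN = 6  # rule of thumb covering forward + backward pass

# ASSUMPTIONS (hypothetical values, not from the article):
SUSTAINED_FLOPS_PER_GPU = 30e12  # throughput actually achieved per GPU
USD_PER_GPU_HOUR = 1.50          # rental price per GPU-hour


def training_cost_usd(params, tokens, flops_per_gpu, usd_per_gpu_hour):
    """Return (total FLOPs, GPU-hours, cost in USD) for one training run."""
    total_flops = FLOPS_PER_PARAM_TOKEN * params * tokens
    gpu_hours = total_flops / flops_per_gpu / 3600
    return total_flops, gpu_hours, gpu_hours * usd_per_gpu_hour


total_flops, gpu_hours, cost = training_cost_usd(
    PARAMS, TOKENS, SUSTAINED_FLOPS_PER_GPU, USD_PER_GPU_HOUR
)
print(f"{total_flops:.2e} FLOPs, {gpu_hours:,.0f} GPU-hours, ${cost:,.0f}")
```

With these assumed numbers the estimate lands inside the quoted $1m-10m range; changing the throughput or price assumption moves it proportionally.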

5 comments

rvz 1225 days ago

And it will be even more expensive to train it again on larger amounts of data and with a model with 10 times more parameters.

Only Big Tech giants like Microsoft, Google, etc. can afford to foot the bill and throw away millions on training LLMs, while we celebrate and hype ChatGPT and LLMs that keep getting bigger and significantly more expensive to train, even though they get confused, hallucinate over silly inputs and confidently generate bullshit.

That can't be a good thing. OpenAI's ClosedAI model needs to be disrupted the way Stable Diffusion challenged DALL-E 2 with an open source AI model.

Kranar 1225 days ago

I disagree. I run a small tech company with a group that's been experimenting with Stable Diffusion, and we noticed that an extreme version of the Pareto principle applies here as well: you can get ~90% of the benefits for about 5% of the cost. On top of that, computing power is continuously getting cheaper.

Based on that group's success, they've recently proposed a mini project inspired by GPT that I am considering funding; the data it's trained on is all publicly available for free, and most of it comes from Common Crawl. I suspect that it will yield similar results, where you can tailor your own version of GPT and get reasonably good models for a fraction of the price. We're nowhere close to the scale of the Big Tech giants, but I've noticed for the better part of 15 years that small companies can derive a great deal of the benefits that larger companies have for a fraction of the cost if they play it smart and keep things tight.

99_00 1225 days ago

Do you think it is possible for the AI to request information to fill in gaps in its model?

For example, the AI doesn't have enough information about a company's process, or a regulation. It chats with an expert to fill in the gaps.

I have no understanding of AI.

simonw 1225 days ago

This is happening already. The trick is to run a search against an existing search engine, then copy and paste the search results into the language model and ask it to answer questions based on what you provide it.

This is how the new Bing Assistant works. It's also how search engines like https://you.com/ and https://www.perplexity.ai/ work - as exposed by a prompt leak attack against Perplexity a few weeks ago: https://simonwillison.net/2023/Jan/22/perplexityai/

I wrote a tutorial about one way of implementing this pattern yourself here: https://simonwillison.net/2023/Jan/13/semantic-search-answer...
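A minimal sketch of that pattern, under stated assumptions: `fake_search` and `fake_complete` are hypothetical stand-ins for a real search engine and a real language model API, and the prompt wording is illustrative rather than what Bing, You.com or Perplexity actually use.

```python
from typing import Callable, List


def build_prompt(question: str, snippets: List[str]) -> str:
    """Paste numbered search results above the question."""
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    search: Callable[[str], List[str]],
    complete: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Search first, then ask the model to answer from the results."""
    snippets = search(question)[:top_k]
    return complete(build_prompt(question, snippets))


# Hypothetical stand-ins so the sketch runs without any network access.
def fake_search(query: str) -> List[str]:
    return [f"Snippet {n} about: {query}" for n in range(1, 6)]


def fake_complete(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-character prompt)"


print(answer("How large is GPT-3?", fake_search, fake_complete))
```

The orchestration lives entirely outside the model: the model only ever sees one prompt that already contains the retrieved text.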

crosen99 1225 days ago

A small difference between the pattern you describe and the one in the question is where responsibility lies for retrieving and incorporating the augmentation. You describe a pattern where an orchestration layer sits in front of the model, performs the retrieval, and then determines how to serve that information to the model. The question asks whether the AI/model itself can perform the retrieval and incorporation.

It’s a small difference, perhaps, but a significant one, since retrieval and incorporation occurring outside the model comes with a different set of trade-offs. I’m not specifically aware of any work where model architectures are being extended to perform this function directly, but I am keen to learn of such efforts.

HellsMaddy 1225 days ago

Yes, check out LangChain [0]. It enables you to wire together LLMs with other knowledge sources or even other LLMs. For example, you can use it to hook GPT-3 up to WolframAlpha. I’m sure you could pretty easily add a way for it to communicate with a human expert, too.

[0]: https://github.com/hwchase17/langchain

alfor 1225 days ago

Yes.

It’s trained on completing the text.

If an expert writes a long text and you add "in summary: " at the end, the model will complete it with something approximating the truth (depending on the size of the model, training, etc.).

Humans do a similar thing. We have a model in our head of the subject discussed and we can summarize it, but we will forget some parts, make errors, etc. GPT is very similar.

TheCoreh 1225 days ago

It is! You can specify in its prompt that it should "request additional info via search query, using the following syntax: [[search terms here]], before coming to a final conclusion". Then you integrate it with a traditional knowledge-base text lookup and run it again with that information concatenated to the prompt.
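A rough sketch of that loop. The `[[...]]` syntax is the one from the comment; everything else is an assumption for illustration: `fake_complete` stands in for a real model call, `fake_lookup` and `KNOWLEDGE_BASE` stand in for a real knowledge base, and the round limit is arbitrary.

```python
import re
from typing import Callable

# Matches a search request such as [[widget approval process]].
SEARCH_REQUEST = re.compile(r"\[\[(.+?)\]\]")


def run_with_lookups(
    question: str,
    complete: Callable[[str], str],
    lookup: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Re-run the model, appending lookup results, until it stops asking."""
    prompt = (
        "You may request additional info via search query, using the "
        "following syntax: [[search terms here]], before coming to a final "
        "conclusion.\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_rounds):
        reply = complete(prompt)
        match = SEARCH_REQUEST.search(reply)
        if match is None:
            return reply  # no search requested: treat as the final answer
        terms = match.group(1).strip()
        prompt += f"\nSearch results for [[{terms}]]:\n{lookup(terms)}\n"
    # Out of rounds: ask once more for a conclusion with what we have.
    return complete(prompt + "\nNo more searches allowed. Give your final answer.\n")


# Hypothetical stand-ins so the sketch runs offline.
KNOWLEDGE_BASE = {
    "widget approval process": "Widgets need sign-off from two reviewers."
}


def fake_lookup(terms: str) -> str:
    return KNOWLEDGE_BASE.get(terms.lower(), "No results.")


def fake_complete(prompt: str) -> str:
    if "Search results for" not in prompt:
        return "I need more detail. [[widget approval process]]"
    return "Final answer: widgets need sign-off from two reviewers."


print(run_with_lookups("How are widgets approved?", fake_complete, fake_lookup))
```

The same loop would work with a human expert in place of `fake_lookup`, since the lookup is just a function from search terms to text.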

int_19h 1225 days ago

Stable Diffusion could do it because the task turned out to be amenable to reasonably small models. But there's no evidence of that being the case with GPT.

That said, the other organizations that can afford to foot the bill for it are governments. This is hardly ideal, since such models will also come with plenty of strings attached - indeed, probably more than the private ones - but at least these policies are somewhat checked by democratic mechanisms.

Long-term I think the demand for more AI compute power will lead to much more investment in GPU design and manufacture, driving the prices down. Since the underlying tech itself is well-understood, I fully expect to see the day when one can train and run a customized GPT-3 instance for one's private use, although the major players will likely be far ahead by then.

pumanoir 1225 days ago

I saw this [1] presentation where they use Scheme to train GPT on a single consumer GPU. I've had no luck finding the 'scorch' compiler they mentioned in the video.

1. https://youtu.be/rDke29MbKQA?list=PLyrlk8Xaylp7NvZ1r-eTIUHdy...

zeknife 1225 days ago

There are GPT-2 checkpoints small enough to run on basically any modern computer
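To put a number on "small enough", here is a sketch that counts the parameters of the smallest GPT-2 checkpoint from its published hyperparameters (12 layers, 768-wide embeddings, 50257-token vocabulary, 1024-token context). It assumes the standard GPT-2 layout: biases on every linear layer, two layer norms per block plus a final one, and an output head that reuses the token embedding matrix.

```python
def gpt2_param_count(n_layer: int, n_embd: int, n_vocab: int, n_ctx: int) -> int:
    """Count parameters of a GPT-2 style model (output head tied to wte)."""
    # Token embeddings + learned positional embeddings.
    embeddings = n_vocab * n_embd + n_ctx * n_embd
    # Attention: fused q/k/v projection, then output projection (with biases).
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x width, then project back (with biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two layer norms per block, each with a gain and a bias vector.
    layer_norms = 2 * (2 * n_embd)
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * (attn + mlp + layer_norms) + final_layer_norm


count = gpt2_param_count(n_layer=12, n_embd=768, n_vocab=50257, n_ctx=1024)
size_mb = count * 4 / 1e6  # assuming 4-byte float32 weights
print(f"{count:,} parameters, about {size_mb:,.0f} MB as float32")
```

That is roughly 124 million parameters, or about half a gigabyte at 32-bit precision, which fits comfortably in the memory of an ordinary laptop.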

MuffinFlavored 1225 days ago

Will one business model be for OpenAI to "license" out access to their trained model?

How large is the model on disk(s) once it is trained?

theptip 1225 days ago

Perhaps I’m missing your point, but isn’t that what they do with their API right now? You pay for text completions, and can fine-tune their model with your data.

veqq 1225 days ago

But you can't run the code on your own machine.

est 1224 days ago

> But you can't run the code on your own machine.

iirc GPT-3 alone is some 500TB in size. You need a really, really big machine to run LLMs; the first L means Large.

mattnewton 1225 days ago

Of course, if they leaked the model weights and a local inference binary for it, they would lose the ability to charge for it. Clones with the weights would crop up all over the place.

shagie 1225 days ago

From various sources, the model itself is about 800 GB on disk.
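A quick way to bound the size, as a sketch: multiply the 175 billion parameters quoted at the top of the thread by the bytes used to store each one. How OpenAI actually stores the weights is not public, so the precisions below are assumptions, and the estimate covers the weights only.

```python
PARAMS = 175_000_000_000  # GPT-3 parameter count quoted at the top of the thread

# ASSUMPTION: candidate storage precisions; the real on-disk format is not public.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_gb(params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


sizes = {name: weights_size_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in sizes.items():
    print(f"{name}: {gb:,.0f} GB")
```

Under these assumptions the raw weights come to a few hundred gigabytes (700 GB at float32, 350 GB at float16), the same order of magnitude as the 800 GB figure and far below 500 TB.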

hackernewds 1225 days ago

They must have time traveled to today in the past and read your comment, since this is precisely their business model!

99_00 1225 days ago

Anyone know what the minimum cost for creating a model is and what the limitation would be?

sharemywin 1225 days ago

This is pretty small:

https://github.com/karpathy/nanoGPT