by factoidforrest 1180 days ago
I'd like to know that as well. I went pretty far into looking at alternatives for the post. The best thing you can really do is try them yourself with a tool like this: https://github.com/oobabooga/text-generation-webui

Benchmarks are one thing, but I suspect that if any were truly on par we would know about it.

Obviously it's partly compute cost, but I also suspect there's a lot of R&D that would need to be redone in the open.

How many tricks does openai's pretraining have that aren't found in some paper somewhere?

1 comment

Taking a paper and turning it into working production code is a non-trivial process, 100%.

Training big models takes a lot of random reads/writes, and those tend to be pretty latency sensitive. There _may_ be a way to train this BitTorrent style with donated compute, but it's hard to say how many orders of magnitude slower that would be. (Do you need 2x more compute to do it distributed? 10x? 100x?)
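
For a feel of where a number like that could come from, here is a rough back-of-envelope sketch in Python. Everything in it is an assumption for illustration, not a measurement: naive synchronous data-parallel training, gradients exchanged in full every step at roughly 2x the payload (the large-N limit of a ring all-reduce), no overlap of compute and communication, no gradient compression, and made-up model size, link speeds, and step time. The function names are hypothetical too.

```python
def sync_seconds(params, bytes_per_param, bandwidth_bytes_per_s, latency_s):
    """Estimated time to exchange one full set of gradients.

    Assumes about 2x the gradient payload crosses each link per step,
    plus one fixed latency cost. Real systems differ.
    """
    payload_bytes = params * bytes_per_param
    return latency_s + 2 * payload_bytes / bandwidth_bytes_per_s


def slowdown(compute_s, sync_s):
    """Step-time ratio versus an ideal setup where communication is free."""
    return (compute_s + sync_s) / compute_s


# All numbers below are illustrative assumptions.
PARAMS = 1e9           # hypothetical 1B-parameter model
BYTES_PER_PARAM = 2    # 16-bit gradients
COMPUTE_S = 1.0        # assumed pure compute time per step

LINKS = {
    "datacenter (100 Gbit/s, 10 us)": (12.5e9, 10e-6),
    "home uplink (100 Mbit/s, 50 ms)": (12.5e6, 50e-3),
}

for name, (bandwidth, latency) in LINKS.items():
    sync = sync_seconds(PARAMS, BYTES_PER_PARAM, bandwidth, latency)
    print(f"{name}: sync {sync:.2f}s, slowdown {slowdown(COMPUTE_S, sync):.1f}x")
```

Under these assumptions the answer is dominated by the ratio of gradient-sync time to compute time per step, so the multiplier swings by orders of magnitude depending on the link. Tricks like syncing less often or compressing gradients would change the picture, and this model ignores them.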

It would be interesting to explore this space more!