Hacker News new | ask | show | jobs
by neverrroot 895 days ago
This looks like an interesting opportunity to learn something new. Anyone knows about any open source projects in this area?
4 comments

GPTs are essentially a proprietary moat-building of several open-source workflows, most especially system prompt engineering, tool-usage, and RAG.

Unfortunately there's no magic open-source solution since there's a lot of moving pieces involved that are bespoke to a given use case, and the ones that claim to be magic are libraries like LangChain, which aren't.

This is the most accurate and clear-eyed take I've seen on GPTs so far. They might be useful, but they're not magic and they're intended to enhance OpenAI's moat-building operations to make it harder for people/companies to walk away from the platform amid future competition.
I believe GPTs are an attempt by OpenAI to generate training data. How can you get data at level N+1 when you have a model at level N? You give it more resources - more tokens (CoT), more dialogue rounds, code execution, web search, local KB, human-in-the-loop. A model with feedback from human and tools can do so much more. And by training on this data they can incorporate these skills in the next generation. It's like RLHF in the sense that the training data contains portions generated by the model itself (specifically model errors) and feedback. It's on policy data, generated with the involvement of the model, not something you can scrape from the web.

Let's do an estimation - if they have 100M users and each of them generates 10K tokens in a month, that's 1T tokens per month. In a year they have generated 12T tokens, which is very close to the GPT-4 training set size of 13T. Looks like they can generate serious data with this method. They don't even need to train directly on it, they could rewrite it as high quality training examples, without copyright and PII risks, because LLMs are great at rewriting and rewording and MS has already shown that synthetic data is better.

Google lost the start and they don't have the human-AI chat logs OpenAI sits on. So they are trying to do the same trick but without the human in the loop. Hence the declarations that Gemini will use some techniques from AlphaZero. They are teaching models by feedback too.

There really should be open source versions.

It's perplexing why there aren't, especially when the individual components are relatively straightforward (code interpreter, RAG, search, function calling, image generation).

It depends on what you are looking for here. If you want to build with LLMs, there are a number of open source options which can be self hosted (although they may not perform on par with GPT-4).

If you are referring to the assistants API which adds some more complex behavior, there is LangChain as others mentioned, but also some more turn key, self hosted options (which I have not tried) such as

- https://github.com/stellar-amenities/assistants

- https://github.com/transitive-bullshit/OpenOpenAI

If you are referring to the marketplace itself, most developers are currently rolling their own web apps with billing and auth while they wait for OpenAI’s offering.

Finally, as a shameless plug, I’ve been working with some friends on a marketplace which provides auth and billing but decouples you from a specific model provider and the high platform fees they may plan to charge. It isn’t open source but we think it might strike the right balance. https://market.interactwith.ai/

GPTs is just a glue layer. It adds some but small values on top of GPT4, emphasizing on small.

If you don’t have GPT4, no value is added

> Anyone knows about any open source projects in this area?

^^ Look no further! LOOKS LIKE A PROBLEM FOR AI!