


	by threeseed 794 days ago
	But we know from Google that unless you can definitively solve the "is this sentence real or a joke" problem, datasets like Twitter, Reddit, etc. are going to be more trouble than they are worth. And Elon's recent polarising behaviour and the callous way he disbanded the Tesla Supercharger team mean that truly talented people aren't going to be as attracted to him as in his early days. They are only going to be there for the money.

3 comments

jokethrowaway 794 days ago

The datasets should not be used for knowledge but to train a language model.

Using it for knowledge is bonkers.

Why not buy some educational textbook company and use 99.9% correct data? Oh and use RAG while you are at it so you can point to the origin of the information.

The real evolution still has to come, though: we need to build a reasoning engine (Q*?) which will just use RAG for knowledge and language models to convert its thoughts into human language.


dagmx 794 days ago

How does one differentiate knowledge from the language model in an LLM? At least in a way that would provide a benefit?


kolinko 794 days ago

You use formal verification for logic and RAG for source data.

In other words - say you have a model that is semi-smart, often makes mistakes in logic, but sometimes gives valid answers. You use it to “brainstorm” physical equations and then use formal provers to weed out the correct answer.

Even if the LLM is correct 0.001% of the time, it’s still better than the current algorithms, which are essentially brute forcing.
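
To make the idea concrete, here is a rough sketch of that propose-then-verify loop. Everything in it is a stand-in I made up for illustration: `noisy_proposer` plays the unreliable model by guessing at random, and `verifier` is an exact arithmetic check over a finite range rather than a real formal prover. The toy task is to find `a` and `b` such that `n*(n+a)/b` equals `1+2+...+n`.

```python
import random
from fractions import Fraction


def noisy_proposer(rng):
    """Hypothetical stand-in for an unreliable model: guesses (a, b) at random."""
    return rng.randint(-5, 5), rng.randint(1, 5)


def verifier(a, b, upto=50):
    """Hypothetical stand-in for a prover: exact check of n*(n+a)/b against 1+...+n."""
    total = 0
    for n in range(1, upto + 1):
        total += n
        # Fraction keeps the comparison exact, no floating-point rounding.
        if Fraction(n * (n + a), b) != total:
            return False
    return True


def search(seed=0, max_tries=10_000):
    """Keep asking the proposer until the verifier accepts a candidate."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        a, b = noisy_proposer(rng)
        if verifier(a, b):
            return (a, b), attempt
    return None, max_tries


if __name__ == "__main__":
    result, attempts = search()
    print(f"accepted candidate {result} after {attempts} proposals")
```

The proposer here is wrong most of the time, but the only candidate the checker lets through is `(1, 2)`, i.e. `n*(n+1)/2`. The cost of a bad guess is one rejected check, which is why a low hit rate can still be useful when verification is cheap and reliable.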


dagmx 794 days ago

I’m still confused as to the value of training on tweets in that scenario, though.

If you need to effectively provide this whole secondary dataset to have better answers, what value do the tweets add to training other than perhaps sentiment analysis or response stylization?


lynx23 794 days ago

I still fondly remember the story an OpenAI rep told about fine-tuning with a company's Slack history. Given a question like "Can you do this and that please?" the system answered (after being fine-tuned with said history) "Sure, I'll do it tomorrow." Teaches you to carefully select your training data.


LegitShady 794 days ago

>Twitter Supercharger team

interesting.
