| HN Mirror


	by byt143 1053 days ago
	What tasks?

1 comments

treprinum 1053 days ago

For example, when processing a trillion documents, NER can be done much better.

chaxor 1053 days ago

This tradeoff is ridiculous, even if it is "better" by 0.01% F-score. I would much rather have a dataset created in 1 day by BERT at 98% F-score than in 1000 years at 98.01% F-score by a 540B-parameter model, or even a 33B-parameter model. The performance of million-parameter models for NER is still excellent, and they work at speeds that are usable. Running things through OpenAI is also useless, as it would cost a few million dollars.

treprinum 1053 days ago

It's more like 100% accuracy vs 95% accuracy, and the super large models are now able to extract non-trivial derived info from regular human speech as well. While it's not cost-efficient right now, this will change over time (you skate to where the puck will be, not where it is now), making the current fine-tuning approach obsolete. Academically I am not thrilled, as I built my research on fine-tuning, but as a producer of a product this solves so many issues at once that I'm pretty happy.

byt143 1053 days ago

It's really depressing that a handful of big corporations will be able to exert such control over labor and productivity.

treprinum 1053 days ago

That's why the LLaMA 2 release is so significant. You can run the full 70B model in 8-bit on two prosumer A6000 Ampere GPUs (around $10k together), which is within the reach of most companies and some devs. This could further accelerate research on making these models both efficient and available even to regular folks.
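
As a rough sanity check on the memory claim, here is a back-of-the-envelope sketch. It counts the weights only and assumes 48 GB of VRAM per A6000; the KV cache, activations, and framework overhead are ignored, so real headroom is smaller than this suggests. The function name `weight_memory_gb` is just for illustration.

```python
# Back-of-the-envelope check: do the weights of a 70B model fit on two 48 GB cards?
# Assumption: weights only. KV cache, activations, and framework overhead are ignored.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 70e9       # LLaMA 2 70B
GPU_MEMORY_GB = 48    # VRAM per RTX A6000 (Ampere)
N_GPUS = 2

total_vram = GPU_MEMORY_GB * N_GPUS

for bits in (16, 8, 4):
    need = weight_memory_gb(N_PARAMS, bits)
    fits = need <= total_vram
    print(f"{bits:>2}-bit: {need:.0f} GB of weights, fits in {total_vram} GB: {fits}")
```

Under these assumptions the 16-bit weights do not fit on two cards, while the 8-bit weights do, with some room left for the cache and activations.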

amkkma 1053 days ago

But it's still not comparable to GPT-4, nor will it likely be for some time at least. And by the time we have GPT-4-class open source models, I'd imagine there will be significant advancements in closed source models, such as inference-time symbolic reasoning using MCTS, something Google is working on for Gemini, or just bigger/better architectures, data, etc.

treprinum 1053 days ago

Yeah, I don't really see a solution. Even Stanford can't keep up with the latest AI research. Still, having LLaMA 2 is better than not having it.

jerrygenser 1053 days ago

Are you literally using a trillion documents? Or are you exaggerating?

treprinum 1053 days ago

chaxor mentioned it above, so I quickly recalled a task where I saw a super large LLM demolish fine-tuned models on documents.
