Hacker News
by visarga 1045 days ago
In my tests LLaMA2-13B is usable for information extraction tasks, and LLaMA2-70B is almost as good as GPT-4 (for IE). These models are the real thing. We can fine-tune LLaMAs, unlike OpenAI's models. Now we can have privacy, control and lower prices. We can introduce guidance, KV caching and other tricks to improve the models.

The enthusiasm around it reminds me of JavaScript framework wars of 10 years ago - tons of people innovating and debating approaches, lots of projects popping up, so much energy!

7 comments

> “The enthusiasm around it reminds me of JavaScript framework wars of 10 years ago”

Hmm. If LLMs turned out like JS frameworks, that would mean that in ten years people will be saying:

“Maybe we don’t really need all this expensive ceremony, honestly this could be done with vanilla if/else heuristics…?”

I can imagine a bloated world where 500B param models are used for tasks where 7B param models perform adequately.

At that point, there could be complaints on Hacker News about messaging apps with autocomplete models that take up gigabytes.

I could see a world where people turn to an LLM for a task where the hand-rolled solution could be a simple state machine or some nested switch statements.

The irony would be that the LLM could write you that code, but if you don’t know to ask…

The real irony would be if the AI uses an LLM to write the code the second time it sees the same request repeated, deploys it to effortlessly deal with all future requests, and goes back to playing Quake while collecting the full paycheck.
A key difference is that the impact of these enormous models is abstracted away. I don't feel the impact of having to run GPT-3.5-turbo at scale because it's just an API call away, and it's even more reliable now.

The main critiques outside of data privacy I've read are related to energy consumption, but even then, it's...not compelling? I read an article[0] that estimated the training of ChatGPT (3.5) to emit as much CO2 as more than 3 round-trip flights between SF and NYC. That's not good! But it also really highlights that if we're to reduce emissions, there are clearly bigger targets than the largest ML models in the world.

[0]: https://themarkup.org/news/2023/07/06/ai-is-hurting-the-clim...

"How can I sum this column of numbers?"

"IDK, throw it at the LLM"

I do stuff like this sometimes when I have some csv or something and need it in JSON.

Could easily be done with 1 line of bash or js or python or whatever but... it's easier to just let the LLM do it for me :|

Or you could ask the AI for a script to convert it (the resulting JSON would be free of hallucinations) and even ask the AI to run the code, since I believe ChatGPT now has that ability. I've definitely noticed that ChatGPT will sometimes mess up the data when converting from one format to another.
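
For illustration, here is a minimal sketch of the kind of script being described, using only the Python standard library. The function names `csv_to_json` and `sum_column` and the sample data are made up for this example, and it assumes the CSV has a header row. Because the conversion is deterministic, the output cannot contain values that were not in the input.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Values stay strings, exactly as they appear in the CSV.
    return json.dumps(rows, indent=2)


def sum_column(csv_text: str, column: str) -> float:
    """Sum one numeric column of CSV text, identified by its header name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)


# Hypothetical sample data, only for demonstration.
sample = "name,amount\nalice,10.5\nbob,4.5\n"

print(csv_to_json(sample))
print(sum_column(sample, "amount"))
```

To use it on a real file, read the file's contents into a string and pass that in place of `sample`.
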
That would be awesome, but we've tried for decades and haven't gotten there with basic if/else. I do think it's pretty plausible that if you combine some very slimmed down models with strong heuristics you could get far though. At the moment I think expense doesn't really matter -- an hour of a knowledge worker's time is worth 416,000 tokens of GPT-4, the most expensive model out there. For llama-2 the same number of tokens costs even less. Unless you're processing truly epic numbers of tokens, by far the most important thing is whether we can use these things for real.
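
The comment does not say how the 416,000 figure was derived. As a rough sketch, one pair of inputs that lands close to it is an hourly labor cost of $25 and a GPT-4 price of $0.06 per 1,000 tokens. Both numbers are assumptions for this example, not values given in the thread, and `tokens_per_hour` is a hypothetical helper.

```python
def tokens_per_hour(hourly_cost_usd: float, price_per_1k_tokens_usd: float) -> int:
    """Number of tokens that cost the same as one hour of labor."""
    return round(hourly_cost_usd / price_per_1k_tokens_usd * 1000)


# Assumed inputs (not stated in the thread).
ASSUMED_HOURLY_COST_USD = 25.00
ASSUMED_PRICE_PER_1K_TOKENS_USD = 0.06

equivalent_tokens = tokens_per_hour(
    ASSUMED_HOURLY_COST_USD, ASSUMED_PRICE_PER_1K_TOKENS_USD
)
print(equivalent_tokens)
```

A cheaper model lowers the per-token price, so the same hour of labor buys proportionally more tokens.
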
> "That would be awesome, but we've tried for decades and haven't gotten there with basic if/else."

Oh, I know — I was trying to throw some shade on the state of JS frameworks rather than LLMs. With the pendulum now swinging back to vanilla DOM manipulation, it feels like the enormous effort spent on devising ways to wrap web UIs in endless variations of abstractions might have been somewhat of a waste.

Well, these models at least say in their name what they do, e.g. llama-2-70b-Guanaco-QLoRA-fp16.
Subtle :)
Any information about how long it takes to achieve a fine tune comparable to OpenAI's current fine tunes of ada models? On consumer hardware vs cloud? OpenAI's fine tune times are on the order of hours for tens of thousands of samples, but expensive. Any information on the effort and time involved in fine tuning compared to OpenAI's current process would be appreciated.
> information extraction task

I do that with orca-mini-3b in ggml format and it's pretty good at it, at twice the speed. Of all the LLMs I've tried, this one gave me the best results. It just requires a properly written prompt.

Could you elaborate on the prompting strategies you have used that are more effective?
> The enthusiasm around it reminds me of Javascript wars 10 years ago... so much energy!

I kind of have the same feeling as well. With all this energy it's really hard to keep up with all new ideas, implementations, frameworks and services.

Really excited for what this will bring us in the coming years.

> With all this energy it's really hard to keep up with all new ideas, implementations, frameworks and services.

The majority of them are mostly irrelevant. You just need to figure out which.

Can you share an example of information extraction prompts? I specifically am interested in using LLMs as basically general purpose web scrapers that can take html and extract matching data per a prompt into structured json schema…do you think this is possible with llama 2?
> ... and lower prices.

Not sure about this. atm, the cost of any cloud GPU (spot or not) far exceeds the cost of OpenAI's API. I'd be glad to be proven wrong because I, too, want to run L2 (the 70b model).

Also, buying a GPU, even a 4090, is not feasible for most people. And it's not just about the GPU: you'd have to build a PC for it to work, and there's the hidden maintenance cost of running desktop Linux (to use GPTQ for instance). It's not surprising that most users prefer someone else (OpenAI) to do it for them.

I have to admit, I wouldn't have imagined even a few months ago that I'd be reading this comment.

Sure, you can run something comparable to OpenAI's flagship product at home, but it's moderately expensive and slightly inconvenient so people will still pay for the convenience.

It looks like it will always be a war like Android vs. iOS, only now it's with AI models.