by mholubowski 1022 days ago
Hey, I have a genuine question:

What is the point of a new model that isn’t better than the best possible model (example: OpenAI GPT-4)?

What’s the point in having a smaller model? Who cares?

---

This is a real, genuine question that I don’t have a clear answer to. Excuse my ignorance, plz enlighten your boi.

7 comments

GPT4 is expensive to run, even more expensive to finetune, and for all practical purposes can’t be run offline (because the model is too big to run outside of a huge data center). Evaluation latency is also an issue for many use cases, and you have to share your query with OpenAI, so you can’t run sensitive queries. The output is also controlled/censored by OpenAI.

Here are a few use cases that I wouldn’t want to use OpenAI/GPT for:

- Advanced autocomplete for texting and private communications

- Querying sensitive document databases like emails

- Traveling in low connectivity areas

- Politically incorrect use cases (generating erotic content for example)

List kinda goes on and on

> GPT4 is expensive to run, even more expensive to finetune

GPT4 can't even be finetuned at the moment (though I expect that to change).

It can be finetuned. Bing is a finetuned GPT-4.
I'd assume that that "can't" there is about what's publicly available, not what's technically possible.
It’s obviously technically feasible, it’s just not commercially offered…
IMO, the main reasons are (but are definitely not limited to):

- You can fine tune these models for very specific tasks, which GPT-4 might not be as good at.

- Open source models are free. You can use them as much as you want without worrying about a $xx,xxx bill at the end of the month which makes tinkering with them easier.

- Smaller models like this can run on consumer hardware, even phones, and can run offline.

- Privacy and not having to abide by a third party's terms. You don't have to deal with "As a large language model...", especially with uncensored models.

- Tools like jsonformer https://github.com/1rgs/jsonformer are not possible with OpenAI's API.

- It's also just really cool, let's be honest.
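To make the jsonformer point a bit more concrete: as I understand it, the trick is that the fixed parts of the JSON come from the schema and the model is only asked for the values, with the token choice restricted to what is valid at that position. That needs access to the raw per-token scores, which you have with a local model. Here is a toy sketch of the idea in plain Python. It is not jsonformer's actual code or API; the vocabulary, the scores and every function name below are made up for illustration.

```python
import math

# Hypothetical toy vocabulary; a real model scores tens of thousands of tokens.
VOCAB = ["hello", "42", "{", "7", "world", "}"]


def fake_logits(prompt):
    """Stand-in for a local model's forward pass (an assumption, not a real API)."""
    # Pretend the model prefers free text over digits.
    return [3.0, 1.5, 0.2, 1.0, 2.5, 0.1]


def is_number_token(token):
    """Allow only tokens made entirely of digits."""
    return token.isdigit()


def constrained_pick(logits, allowed):
    """Greedy decoding restricted to tokens for which allowed(token) is True."""
    masked = [score if allowed(tok) else -math.inf
              for tok, score in zip(VOCAB, logits)]
    best = max(range(len(VOCAB)), key=lambda i: masked[i])
    return VOCAB[best]


def fill_number_field(field_name):
    """Build a one-field JSON object; only the value comes from the 'model'."""
    # Braces, quotes and the key are written by us, straight from the schema.
    value = constrained_pick(fake_logits(field_name), is_number_token)
    return '{"' + field_name + '": ' + value + '}'


print(fill_number_field("age"))  # {"age": 42}
```

Without the mask the toy model would have picked "hello" and the output would not parse. With it, the result is valid JSON no matter what the model would have preferred.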

1) people can run a 1.6B model for free on consumer hardware

2) any model that's run on computational resources you are owning or leasing will have more privacy than an explicit cloud offering. running completely on your own local hardware will be private. this means you don't have to think twice about asking the LLM about the proprietary code or information you are working on.

3) smaller models gain the performance improvements from all the other improvements in interpreters and quantizing, allowing for even more consumer friendly offline use

4) oh yeah, offline use. could expand use cases to having LLMs baked into operating systems directly, including leading phones

5) showing what's possible, pushing towards the benchmarks of the best possible model while using less computational resources. this also makes the hosts of the best possible model realize that they could either A) be using less computational resources and increasing the bandwidth for their users B) further improve their own model because of competition. Basically if ChatGPT 4 was using similar improvements in technology across all areas of reasoning/whatever, there never would have been a rate limit on ChatGPT 4.

6) more demand for other computational resources. Nvidia is backordered till maybe Q2 2024 right now. If people realize AMD or even their ARM chips can offer the same performance with the right combination of hardware and software, it alleviates pressure on other ventures that want computation power.
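Some back-of-the-envelope arithmetic for points 1 and 3, as a rough sketch: it counts only the weights of a 1.6B-parameter model and ignores activations, the KV cache and runtime overhead, so real usage will be higher. The function name is just something I made up.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# 1.6B parameters at common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(1.6e9, bits):.1f} GB")
# 16-bit: 3.2 GB
# 8-bit: 1.6 GB
# 4-bit: 0.8 GB
```

So quantizing from 16-bit to 4-bit takes the weights from about 3.2 GB to under 1 GB, which is the difference between needing a decent GPU and fitting in the RAM of a phone.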

The other answers are great, but to add more

- You can run it behind an air-gap, where your systems are disconnected from the world.

- You can run it on the edge with low or no internet connectivity

- You do not need to worry about breaching geographic data restrictions, e.g.: medical data from Country X cannot leave Country X

Your question sounds like: why do we need Alpine Linux when we have Ubuntu? Why do we have SQLite when we have Postgres?

I think the point is to reach a baseline of something being super lightweight yet still useful that could be production-ready for a number of use cases.

You can use it 100% locally, and it doesn't cost anything.
Imagine being on Mars and running on a small PV panel and needing to code a bugfix in your oxygen supply system through the wire with Microsoft Earth(tm) or something