Other than portability and privacy, are there any benefits to running a local model with a 4090, versus running the same model on-demand on a cloud service with the same or more powerful card?
There are always going to be pros and cons. That's why solutions like managed databases exist. From an expert's perspective it can seem like there's more to lose, but for a company dealing with employee turnover, possible data loss, security and so on, the benefits start to far outweigh the costs.
The same reasoning mostly applies here. If you want to learn about the LLM and pull it apart, perhaps fine-tune and tinker, then 100% go ahead and run it locally. However, you won't be able to scale this up easily for a consumer base, and the electricity use and heat output start to become a problem.
At some point it's more beneficial to pay a provider for inference: that covers upkeep, the latest models, faster generation, stability, hosting, etc.
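To put a rough number on that "at some point", here is a minimal break-even sketch. Every figure in it (card price, power draw, electricity rate, cloud hourly rate) is a made-up placeholder, not a quote from any provider, and the function name is hypothetical. It also ignores resale value, idle time, and the rest of the machine.

```python
def break_even_hours(hardware_cost, local_power_kw, electricity_per_kwh, cloud_per_hour):
    """Hours of use after which buying the card beats renting one.

    Returns None when renting is never more expensive per hour than
    the electricity needed to run the local card.
    """
    # Running cost of the local card per hour of inference.
    local_per_hour = local_power_kw * electricity_per_kwh
    # What each hour of local use saves compared with renting.
    saving_per_hour = cloud_per_hour - local_per_hour
    if saving_per_hour <= 0:
        return None
    return hardware_cost / saving_per_hour


# Placeholder numbers only, chosen to keep the arithmetic easy to follow.
hours = break_even_hours(
    hardware_cost=2000.0,      # assumed price of the card
    local_power_kw=0.5,        # assumed draw under load
    electricity_per_kwh=0.20,  # assumed electricity rate
    cloud_per_hour=0.60,       # assumed on-demand rental rate
)
print(hours)
```

With these placeholder inputs the card pays for itself after about 4,000 hours of actual use, which is a lot of tinkering for one person and not much for a service under constant load. Plug in your own numbers before drawing conclusions.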
Pros and cons! Choice is important, and Meta is doing the right thing by the AI community and the tech community in general by being realistic with these programs. The ecosystem gives back in return for access to these high-quality models.
Meta seem to be thinking 10 years ahead, to a point where anyone can run these models at the edge.
Perhaps it's not about where the model is hosted but what can be built on it.
Meta have added Llama3 across the board on all their apps.
Chat is fun, but in the wrong context it's useless. However, the training data coming back from millions of users is worth paying attention to... Llama4 might be a significant jump!
Increased model intelligence and innovative applications of language technology will be where the real value appears. Open-sourcing and allowing public amplification of abilities and enhancements is a very smart move.
The marketing department also deserves credit. What happened to Grok? LLMs are everywhere. We're running them on home computers, and that's where we should be pondering the next moves.
Eventually these models will need to run on mobile devices, so commodity desktop GPUs are a good stepping stone. AlexNet / Caffe got traction because they could be run on commodity desktop machines. Then, a few years later, phones could run object detection etc.