Hacker News
by JimmyRuska 1069 days ago
It's difficult to compete. A small business might answer 10,000 requests to their chat bot. The options are

- Pay openai less than $50/mo

- Manage cloud gpus, hire ml engineers > $1000/mo

- Buy a local 4090 and put it under someone's desk: no reliability, plus $1500 fixed cost

Any larger business will need scalability and you still can't compete with openai pricing.
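To put the three options on the same footing, here is a rough per-request calculation. It is only a sketch: it assumes the 10,000 requests are per month, treats the quoted prices as bounds, and amortizes the 4090 over a hypothetical 24 months, ignoring power and staff time.

```python
# Figures come from the comment above; the monthly request volume and the
# amortization window are assumptions made purely for illustration.
REQUESTS_PER_MONTH = 10_000   # assumed: the 10,000 requests are per month
AMORTIZATION_MONTHS = 24      # hypothetical useful life of the 4090


def cost_per_request(monthly_cost: float, requests: int = REQUESTS_PER_MONTH) -> float:
    """Return the average cost of one request for a given monthly spend."""
    return monthly_cost / requests


options = {
    "openai (upper bound)": 50.0,
    "cloud gpus + ml engineers (lower bound)": 1000.0,
    "local 4090 (hardware only)": 1500.0 / AMORTIZATION_MONTHS,
}

for name, monthly in options.items():
    print(f"{name}: ${cost_per_request(monthly):.4f} per request")
```

Changing `REQUESTS_PER_MONTH` or `AMORTIZATION_MONTHS` shows how sensitive the comparison is to those two guesses.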

Maybe one of you startup inclined people can make an openllama startup that charges by request and allows for finetuning, vector storage

8 comments

I’ve got an expensive GPU at home I’m not even using because there aren’t that many things to do with it. Give me more local options.
Let other people pay you to run their stuff on your hardware with Vast.ai.
Even if you are not into coding there are many good AI tools that run locally. Two very easy examples:

I've had great fun with the " Easiest 1-click way to install and use Stable Diffusion on your computer."

https://github.com/easydiffusion/easydiffusion

And while Whisper is from OpenAI, it is trivial to use locally and extremely useful

https://github.com/chidiwilliams/buzz

It depends heavily on the use case, not org size. I consult for a ~70-person org that needs to process ~1M tokens per day. That costs $30K per day on OpenAI ChatGPT API. I'm sure this is not an extraordinary case.
Each person in the org needs 1M GPT-4 tokens and semantic search can’t be used to trim queries? I would be super curious to know more about this use case.
The data doesn't scale according to employee size. If they manage to cut the headcount in half, they'd still need to process the same amount of info.

The use case is based on public information on the internet. News articles, PRs, social media posts, etc.

LLMs are used to extract info from text in a structured format. It used to take several classification and NLP models to do the job, but now a single LLM can do it faster and with better accuracy.

I have a 4080, let’s do a startup. #cancode #hashomelab
> Maybe one of you startup inclined people can make an openllama startup that charges by request

I'm currently building www.lalamon.us specifically to provide a fully hosted open source model experience. One slight difference is that I'm providing a private chat instance for each user, so charging based on hours of active chat usage seemed to make more sense. Per-request charging seems more unpredictable for users, but I'd be interested in hearing the case either way.

Feel free to reach out with more questions if interested; my email is in my profile.

Doing this. We soft launched yesterday with a paid Falcon-40B playground. 3 models for now: Falcon 40b instruct, uncensored, and base. Adding API and per token pricing this week.

https://api.llm-utils.org/

And more models coming soon.

Vector storage isn’t on the roadmap (what stops using a separate vector store from working well? Could add to roadmap but want to understand more first), and we could add fine tuning if it’s a common request.

Lots of people are using LLMs to make chat bots from their existing datasets: customer service troubleshooting, FAQs, billing, scheduling. Being able to upload their own pdfs, spreadsheets, docx, or crawl their home page lets the chat bot become personalized to their use case. While you could locally query your own vectordb before prompting, people buy a paid service so they won't have to manage any of the technical details.

If people can drag and drop some files from their NAS and you parse them with Apache Tika or similar (https://tika.apache.org/), they can start using personalized, branded bots. It also lets you do things like refuse to answer if the vector database returns nothing and the use case requires a specific answer from the docs only, rather than letting the LLM make stuff up.
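A minimal sketch of that "refuse when retrieval comes back empty" idea, assuming a toy bag-of-words similarity in place of a real embedding model and vector database. The names `embed`, `retrieve`, `answer`, `REFUSAL`, the `min_score` threshold, and the `llm` callable are all hypothetical, not any particular product's API.

```python
import math
from collections import Counter
from typing import Callable, List

REFUSAL = "Sorry, I can't find that in the provided documents."


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], min_score: float = 0.3) -> List[str]:
    """Return docs scoring at least min_score against the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score >= min_score]


def answer(query: str, docs: List[str], llm: Callable[[str], str]) -> str:
    """Call the LLM only when retrieval found supporting text; otherwise refuse."""
    hits = retrieve(query, docs)
    if not hits:
        return REFUSAL  # nothing relevant retrieved: do not let the model guess
    prompt = (
        "Answer only from these documents:\n"
        + "\n".join(hits)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)
```

With `docs = ["Refunds are issued within 14 days of purchase."]`, a question like "when are refunds issued" reaches the model with that document in the prompt, while "quantum gravity" gets the refusal string without the model being called at all.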

For those use cases the “custom ChatGPT” tools I linked here might be better https://news.ycombinator.com/item?id=36649777
Shouldn't you use a .com tld?

Will your pricing be competitive with Replicate?

Not secure... NET::ERR_CERT_COMMON_NAME_INVALID Subject: *.safezone.mcafee.com

Issuer: McAfee OV SSL CA 2

Expires on: Aug 3, 2023

Current date: Jul 8, 2023

PEM encoded chain: -----BEGIN CERTIFICATE----- MIIGfzCCBWegAwIBAgIQKt9VNrFtaozA1bILX1OcfzANBgkqhkiG9w0BAQsFADBk MQswCQYDVQQGEwJVUzELMAkGA1UECBMCQ0

FastChat-T5 can work for such a use case and it runs on (beefy) CPUs. With a $700/month instance, it can do 4 conversations simultaneously, without needing GPUs.

The instant a company has sensitive data, this becomes very viable.

Wait until winter time and heat your house!
Good double use of that low entropy energy. Heat pumps excepted.
People don't scale. This is personal. Only option 3 is a good choice for people on a site with "hacker" in its name.