| HN Mirror

by JimmyRuska 1116 days ago

It's difficult to compete. A small business might answer 10,000 chatbot requests a month. The options are:

- Pay OpenAI less than $50/mo

- Manage cloud GPUs and hire ML engineers: more than $1,000/mo

- Buy a local 4090 and put it under someone's desk: no reliability, plus $1,500 fixed

Any larger business will need scalability, and you still can't compete with OpenAI's pricing.
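One way to compare the three options is to turn them into per-request and break-even numbers. The sketch below is a back-of-the-envelope calculation that assumes the figures above (10,000 requests a month, OpenAI at the $50/mo upper bound, cloud GPUs at the $1,000/mo lower bound, a $1,500 4090) and ignores electricity, maintenance, and downtime for the local card.

```python
# Back-of-the-envelope comparison using the figures quoted in the comment.
# Assumptions: 10,000 requests/month, OpenAI at the $50/mo upper bound,
# cloud GPUs at the $1,000/mo lower bound, and the 4090's electricity,
# maintenance and downtime ignored entirely.

REQUESTS_PER_MONTH = 10_000
OPENAI_MONTHLY = 50.0       # USD per month (upper bound from the comment)
CLOUD_GPU_MONTHLY = 1000.0  # USD per month (lower bound from the comment)
LOCAL_GPU_FIXED = 1500.0    # USD, one-off 4090 purchase


def cost_per_request(monthly_cost: float, requests: int = REQUESTS_PER_MONTH) -> float:
    """Average cost of one request for a flat monthly spend."""
    return monthly_cost / requests


def breakeven_months(fixed_cost: float, monthly_alternative: float) -> float:
    """Months until a one-off purchase equals the cumulative monthly spend it replaces."""
    return fixed_cost / monthly_alternative


if __name__ == "__main__":
    print(f"OpenAI:    ${cost_per_request(OPENAI_MONTHLY):.4f} per request")
    print(f"Cloud GPU: ${cost_per_request(CLOUD_GPU_MONTHLY):.4f} per request")
    months = breakeven_months(LOCAL_GPU_FIXED, OPENAI_MONTHLY)
    print(f"4090 hardware cost matches the OpenAI bill after {months:.0f} months")
```

Under these assumptions OpenAI works out to half a cent per request, and the 4090's purchase price alone takes about 30 months of the $50 bill to recoup, before counting any of the costs left out above.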

Maybe one of you startup inclined people can make an openllama startup that charges by request and allows for fine-tuning and vector storage.

8 comments

xigency 1116 days ago

I’ve got an expensive GPU at home I’m not even using because there aren’t that many things to do with it. Give me more local options.

JimmyRuska 1116 days ago

https://github.com/oobabooga/text-generation-webui

https://github.com/bentoml/OpenLLM

https://www.reddit.com/r/LocalLLaMA/top/?t=month

ignoramous 1116 days ago

And

https://github.com/go-skynet/LocalAI

https://github.com/juliooa/secondbrain

https://github.com/louisgv/local.ai

fragmede 1116 days ago

Let other people pay you to run their stuff on your hardware with Vast.ai.

PeterStuer 1116 days ago

Even if you are not into coding, there are many good AI tools that run locally. Two very easy examples:

I've had great fun with the "Easiest 1-click way to install and use Stable Diffusion on your computer."

https://github.com/easydiffusion/easydiffusion

And while Whisper is from OpenAI, it is trivial to use locally and extremely useful:

https://github.com/chidiwilliams/buzz

rmbyrro 1116 days ago

It depends heavily on the use case, not org size. I consult for a ~70-person org that needs to process ~1M tokens per day. That costs $30K per day on the OpenAI ChatGPT API. I'm sure this is not an extraordinary case.

serjester 1116 days ago

Each person in the org needs 1M GPT-4 tokens, and semantic search can’t be used to trim queries? I would be super curious to know more about this use case.

rmbyrro 1115 days ago

The data doesn't scale according to employee size. If they manage to cut the headcount in half, they'd still need to process the same amount of info.

The use case is based on public information on the internet. News articles, PRs, social media posts, etc.

LLMs are used to extract info from text in a structured format. The system used to rely on several classification and NLP models to do the job, but now a single LLM can do it faster and with better accuracy.
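For structured extraction like this, the part that most needs care is validating what the model returns before it lands in a dataset. A minimal sketch, assuming the model is asked to reply with a JSON object only; the field names, allowed values, and prompt wording are hypothetical (the thread doesn't say what is extracted), and the actual API call is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: the thread doesn't say which fields are extracted.
REQUIRED_FIELDS = {"organization": str, "event_type": str, "sentiment": str}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

# Hypothetical prompt; fill it with PROMPT_TEMPLATE.format(text=article_text).
PROMPT_TEMPLATE = (
    "Extract these fields from the article and reply with a JSON object only: "
    "organization (string), event_type (string), "
    "sentiment (one of: positive, neutral, negative).\n\nArticle:\n{text}"
)


@dataclass
class Extraction:
    organization: str
    event_type: str
    sentiment: str


def parse_extraction(raw: str) -> Extraction:
    """Validate the model's raw reply; raise ValueError if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['sentiment']!r}")
    # Keep only the expected fields; any extra keys from the model are dropped.
    return Extraction(**{name: data[name] for name in REQUIRED_FIELDS})


if __name__ == "__main__":
    reply = '{"organization": "Acme", "event_type": "press release", "sentiment": "neutral"}'
    print(parse_extraction(reply))
```

Replies that fail validation can be retried or routed to a person rather than silently polluting the extracted data.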

whoiscroberts 1116 days ago

I have a 4080, let’s do a startup. #cancode #hashomelab

lalamon 1114 days ago

> Maybe one of you startup inclined people can make an openllama startup that charges by request

I'm currently building www.lalamon.us specifically to provide a fully hosted open source model experience. One slight difference is that I'm providing a private chat instance for each user, so charging based on hours of active chat usage seemed to make more sense. Per-request charging seems more unpredictable for users, but I'd be interested in hearing the case either way.

Feel free to reach out with more questions if interested; my email is in my profile.

tikkun 1116 days ago

Doing this. We soft-launched yesterday with a paid Falcon-40B playground offering three models for now: Falcon-40B instruct, uncensored, and base. We're adding an API and per-token pricing this week.

https://api.llm-utils.org/

And more models coming soon.

Vector storage isn’t on the roadmap (what stops a separate vector store from working well? We could add it to the roadmap, but I want to understand more first), and we could add fine-tuning if it’s a common request.

JimmyRuska 1116 days ago

Lots of people are using LLMs to make chatbots from their existing datasets: customer service troubleshooting, FAQs, billing, scheduling. Being able to upload their own PDFs, spreadsheets, and DOCX files, or crawl their home page, lets the chatbot be personalized to their use case. While you could query your own vector DB locally before prompting, people buy a paid service so they don't have to manage any of the technical details.

If people can drag and drop some files from their NAS and you parse them with Apache Tika (https://tika.apache.org/) or similar, they can start using personalized, branded bots. It also lets you do things like refuse to answer if the vector database returns nothing and the use case requires an answer from the docs only (rather than letting the LLM make stuff up).
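The refusal behaviour can be sketched as a similarity threshold on retrieval: if no document chunk is close enough to the question, the bot declines instead of prompting the LLM. This assumes the chunks (for example, text extracted with Tika) are already embedded by some model; the 0.75 threshold, the prompt wording, and the toy 2-D vectors are placeholders, and a real deployment would query a vector database instead of scanning a Python list.

```python
import math
from typing import List, Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.75  # arbitrary placeholder cutoff; tune on real queries
REFUSAL = "Sorry, I can only answer questions covered by our documentation."

Index = Sequence[Tuple[str, Sequence[float]]]  # (chunk text, embedding) pairs


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], index: Index, k: int = 3,
             threshold: float = SIMILARITY_THRESHOLD) -> List[Tuple[float, str]]:
    """Return up to k (score, chunk) pairs at or above the threshold, best first."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt_or_refuse(question: str, query_vec: Sequence[float],
                           index: Index) -> Tuple[Optional[str], Optional[str]]:
    """Return (prompt, None) if any chunk matches, else (None, REFUSAL)."""
    hits = retrieve(query_vec, index)
    if not hits:
        # Nothing relevant in the docs: refuse instead of letting the LLM guess.
        return None, REFUSAL
    context = "\n\n".join(chunk for _, chunk in hits)
    prompt = (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return prompt, None


if __name__ == "__main__":
    # Toy 2-D "embeddings", purely for illustration.
    index = [
        ("Billing runs on the 1st of each month.", [1.0, 0.0]),
        ("Support hours are 9 to 5 on weekdays.", [0.0, 1.0]),
    ]
    print(build_prompt_or_refuse("When am I billed?", [0.9, 0.1], index)[0])
    print(build_prompt_or_refuse("Do you sell cars?", [-1.0, 0.0], index)[1])
```

The threshold is the main design knob: set it too low and the bot answers from irrelevant chunks, too high and it refuses questions the docs do cover.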

tikkun 1116 days ago

For those use cases the “custom ChatGPT” tools I linked here might be better https://news.ycombinator.com/item?id=36649777

jasfi 1114 days ago

Shouldn't you use a .com TLD?

Will your pricing be competitive with Replicate?

tartakovsky 1116 days ago

Not secure... NET::ERR_CERT_COMMON_NAME_INVALID

Subject: *.safezone.mcafee.com

Issuer: McAfee OV SSL CA 2

Expires on: Aug 3, 2023

Current date: Jul 8, 2023

PEM encoded chain: -----BEGIN CERTIFICATE----- MIIGfzCCBWegAwIBAgIQKt9VNrFtaozA1bILX1OcfzANBgkqhkiG9w0BAQsFADBk MQswCQYDVQQGEwJVUzELMAkGA1UECBMCQ0

rolisz 1116 days ago

FastChat-T5 can work for such a use case, and it runs on (beefy) CPUs. With a $700/month instance, it can handle 4 conversations simultaneously without needing GPUs.
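Capping an instance at a fixed number of simultaneous conversations, like the 4 mentioned above, can be done with a semaphore. A minimal asyncio sketch: `generate` is a hypothetical stand-in that only sleeps. A real CPU-bound FastChat-T5 call would block the event loop, so it would need to run in a worker thread or process (for example via `asyncio.to_thread`).

```python
import asyncio

MAX_CONCURRENT_CONVERSATIONS = 4  # figure from the comment; tune for your hardware


async def generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it just sleeps to simulate work."""
    await asyncio.sleep(0.1)
    return f"reply to: {prompt}"


async def handle(prompt: str, slots: asyncio.Semaphore, stats: dict) -> str:
    async with slots:  # waits here while the maximum number of conversations run
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            return await generate(prompt)
        finally:
            stats["active"] -= 1


async def main(prompts):
    slots = asyncio.Semaphore(MAX_CONCURRENT_CONVERSATIONS)
    stats = {"active": 0, "peak": 0}
    replies = await asyncio.gather(*(handle(p, slots, stats) for p in prompts))
    return replies, stats["peak"]


if __name__ == "__main__":
    replies, peak = asyncio.run(main([f"question {i}" for i in range(10)]))
    print(len(replies), "replies, peak concurrency", peak)
```

Requests beyond the cap simply queue rather than being rejected; a production server might add a timeout or return a "busy" response instead.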

The instant a company has sensitive data, this becomes very viable.

swader999 1116 days ago

Wait until winter time and heat your house!

quickthrower2 1116 days ago

Good double use of that low-entropy energy. Heat pumps excepted.

srcthr 1116 days ago

People don't scale. This is personal. Only option 3 is a good choice for people on a site with "hacker" in its name.
