Hacker News new | ask | show | jobs
by rig666 1034 days ago
I host an LLM because it's cheaper for my use case. To many people focus on how an LLM interfaces with users but I believe the best most reliable use for an LLM is for analyzing free form text and having it put that data into quantifiable fields or tagging. Things like this would have taken an interns or overseas laborers weeks to months to do can now finally be automated.
8 comments

I've had a similar thought. I want to feed LLMs (and friends) messy data from my house and let it un-mess as best it can. A big hurdle in managing home data (chat logs, emails, browser history, etc) is making use of it. i don't want to have to tag all of my data. LLMs seem really attractive for that to me.

I have this urge to toy with the idea but i also find "Prompt Engineering" to be very unattractive. It feels like something i'd have to re-tailor towards any new model i change to. Not very re-usable and difficult to make "just work".

Ya prompt engineering can be a more difficult than it looks. Especially when dealing with less intelligent models. That's why I recommend having an error checking stage were the model gives a model should be able to return a simple "True" or "Yes" when presented it's last response. This eats up more GPU time but the signal to noise ratio improves drastically.
> That's why I recommend having an error checking stage were the model gives a model should be able to return a simple "True" or "Yes" when presented it's last response.

Mind elaborating on that? Looks like a typo but i'm having difficulty knowing for sure. Thanks!

Would also be interested in learning more about this
Thank you for this hint. I have saved a history file from a remote session i did yesterday, some interesting stuff there but also a lot of clutter from all the mistakes I did and asking the free chatgpt to unclutter it for me just made it a lot easier to read and went from 133 lines to 39 lines with added comments and line spacing.

It's not perfect but it removed a lot of useless stuff like all the times i mistyped '--recurse-submodules' or the times I went into a folder only to realize it was not the correct folder for what I was trying to do.

> I want to feed LLMs (and friends) messy data from my house and let it un-mess as best it can.

What would be your goal for doing that?

To then be able to have secondary steps of lookup. I often want a search engine for "my life". Most commonly from text messages where i discussed something. In a perfect world it would be able to link context between emails, browser history, chat conversations, etc. I'd love a flexible system that could record what i have in boxes, in the fridge, etc.

Sounds a bit silly, but of course it's mostly just for fun. However on the more practical side, i do often find myself needing to dig through old text conversations trying to find that one message. Not having a flexible, deep search behind it sucks. I often find myself wanting to do the same with my browser history. Find that one website i visited, etc.

I have the thought that it would be great to make my data points more rich. Don't just tag my browser history with isolated tags, such as Programming, Rust, etc - but infer meaning from my searching. Be able to see that i'm working on ProjectX actively via CLI Git activity, and that i'm searching for Y. Be able to correlate commit Z with search Y. etcetc

It feels to me there's a ton of small, edge case utility that can be gained by dumping everything to a local server and having it link the data. But i don't want to do any of that manually.

Likewise, i've wanted to manage "Home Inventory" before - what's in boxes, etc. Managing that myself is tedious, though. LLMs seem ripe for figuring out associations - even dumb LLMs. My hope is that i eventually can start wiring things together and having the LLMs start making rich data out of messy untagged data.

Would be neat, /shrug

This would be truly great. I have been pushing back the task of itemizing all of my belongings into a spreadsheet in the event of a natural disaster/fire/theft and having an assistant that I could say, "I have this road bike I built" and have it look through my emails and gather all the components and associated costs, then add it to a spreadsheet, would be a boon to many people. Of course, having it do it automagically from my purchases would be even better.
have you heard of rewind.ai? it sounds like it might be a possible solution for what you're looking for (not affiliated with it though, and also don't have it on my Mac, so not sure how well it works in reality)
Just leaving this, there was an effort to build a truly open source one: https://github.com/dhamaniasad/cytev2
Yea, though it's not local. They claim it is, but then use ChatGPT .. which is odd.

Personally i want to build a fairly dumb system though. Ie make a system which can be useful with LLama2 13B or w/e. Something that doesn't require state of the art GPT4+.

If that means compromising on some features that's fine, but at least then it can be truly and fully local.

Exactly this. It's so fast to spin up classifiers now when it used to take weeks to get something working.

What LLMs are you using?

Absolutely, just look at the number of manual data entry jobs on Upwork. IMO one of the superpowers of LLMs is not generating text or images, it's understanding and transforming unstructured data.
What type of analysis do you do on the text? And how is the performance/cost of running vs more specialized models trained for the task?
This isn't our field but its something similar. So say some of your clients are old publications. Some have articles dating back to the 1800s. Nearly all the work is digitized but searching for something in the great categorized mess is a nightmare. As most old publications are downsizing they don't have the man power to curate there archives but are inundated with research requests nearly 24/7. As a service to help these publications maintain there image as an organized informative keeper of historical records you could do the following. 1. have an LLM make a series of tags for all the articles. 2. make a summary for all the articles to improve search results. 3. provide a service to them or up sold to their clients were a question/prompt can be ran across every article or a section of articles.

> how is the performance/cost of running vs more specialized models trained for the task. most models are GNU licensed so thats not an issue. But I imagine you meant the age old question of hosting yourself vs using openAI. Truth is as of now it currently is not foretasted to beat using one of the less intelligent models on openAI. hardware cost alone yes but Dev time is very expensive. Lucky were a small company & our CEO sees this as training. Because LLMs are so new there really isn't a large labor market for it yet. If our devs and engineers get in this early then we can beat others to market as the technology develops and new opportunities come to light. on top of having possible HIPPA, GDPR, or other security laws to follow that OpenAI has been very shooty about, we do not want be at the whim of OpenAI or another SaaS provider on a mission critical part of a vertical. They have talked about depreciating old model. As well they have had content changes in there models to placate political critics, well not realizing that this pulls the rug out from under developers that need any sense of stability from there product.

Do you have any suggestions about how to start implementing something like this in-house? I'm sitting on thousands of PDFs (that can be trivially turned into text) and it would be really useful to train an LLM on them for information retrieval.

But the dev and computing cost of this feels so huge that I'm not even sure where to start.

my first way of showcasing this was by taking a spare computer sitting around the office then writing a little python script that used and LLM to parse information out of file names that our finance team would use to label rebilling invoices. the invoices included the client, payment date, amount, late payment status, etc write in a concluded an completely non consistent file name. the little office PC had 16gb of ram so it was usable for an LLM via the CPU and I just let it run for like 2 days. I continued with my normal work and when it finished I had an intern spend 1 whole day validating just 6% of the data and found it to be 97 percent accurate. I made some obvious changes an was able to fill in that 3% gap. (later we did find a hand full of errors but over all you could consider the validation 99% accurate)

While it really resonated with my management I felt worried I wouldn't be able to replicate these kind of results on other projects.

THE ONLY REAL ADVICE I CAN GIVE ON AI PROJECTS IS . . . don't let your managements expectation of LLMs out weigh its capabilities.

I'm sure I speak for many people here when your non-tech fluent directors get together and think GPT4 is some sort of deity. GPT4 smart (or used to be at least) ill give it that, but small locally hosted 7b/13b LLMs are very limited and people for whatever reason get AI infatuation the second they finally see you show direct value in it they will lose there shit in its assumed capabilities. you got to be direct with them that no matter what dumb video they saw on Sam Altman, what your are proposing is not that. Be very clear in its possible scope because there is some idiot in our organization that will assume assume you can programmatically answer prayers. I actually had this guy from our networking team try and raise a concern about the LLM going sentient and us having a "Skynet" problem. granted this was back in march/2023 so AI histira was a little more rampant but still.

tl;dr my recommendation for your pdf project is run https://github.com/oobabooga/text-generation-webui. if your can get a 30 series GPU in your company Then run a 13B 4bit model that can pull info, assign tags, run minor analysis on your text. else find a spare 16gb machine and do the same but but over a longer time scale.

run a prompt that checks for hallucinations. "does the following text make sense? previous prompt + text if yes then keep else make intern do it.

GPT-j-7b is still one of the best models because it has indexing & categorizing at the main prosperous. other models are great but core idea behind LLMs is that its just a high level auto complete

How accurate is an LLM for this task? I was thinking of using one for analyzing free form PDF text to find a specific element, but I was worried about hallucinations.
Extractive tasks are part of where LLMs shine, and where you get the least amount of hallucination as long as you fine-tune your model.

By fine-tuning the model to extract a specific desired output from the text you give it, it learns that the output always comes from the input, and so you get less random outputs than just by prompting an instruction-tuned model (which was fine-tuned to find the answer in its weights, instead of copying it from the input).

I'm pretty ignorant on which is the best self hosted LLM for such a task or how to fine-tune it. Do you know of any resources on how to set that up?

It seems like llama2 is the biggest name on HN when it comes to self hosting but I have no idea how it actually performs.

You could just try it out if you have the hardware at home.

Grab KoboldCPP and a GGML model from TheBloke that would fit your RAM/VRAM and try it.

Make sure you follow the prompt structure for the model that you will see on TheBloke's download page for the model (very important).

KoboldCPP: https://github.com/LostRuins/koboldcpp

TheBloke: https://huggingface.co/TheBloke

I would start with a 13b or 7b model quantized to 4-bits just to get the hang of it. Some generic or story telling model.

Just make sure you follow the prompt structure that the model card lists.

KoboldCPP is very easy to use. You just drag the model file onto the executable, wait till it loads and go to the web interface.

Won't you run out of context size though? The older models only went up to 2000 tokens, newer ones up to 16k.

Ie how do you feed the LLM the text along with your question without it forgetting most of the text? I assume the text you want to feed it is longer than 16,000 words.

For my use-case the PDFs are only a few pages long generally, so I think the 16k word limit would be well within my needs. I'm trying to find a list of device names from an FDA 510k summary (for medical device clearances). Currently I'm doing this manually and it's quite time consuming. I have around 15,000 PDFs to get through manually, but it's pretty slow work.
I assume asking for "quantifiable fields" is akin to requesting "return the data in JSON format", yes?

How do you do the tagging bits, though?

Cheaper than GPT-3? Can you give a comparison of the costs?
GPt-3 is very expensive if you use it frequently compared to just running in a desktop machine you already have. Of course, if you’re buying new hardware just to run a model for yourself locally, that’s a different cost analysis, but for me I had other reasons to have a decent gpu.

If you have a product that uses an LLM and can get away with one of the open source ones, it’s probably cheaper (and def lower latency/response time) to host yourself too somewhere like azure or aws.

May I ask what your stack is?
Nothing complex actually but it is a little messy and cobbled together

I run oobabooga's API on docker with a 13B 4bit quantized model. https://github.com/oobabooga/text-generation-webui

We use GTX 3060s because there the best bang for there buck in terms of VRAM. Our current set up is mostly proof of concept or used of inner office work well we work on scaling to get a fluid handler built so it can distribute workloads around the multiple GPUs.

Lucky the crypto mining community laid the ground work for some of the hardware.

> Lucky the crypto mining community laid the ground work for some of the hardware.

Are you using riser cards to connect the GPU's to the motherboards then? I thought about trying a setup like yours, but was worried that the riser card interfaces would create a bottleneck. Ideally I'd like to run some cards in a separate box and connect them to my main computer through some kind of cable interface, but I'm not sure if that's possible without seriously affecting performance.