Hacker News
by unshavedyak 1038 days ago
I've had a similar thought. I want to feed LLMs (and friends) messy data from my house and let it un-mess as best it can. A big hurdle in managing home data (chat logs, emails, browser history, etc) is making use of it. i don't want to have to tag all of my data. LLMs seem really attractive for that to me.

I have this urge to toy with the idea but i also find "Prompt Engineering" to be very unattractive. It feels like something i'd have to re-tailor towards any new model i change to. Not very re-usable and difficult to make "just work".


Ya, prompt engineering can be more difficult than it looks, especially when dealing with less intelligent models. That's why I recommend having an error checking stage were the model gives a model should be able to return a simple "True" or "Yes" when presented it's last response. This eats up more GPU time but the signal-to-noise ratio improves drastically.
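
One possible reading of the error checking stage, as a rough sketch only: generate a response, show it back to the same model, and keep it only if the model answers "True" or "Yes". Everything here is an assumption for illustration: the `Model` type is a hypothetical stand-in for whatever local or hosted model is used, and the wording of the check prompt is made up.

```python
from typing import Callable, Optional

# Hypothetical stand-in for any model: takes a prompt, returns text.
Model = Callable[[str], str]

AFFIRMATIVE = {"true", "yes"}


def is_affirmative(verdict: str) -> bool:
    """Pass only if the first word of the reply is 'True' or 'Yes'."""
    words = verdict.strip().split()
    return bool(words) and words[0].strip('.,!"\'').lower() in AFFIRMATIVE


def generate_with_check(model: Model, prompt: str, max_attempts: int = 3) -> Optional[str]:
    """Return the first response the model itself accepts, or None."""
    for _ in range(max_attempts):
        answer = model(prompt)
        # Second call: present the model with its last response.
        check_prompt = (
            "Task:\n" + prompt
            + "\n\nResponse:\n" + answer
            + "\n\nDoes the response complete the task correctly? Answer True or False."
        )
        if is_affirmative(model(check_prompt)):
            return answer
    return None
```

Each attempt costs two model calls instead of one, which matches the extra GPU time mentioned above.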
> That's why I recommend having an error checking stage were the model gives a model should be able to return a simple "True" or "Yes" when presented it's last response.

Mind elaborating on that? Looks like a typo but i'm having difficulty knowing for sure. Thanks!

Would also be interested in learning more about this.

Thank you for this hint. I saved a history file from a remote session I did yesterday. There was some interesting stuff in there, but also a lot of clutter from all the mistakes I made. Asking the free ChatGPT to unclutter it for me made it a lot easier to read: it went from 133 lines to 39 lines, with added comments and line spacing.

It's not perfect but it removed a lot of useless stuff like all the times i mistyped '--recurse-submodules' or the times I went into a folder only to realize it was not the correct folder for what I was trying to do.

> I want to feed LLMs (and friends) messy data from my house and let it un-mess as best it can.

What would be your goal for doing that?

To then be able to have secondary steps of lookup. I often want a search engine for "my life". Most commonly from text messages where i discussed something. In a perfect world it would be able to link context between emails, browser history, chat conversations, etc. I'd love a flexible system that could record what i have in boxes, in the fridge, etc.

Sounds a bit silly, but of course it's mostly just for fun. However on the more practical side, i do often find myself needing to dig through old text conversations trying to find that one message. Not having a flexible, deep search behind it sucks. I often find myself wanting to do the same with my browser history. Find that one website i visited, etc.
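
As a baseline for the "find that one message" case, here is a minimal sketch of keyword search across several sources, using a plain inverted index and no LLM. The `Record` type and the source labels ("sms", "email", "browser") are hypothetical, and a real setup would need importers for each data source.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Record:
    source: str  # hypothetical label, e.g. "sms", "email", "browser"
    text: str


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: List[Record]) -> Dict[str, Set[int]]:
    """Map each word to the positions of the records containing it."""
    index: Dict[str, Set[int]] = {}
    for i, record in enumerate(records):
        for token in tokenize(record.text):
            index.setdefault(token, set()).add(i)
    return index


def search(records: List[Record], index: Dict[str, Set[int]], query: str) -> List[Record]:
    """Return records containing every word of the query, in original order."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return [records[i] for i in sorted(hits)]
```

This only matches exact words, so it is the "not flexible, not deep" kind of search; the point is that everything from every source ends up in one searchable place.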

I have the thought that it would be great to make my data points more rich. Don't just tag my browser history with isolated tags such as Programming, Rust, etc., but infer meaning from my searching. Be able to see that i'm working on ProjectX actively via CLI Git activity, and that i'm searching for Y. Be able to correlate commit Z with search Y, etc.
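
The simplest version of correlating commit Z with search Y needs no model at all, just timestamps. A minimal sketch, assuming commits and searches have already been exported as timestamped text; the `Event` type and the 30-minute window are arbitrary choices for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    when: datetime
    text: str  # a commit message or a search query


def searches_before_commits(
    commits: List[Event],
    searches: List[Event],
    window: timedelta = timedelta(minutes=30),
) -> List[Tuple[str, List[str]]]:
    """Pair each commit with the searches made in the window leading up to it."""
    linked = []
    for commit in commits:
        nearby = [
            search.text
            for search in searches
            if timedelta(0) <= commit.when - search.when <= window
        ]
        linked.append((commit.text, nearby))
    return linked
```

Time proximity is a crude signal. An LLM pass could then be asked whether each paired search actually relates to the commit, which is closer to the "infer meaning" idea.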

It feels to me there's a ton of small, edge case utility that can be gained by dumping everything to a local server and having it link the data. But i don't want to do any of that manually.

Likewise, i've wanted to manage "Home Inventory" before - what's in boxes, etc. Managing that myself is tedious, though. LLMs seem ripe for figuring out associations - even dumb LLMs. My hope is that i eventually can start wiring things together and having the LLMs start making rich data out of messy untagged data.

Would be neat, /shrug

This would be truly great. I have been putting off the task of itemizing all of my belongings into a spreadsheet in case of a natural disaster, fire, or theft. Having an assistant that I could tell, "I have this road bike I built," and have it look through my emails, gather all the components and associated costs, and add them to a spreadsheet would be a boon to many people. Of course, having it do it automagically from my purchases would be even better.

Have you heard of rewind.ai? It sounds like it might be a possible solution for what you're looking for (I'm not affiliated with it, and I don't have it on my Mac, so I'm not sure how well it works in reality).

Just leaving this here: there was an effort to build a truly open source one: https://github.com/dhamaniasad/cytev2

Yea, though it's not local. They claim it is, but then use ChatGPT, which is odd.

Personally i want to build a fairly dumb system though. I.e., make a system which can be useful with Llama 2 13B or whatever. Something that doesn't require state of the art GPT-4+.

If that means compromising on some features that's fine, but at least then it can be truly and fully local.