by devjab
1098 days ago
This is very interesting to me. We spent a significant amount of time “labelling” data when I worked in public sector digitalisation. Basically, we did the LLM part manually and then ran engines like this on top of it. Having used ChatGPT to write JSDoc documentation for a while now, and having been very impressed with how good it is when it can understand your code through good naming conventions, I’m fairly certain it’ll be the future of “librarian”-style labelling of case files. But the key issue is going to be privacy. I don’t know much about LLMs, so I’m sorry if this is obvious, but can I use something like this without sending my data outside my own organisation?
https://github.com/ggerganov/llama.cpp
You need to be careful about licensing: for some of these models, it's a legal grey area whether you can use them for commercial projects.
The 'best' models require quite large hardware to run, but a popular compression method at the moment is 'quantization', which stores the model weights at lower precision. I find it a bit hard to evaluate which open source models are better than others, and how much they are affected by quantization.
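To make 'lower precision' concrete, here is a minimal sketch of the basic idea, assuming the simplest symmetric 8-bit scheme with one scale for the whole list of weights. The function names and sample weights are made up for illustration; the schemes actually used with real models are more elaborate, so treat this as a toy, not as what llama.cpp does.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four (for 32-bit floats), at the cost of a small rounding error per weight. How much that error hurts output quality is exactly the part that is hard to evaluate.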
You can also use the OpenAI API. Per their policy, they don't use API data for training. They store it for 30 days for fraud monitoring and then delete it. It doesn't seem hugely different from using something like Slack, Google Docs, or AWS.
I think some people imagine their data will end up in the knowledge base of GPT-5 if they use OpenAI products, but this would be a clear breach of the TOS.
https://openai.com/policies/api-data-usage-policies