Thank you, but what do you use the LLM for? Writing new documents based on previous ones? Tagging/categorization/summarization/lookup? RAG? Extracting structured data from them?
Personally, I'm using paperless-ngx to manage documents.
I use Ollama to generate a document title of 8 words or less. I then go through and make any manual edits at my leisure. It saves me time, which I appreciate!
Paperless-ngx already does a pretty good job of auto-tagging. I think it uses some built-in classifiers? Not 100% sure.
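For anyone curious what the title step could look like, here's a rough sketch, not my exact setup. It assumes Ollama is running locally on its default port and that you've already pulled a model; the model name, the prompt wording, the 4000-character excerpt limit, and the helper names (`build_prompt`, `clean_title`, `generate_title`) are all just placeholders for illustration. Pulling the text out of paperless-ngx and writing the title back is left out.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"  # assumption: swap in whatever model you have pulled
MAX_WORDS = 8


def build_prompt(document_text: str) -> str:
    """Build the title prompt from the start of the document text."""
    excerpt = document_text[:4000]  # arbitrary cap to keep the prompt short
    return (
        f"Write a title of {MAX_WORDS} words or less for the document below. "
        "Reply with the title only.\n\n" + excerpt
    )


def clean_title(raw: str, max_words: int = MAX_WORDS) -> str:
    """Keep the first line, drop surrounding quotes, enforce the word limit."""
    lines = raw.strip().splitlines()
    first = lines[0] if lines else ""
    words = first.strip().strip("\"'").split()
    return " ".join(words[:max_words])


def generate_title(document_text: str) -> str:
    """Ask the local Ollama server for a title (requires Ollama running)."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(document_text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return clean_title(body["response"])
```

Models don't always respect the word limit, so `clean_title` truncates on the way out. Everything stays on your own machine, and you can still fix up the titles by hand afterwards.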
No one cares about your 'secrets' as much as you think. They're only potentially valuable if you're doing unpatented research or they can tie them back to you as an individual. The rest is paranoia.
Having said that, I'm paranoid too. But if I wasn't they'd have got me by now.
Step back for a bit. Some people actually work with sensitive documents as part of their JOB: accountants, lawyers, people in the medical industry, etc.
Sending a document with a Social Security number to OpenAI, for example, is just a dumb idea.