You can chat with the models directly in the same way you can chat with GPT 3.5.
Many of the opensource tools that run these models let you also edit the system prompt, which lets you tweak their personality.
The more advanced tools let you train them, but most of the time, people are downloading pre-existing models and using them directly.
If you are training models, it depends what you are doing. Finetuning an existing pre-trained model requires lots of examples but you can often do a lot with, say, 1000 examples in a dataset.
If you are training a large model completely from scratch, then, yes, you need tons of data and very few people are doing that on their local machines.
+1 on these questions.
Can I run a local llm that will, for example
- visit specified URLs and collect tabular data into csv format?
- ingest a series of documents on a topic and answer questions about it
- ingest all my PDF/MD/Word docs and answer questions about them?
Some of the tools offer a path to doing tool use (fetching URLs and doing things with them) or RAG (searching your documents). I think Oobabooga https://github.com/oobabooga/text-generation-webui offers the latter through plugins.
Many of the opensource tools that run these models let you also edit the system prompt, which lets you tweak their personality.
The more advanced tools let you train them, but most of the time, people are downloading pre-existing models and using them directly.
If you are training models, it depends what you are doing. Finetuning an existing pre-trained model requires lots of examples but you can often do a lot with, say, 1000 examples in a dataset.
If you are training a large model completely from scratch, then, yes, you need tons of data and very few people are doing that on their local machines.