by Kutsuya
974 days ago
This is super cool! I wish there were an easy-to-follow guide on how to make your own embedding, for Llama 2 for example. All I can find are guides that already assume you know everything there is to know about training an embedding. I just want to make an embedding from a conversation between me and my friend and simulate talking to them. Is this a hard thing to train to begin with? If anyone knows or could help me with this, I would be very grateful!
What you are asking for sounds like fine-tuning an existing LLM rather than making an embedding: the data gets tokenized either way, but the outcome is different. There are a lot of write-ups on how people have done it. You should especially follow some of the work on Hugging Face. To replicate talking to your friend, though, I would think you will need a very large dataset to train on, and it's unclear to me whether you can just fine-tune or would need to train a model from scratch. So a dataset with tens of thousands of examples, and then you need to train it on a GPU.
https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...
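For what it's worth, the dataset side of this is the easy part to sketch. Below is a rough, non-authoritative example of turning a chat log into prompt/response pairs of the kind fine-tuning guides usually expect. Everything in it is an assumption for illustration: the "Name: message" log format, the `prompt`/`response` field names, and the helper names `build_pairs` and `to_jsonl` are all hypothetical, and whatever fine-tuning tool you pick may want a different layout.

```python
import json

def build_pairs(lines, me, friend):
    """Turn 'Name: message' lines into pairs where `friend` replies to `me`.

    The log format and the field names are assumptions for this sketch.
    """
    turns = []
    for line in lines:
        speaker, sep, text = line.partition(": ")
        text = text.strip()
        # Skip lines that are not a message from one of the two speakers.
        if not sep or speaker not in (me, friend) or not text:
            continue
        if turns and turns[-1][0] == speaker:
            # Merge consecutive messages from the same person into one turn.
            turns[-1] = (speaker, turns[-1][1] + "\n" + text)
        else:
            turns.append((speaker, text))

    pairs = []
    for (s1, t1), (s2, t2) in zip(turns, turns[1:]):
        # Keep only exchanges where the friend answers me, since the
        # friend's side is what the model should learn to imitate.
        if s1 == me and s2 == friend:
            pairs.append({"prompt": t1, "response": t2})
    return pairs

def to_jsonl(pairs):
    """Serialize the pairs as one JSON object per line."""
    return "\n".join(json.dumps(p) for p in pairs)

chat = [
    "me: hey, are you free tonight?",
    "friend: maybe",
    "friend: what's the plan?",
    "me: pizza and a movie",
    "friend: I'm in",
]
pairs = build_pairs(chat, "me", "friend")
print(to_jsonl(pairs))
```

A real chat export would need its own parsing (timestamps, attachments, multi-line messages), and you would want far more than a handful of exchanges, per the dataset size mentioned above.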