Best framework to create synthetic data for finetuning small models?

Y	Hacker News new \| ask \| show \| jobs

1 points by DanyWin 993 days ago

Hi everyone,

With the different data points, such as phi-1.5 performance being as good as 7b models on some tasks, it seems to be plausible that small models can be quite capable on specific tasks.

I am working on BlindChat, an open-source and private solution to run small LLMs on your browser and I am interested in fine-tuning a phi-1.5 on some domain specific data.

I am thinking of having an approach similar to the researchers of the phi paper, which is creating a high quality dataset using GPT3.5 / GPT4.

Do you know good open-source frameworks that make it easy to create a high quality data for a specific task using an existing large model, like GPT3.5/4 or Llama 2 70b?