by decodebytes 167 days ago

DeepFabric - Generate High-Quality Synthetics, Fine-Tune, Measure, and Evaluate models in a Single Pipeline

I recently used the project to train a 4B model to outperform Claude Sonnet 4.5 and Gemini Pro 2.5 at tool calling. Here is a Colab that runs on a free T4 GPU:

https://colab.research.google.com/drive/1EG1V40v5xkJKLf6Ra6W...

What sets DeepFabric apart from other dataset generation tools is its ability to ensure high diversity yet domain-anchored relevance through unique topic graph generation algorithms. The topic graph guides sample creation to cover all necessary subtopics while avoiding redundancy. Other tools often fall short here, and the redundant samples they produce lead to model overfitting.
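To make the topic graph idea concrete, here is a minimal sketch, not DeepFabric's actual algorithm. It assumes a hypothetical `expand` callback (in practice an LLM call that proposes subtopics) and shows breadth-first expansion with duplicate filtering, so each leaf topic can seed a distinct training sample.

```python
from collections import deque
from typing import Callable, Dict, List


def build_topic_graph(
    root: str,
    expand: Callable[[str], List[str]],  # hypothetical: returns subtopics for a topic
    max_depth: int = 2,
) -> Dict[str, List[str]]:
    """Expand a root topic breadth-first, skipping subtopics already seen."""
    graph: Dict[str, List[str]] = {root: []}
    seen = {root.strip().lower()}
    queue = deque([(root, 0)])
    while queue:
        topic, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand below the depth limit
        for sub in expand(topic):
            key = sub.strip().lower()
            if key in seen:
                continue  # redundant topic, drop it
            seen.add(key)
            graph[topic].append(sub)
            graph[sub] = []
            queue.append((sub, depth + 1))
    return graph


def leaf_topics(graph: Dict[str, List[str]]) -> List[str]:
    """Leaves are the most specific topics; each can anchor one sample prompt."""
    return [topic for topic, children in graph.items() if not children]
```

Deduplicating on a normalized key is the simplest way to avoid redundancy; a real system would more likely compare topics semantically, for example with embeddings.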

Constrained decoding and response validation, along with real tool executions within isolated WebAssembly environments, ensure that generated samples strictly adhere to structured schemas, variable constraints, and execution correctness, so datasets have the exact syntax and structure needed for model training pipelines. Tool definitions can be imported directly from MCP server schemas and then either mocked or run as real tool functions. Using real tools means the model has to adapt and correct itself when it makes the wrong choice or hallucinates, which makes for much better training data.
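As an illustration of the response validation step, here is a rough sketch that checks a generated tool call against a JSON-Schema-style tool definition using only the standard library. The function name `validate_tool_call` and the call layout (`name` plus `arguments`) are assumptions made for this example, not DeepFabric's API.

```python
import json
from typing import Any, Dict, List

# Maps JSON Schema type names to Python types (subset, for illustration).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_tool_call(call_json: str, tool: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the call is valid."""
    try:
        call = json.loads(call_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["call must be a JSON object"]

    errors: List[str] = []
    if call.get("name") != tool["name"]:
        errors.append(f"unexpected tool name: {call.get('name')!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return errors + ["arguments must be a JSON object"]

    params = tool["parameters"]
    for required in params.get("required", []):
        if required not in args:
            errors.append(f"missing required argument: {required}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            errors.append(f"unknown argument: {key}")
            continue
        expected = TYPE_MAP.get(spec.get("type"))
        if expected is not None:
            # bool is a subclass of int in Python, so reject it for numeric types.
            is_stray_bool = isinstance(value, bool) and spec["type"] != "boolean"
            if is_stray_bool or not isinstance(value, expected):
                errors.append(f"wrong type for {key}: expected {spec['type']}")
                continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

Samples that fail validation can be dropped or fed back to the generator as a correction turn, which is the kind of recovery behavior described above.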

Once your dataset is generated, it can be automatically uploaded to Hugging Face and directly imported into popular training frameworks like TRL, Unsloth, and Axolotl.

After training, DeepFabric's built-in evaluation engine assesses model performance: models prove their capabilities on unseen tasks derived from training splits, covering evaluation-only questions, answers, and tool traces.
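For a sense of what such an evaluation can look like, here is a small sketch under stated assumptions: samples are held out with a seeded shuffle, and a prediction counts as correct only if the tool name and arguments both match the reference trace. The helper names `split_dataset` and `tool_call_accuracy` are hypothetical, and DeepFabric's own metrics may differ.

```python
import random
from typing import Any, Dict, List, Tuple

Sample = Dict[str, Any]


def split_dataset(
    samples: List[Sample], eval_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle deterministically and hold out a fraction for evaluation only."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def tool_call_accuracy(predictions: List[Sample], references: List[Sample]) -> float:
    """Fraction of predictions whose tool name and arguments match the reference."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = sum(
        pred.get("name") == ref.get("name")
        and pred.get("arguments") == ref.get("arguments")
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)
```

Exact matching is strict; a looser score could credit the right tool name separately from the right arguments.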

https://github.com/always-further/deepfabric