
Thanks for the feedback! Yes, it is similar to Ludwig, but we think our toolkit is a more lightweight solution for fine-tuning and ablation studies. In most cases, finding the right LLM and the right config for your dataset requires multiple runs (a grid search). Our toolkit offers this capability through a single YAML file.
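To make the grid-search idea concrete, here is a rough sketch of how a search space could be expanded into individual runs. It is not the toolkit's actual schema or code: the keys (`model`, `lora_rank`, `learning_rate`), their values, and the `expand_grid` helper are all made up for illustration, and the dict stands in for whatever the parsed YAML would look like.

```python
from itertools import product

# Hypothetical search space, standing in for a parsed YAML config.
# Keys and values are placeholders, not the toolkit's real schema.
search_space = {
    "model": ["model-a", "model-b"],
    "lora_rank": [8, 16],
    "learning_rate": [1e-4, 2e-4],
}


def expand_grid(space):
    """Return one config dict per combination of the listed values."""
    keys = list(space)
    combos = product(*(space[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


runs = expand_grid(search_space)
print(len(runs))  # 2 * 2 * 2 = 8 combinations
```

Each entry in `runs` is a flat config for one fine-tuning run, so the number of runs grows multiplicatively with every option added to the file.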

As for test coverage, the toolkit currently includes property-based unit tests. For instance, for an LLM fine-tuned on summarization, a property test checks that the summary is shorter than the input text.
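A minimal sketch of what such a length property could look like, assuming length is measured in whitespace-separated words (characters or tokens would work just as well). The names `summary_is_shorter`, `check_property`, and `toy_summarize` are hypothetical, and `toy_summarize` is only a stand-in for a fine-tuned model, not anything from the toolkit.

```python
def summary_is_shorter(source: str, summary: str) -> bool:
    """Property: the summary has fewer words than the source text."""
    return len(summary.split()) < len(source.split())


def check_property(summarize, inputs):
    """Return the inputs for which the length property fails."""
    return [text for text in inputs
            if not summary_is_shorter(text, summarize(text))]


def toy_summarize(text: str) -> str:
    # Stand-in for a fine-tuned model: keeps only the first sentence.
    return text.split(".")[0] + "."


inputs = [
    "The toolkit runs a grid search. It reads one config file. It logs each run.",
    "Fine-tuning needs several runs. Each run uses a different config.",
]

failures = check_property(toy_summarize, inputs)
print(failures)  # an empty list means the property held for every input
```

The property says nothing about summary quality; it only catches outputs that are clearly not summaries, such as a model echoing or expanding its input.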

We have a handful of property-based tests like the one above. The list is not exhaustive at this time. As testing progresses, we aim to distill the most relevant tests for each use case.

Hope this helps.